What is it?
In deep learning, “cross-entropy” is a function that measures the difference between two probability distributions; it is most commonly used in classification tasks.
Analogy: Guessing the Color of Candies
Imagine you have a bag of candies in three colors: red, green, and blue. You guess the color distribution is 50% red, 30% green, and 20% blue. However, the actual distribution is 60% red, 20% green, and 20% blue.
Cross-entropy tells you how close your guess (prediction) is to the actual distribution of the candies. If your guess is far from the truth, the cross-entropy value will be large; if you guess closely, the cross-entropy value will be small.
Mathematical Explanation
The mathematical formula for cross-entropy can be written as:

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$
Where:
- $p(x)$ is the true probability distribution (e.g., the actual color distribution of the candies).
- $q(x)$ is your predicted probability distribution (e.g., your guessed color distribution).
- This formula means: for each possible candy color, the true probability $p(x)$ is multiplied by the logarithm of your predicted probability, $\log q(x)$, then negated, and finally summed over all possible colors.
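To make the formula concrete, here is a minimal Python sketch that applies it to the candy example above. The function name `cross_entropy` and the use of the natural logarithm are choices made for illustration, not part of the original text.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log(q(x)), skipping terms where p(x) = 0."""
    return -sum(p_x * math.log(q_x) for p_x, q_x in zip(p, q) if p_x > 0)

# Candy example: true distribution vs. your guess, ordered (red, green, blue).
p_true  = [0.6, 0.2, 0.2]
q_guess = [0.5, 0.3, 0.2]

print(cross_entropy(p_true, q_guess))  # ≈ 0.9786
```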
Intuitive Explanation:
- If you guess correctly: For example, if you guess that the red candies are 60%, and the truth is also 60%, then the cross-entropy value will be small, indicating your prediction is accurate.
- If you guess incorrectly: For instance, if you guess that the red candies are only 20%, while the truth is 60%, then the cross-entropy value will be large, indicating a significant difference between your prediction and the actual situation.
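A quick numerical check of this intuition (again a sketch with illustrative numbers): a guess that matches the truth gives the smallest possible cross-entropy, and the value grows as the guess drifts away.

```python
import math

def cross_entropy(p, q):
    # Terms with p(x) = 0 are skipped, since 0 * log q(x) is taken as 0.
    return -sum(p_x * math.log(q_x) for p_x, q_x in zip(p, q) if p_x > 0)

truth       = [0.6, 0.2, 0.2]  # actual candy distribution (red, green, blue)
close_guess = [0.6, 0.2, 0.2]  # matches the truth exactly
far_guess   = [0.2, 0.4, 0.4]  # badly underestimates red

print(cross_entropy(truth, close_guess))  # ≈ 0.9503 (the minimum: the entropy of the truth itself)
print(cross_entropy(truth, far_guess))    # ≈ 1.3322 (larger, because the guess is far off)
```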
Calculation of Cross-Entropy
Example:
Suppose we have a simple classification problem with only two categories, such as “cat” and “dog.”
The true label is “cat,” so the true probability distribution $p$ is:
- Cat: 1.0
- Dog: 0.0
The model’s predicted probabilities $q$ are:
- Cat: 0.8
- Dog: 0.2
Step-by-Step Calculation:
According to the formula, the cross-entropy calculation involves each category. Using the natural logarithm, we substitute the probabilities of these two categories:

$$H(p, q) = -\left[\, p(\text{cat}) \log q(\text{cat}) + p(\text{dog}) \log q(\text{dog}) \,\right]$$

1. Calculate the “cat” part:
   - True probability: $p(\text{cat}) = 1.0$
   - Predicted probability: $q(\text{cat}) = 0.8$
   - This part: $1.0 \times \log(0.8) \approx -0.2231$
2. Calculate the “dog” part:
   - True probability: $p(\text{dog}) = 0.0$
   - Predicted probability: $q(\text{dog}) = 0.2$
   - Since $p(\text{dog}) = 0$, this term contributes 0: $0.0 \times \log(0.2) = 0$
3. Total sum: adding the two parts and taking the negative:

$$H(p, q) = -\left[\, (-0.2231) + 0 \,\right] = 0.2231$$
Result Explanation:
The cross-entropy value is 0.2231, indicating that the model’s prediction is relatively close to the true label, but not completely correct (if the predicted probability for “cat” were 1.0, the cross-entropy would be 0).
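The arithmetic above can be checked with a few lines of Python (a sketch; the natural logarithm is used, which is what yields the 0.2231 result):

```python
import math

# True distribution (cat, dog) and the model's predicted distribution.
p = [1.0, 0.0]
q = [0.8, 0.2]

# Only the "cat" term contributes, since p(dog) = 0.
h = -sum(p_x * math.log(q_x) for p_x, q_x in zip(p, q) if p_x > 0)
print(round(h, 4))  # 0.2231
```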
Application in Deep Learning
In classification tasks in deep learning, the model outputs a probability distribution for each category. For example, the probability of a picture containing a cat is 0.7, a dog is 0.2, and a bird is 0.1. Cross-entropy is used to measure the difference between the model’s predicted distribution and the true labels, helping the model adjust its parameters to improve prediction accuracy.
By minimizing cross-entropy, we can make the model’s predicted distribution as close as possible to the true category distribution, which is why cross-entropy is often used as a loss function.
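As one possible illustration, the sketch below uses PyTorch’s `nn.CrossEntropyLoss`, which takes the model’s raw scores (logits) and the true class index. The logits here are made-up numbers chosen so that their softmax is roughly the 0.7 / 0.2 / 0.1 distribution mentioned above.

```python
import torch
import torch.nn as nn

# Hypothetical raw scores (logits) for one image over three classes:
# cat, dog, bird. Their softmax is approximately (0.7, 0.2, 0.1).
logits = torch.tensor([[1.9459, 0.6931, 0.0]])
target = torch.tensor([0])  # true class index: 0 = "cat"

# CrossEntropyLoss applies log-softmax internally, then computes the
# cross-entropy against the true class.
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, target)
print(loss.item())  # ≈ 0.3567, i.e. -log(0.7)

# In training, loss.backward() would compute gradients so an optimizer
# can nudge the predicted distribution toward the true labels.
```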