What is it?

In deep learning, “cross-entropy” is a function that measures the difference between two probability distributions. It is most often used as a loss function in classification tasks.

Analogy: Guessing the Color of Candies

Imagine you have a bag of candies containing red, green, and blue colors. You guess the color distribution of the candies is 50% red, 30% green, and 20% blue. However, the actual distribution is 60% red, 20% green, and 20% blue.

Cross-entropy tells you how close your guess (prediction) is to the actual distribution of the candies. If your guess is far from the truth, the cross-entropy value will be large; if your guess is close, the cross-entropy value will be small.

Mathematical Explanation

The mathematical formula for cross-entropy can be written as:

\[ H(p, q) = - \sum_{x} p(x) \log(q(x)) \]

Where:

  • \( p(x) \) is the true probability distribution (e.g., the actual color distribution of the candies).
  • \( q(x) \) is your predicted probability distribution (e.g., your guessed color distribution).
  • This formula means: for each possible candy color, the true probability \( p(x) \) is multiplied by the logarithm of your predicted probability \( q(x) \); these products are summed over all possible colors, and the sign of the sum is flipped. Since the log of a probability is zero or negative, the result is always non-negative (a small Python sketch of this sum follows this list).
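
As a concrete illustration, here is a minimal Python sketch of this sum, using the candy distributions from the analogy above (the helper name cross_entropy is just for this example):

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum over x of p(x) * log(q(x)), using the natural logarithm."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

# Candy analogy: true color distribution vs. your guess (red, green, blue).
p_true  = [0.6, 0.2, 0.2]
q_guess = [0.5, 0.3, 0.2]

print(cross_entropy(p_true, q_guess))  # ≈ 0.98 with natural logs
```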

Intuitive Explanation:

  1. If you guess correctly: For example, if you guess that the red candies are 60%, and the truth is also 60%, then the cross-entropy value will be small, indicating your prediction is accurate.
  2. If you guess incorrectly: For instance, if you guess that the red candies are only 20%, while the truth is 60%, then the cross-entropy value will be large, indicating a significant difference between your prediction and the actual situation.
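
To see both cases side by side, the same kind of helper can compare an exact guess with a far-off one (the guessed values are made up purely for illustration):

```python
import math

def cross_entropy(p, q):
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

p_true    = [0.6, 0.2, 0.2]   # actual color distribution (red, green, blue)
q_close   = [0.6, 0.2, 0.2]   # guess that matches the truth exactly
q_far_off = [0.2, 0.3, 0.5]   # guess that puts most weight on the wrong color

print(cross_entropy(p_true, q_close))    # smallest value possible for this p (its entropy, ≈ 0.95)
print(cross_entropy(p_true, q_far_off))  # noticeably larger, ≈ 1.35
```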

Calculation of Cross-Entropy

Example:

Suppose we have a simple classification problem with only two categories, such as “cat” and “dog.”
The true label is “cat,” so the true probability \( p(x) \) is:

  • Cat: 1.0
  • Dog: 0.0

The model’s predicted probabilities \( q(x) \) are:

  • Cat: 0.8
  • Dog: 0.2

Step-by-Step Calculation:

According to the formula, the cross-entropy is a sum over every category, so we substitute the probabilities of both categories:

\[ H(p, q) = - \left( p(\text{cat}) \cdot \log(q(\text{cat})) + p(\text{dog}) \cdot \log(q(\text{dog})) \right) \]
  1. Calculate the “cat” part:

    • True probability \( p(\text{cat}) = 1.0 \)
    • Predicted probability \( q(\text{cat}) = 0.8 \)
    • Calculate this part: \( 1.0 \cdot \log(0.8) = \log(0.8) \approx -0.2231 \) (using the natural logarithm)
  2. Calculate the “dog” part:

    • True probability \( p(\text{dog}) = 0.0 \)
    • Predicted probability \( q(\text{dog}) = 0.2 \)
    • Since \( p(\text{dog}) = 0 \), this term contributes 0: \( 0.0 \cdot \log(0.2) = 0 \)
  3. Total Sum: Adding the two parts and taking the negative:

\[ H(p, q) = - \left( -0.2231 + 0 \right) = 0.2231 \]

Result Explanation:

The cross-entropy value is 0.2231, indicating that the model’s prediction is fairly close to the true label, but not completely correct (if the predicted probability for “cat” were 1.0, the cross-entropy would be 0).
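
To double-check the arithmetic, here is a small Python sketch that reproduces this value (natural logarithm, as above):

```python
import math

# True one-hot label ("cat") and the model's predicted probabilities (cat, dog).
p = [1.0, 0.0]
q = [0.8, 0.2]

# Terms where p(x) = 0 contribute nothing, so they are skipped.
loss = -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)
print(loss)  # ≈ 0.2231
```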

Application in Deep Learning

In classification tasks in deep learning, the model outputs a probability distribution for each category. For example, the probability of a picture containing a cat is 0.7, a dog is 0.2, and a bird is 0.1. Cross-entropy is used to measure the difference between the model’s predicted distribution and the true labels, helping the model adjust its parameters to improve prediction accuracy.

By minimizing cross-entropy, we can make the model’s predicted distribution as close as possible to the true category distribution, which is why cross-entropy is often used as a loss function.
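
In practice you rarely write this loss by hand; frameworks provide it directly. As one possible illustration with PyTorch, torch.nn.CrossEntropyLoss takes the model’s raw scores (logits) and the true class index, applying softmax internally; the logit values below are made up for the example:

```python
import torch
import torch.nn as nn

# Made-up raw model outputs (logits) for one image over three classes: cat, dog, bird.
logits = torch.tensor([[2.0, 0.4, -0.6]])

# True label: class index 0 ("cat").
target = torch.tensor([0])

# CrossEntropyLoss combines log-softmax and negative log-likelihood, i.e. the
# cross-entropy between the one-hot true label and the model's predicted distribution.
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)
print(loss.item())
```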