Cross-Entropy
Cross-entropy is a loss function widely used in classification tasks. It quantifies the difference between the true distribution of labels and the predicted probability distribution produced by a model.
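Formally, for a true distribution p and a predicted distribution q over K classes (the symbols p, q, and K are notation introduced here for illustration), cross-entropy is commonly written as:

```latex
H(p, q) = -\sum_{k=1}^{K} p_k \log q_k
```

For a one-hot true label (all probability mass on the correct class y), this reduces to -log q_y, the negative log-probability the model assigns to the true class.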
Key idea
- Confident, correct prediction (high probability on the true class) → very low loss.
- Incorrect or uncertain prediction → high loss.
Example
In binary classification (e.g., spam vs. not spam), as worked through in the sketch after this list:
- True label = spam, predicted probability = 0.95 → low cross-entropy.
- True label = spam, predicted probability = 0.05 → high cross-entropy.
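These numbers follow directly from the negative log-probability. A minimal Python sketch (using only the standard math module; the helper name binary_cross_entropy is ours, not a library function) reproduces them:

```python
import math

def binary_cross_entropy(y_true: int, p_pred: float) -> float:
    """Cross-entropy for one example: y_true is 1 (spam) or 0 (not spam),
    p_pred is the predicted probability of the positive class."""
    # Clip to avoid log(0) for extreme predictions.
    eps = 1e-12
    p_pred = min(max(p_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1.0 - p_pred))

print(binary_cross_entropy(1, 0.95))  # ~0.05: confident and correct -> low loss
print(binary_cross_entropy(1, 0.05))  # ~3.0:  confident and wrong   -> high loss
```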
Applications
- Logistic regression.
- Neural networks (CNNs, RNNs, Transformers).
- Speech recognition and natural language generation.
Cross-entropy is not only a mathematical tool but also an intuitive measure of surprise: it tells us how “shocked” the model would be given the true outcome. If a model assigns high probability to the correct class, there is little surprise, and thus the loss is low. If it is confident in the wrong answer, the loss becomes very high—penalizing overconfident mistakes more severely than hesitant ones.
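As a rough numeric illustration (the 0.40 figure is chosen here purely as an example of a hesitant mistake), compare a tentative wrong prediction with a confident wrong one:

```latex
-\ln(0.40) \approx 0.92 \qquad \text{vs.} \qquad -\ln(0.05) \approx 3.00
```

The confident mistake incurs more than three times the loss of the hesitant one.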
A practical advantage of cross-entropy is its connection to maximum likelihood estimation. Minimizing this loss is equivalent to maximizing the likelihood of the observed data under the model, making it a principled choice for probabilistic classification.
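To make the equivalence concrete, here is the standard one-line derivation (the notation N, q^(i), and y^(i) is introduced here for illustration): for N independent examples, the negative log-likelihood of the data factorizes into per-example cross-entropies.

```latex
-\log \prod_{i=1}^{N} q^{(i)}_{y^{(i)}}
  = -\sum_{i=1}^{N} \log q^{(i)}_{y^{(i)}}
  = \sum_{i=1}^{N} H\!\left(p^{(i)}, q^{(i)}\right)
```

Here p^(i) is the one-hot distribution for the label of example i, so minimizing the total cross-entropy and maximizing the likelihood select the same parameters.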
In multi-class tasks, cross-entropy works seamlessly with the softmax function, which turns raw logits into probabilities. This makes it the standard loss function for deep learning models across computer vision, NLP, and speech. However, one limitation is its sensitivity to class imbalance: when some classes are rare, the loss can be dominated by majority classes unless reweighted.
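A minimal NumPy sketch of softmax cross-entropy with optional per-class weights (the function names and the simple multiplicative weighting scheme are illustrative assumptions, not any particular framework's API):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max keeps the exponentials numerically stable."""
    z = logits - logits.max(axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

def cross_entropy(logits: np.ndarray, labels: np.ndarray,
                  class_weights: np.ndarray | None = None) -> float:
    """Mean cross-entropy over a batch; optional per-class weights upweight rare classes."""
    probs = softmax(logits)
    # Probability the model assigns to each example's true class.
    p_true = probs[np.arange(len(labels)), labels]
    losses = -np.log(np.clip(p_true, 1e-12, 1.0))
    if class_weights is not None:
        losses = losses * class_weights[labels]
    return float(losses.mean())

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
print(cross_entropy(logits, labels))                             # unweighted
print(cross_entropy(logits, labels, np.array([1.0, 1.0, 5.0])))  # upweight class 2 (a rare class)
```

Upweighting rare classes in this way is one common mitigation; many deep learning frameworks expose a similar per-class weighting option on their cross-entropy losses.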