Cost Function

In machine learning, a cost function is a mathematical expression that quantifies the difference between the model’s predictions and the actual values. It acts as a measure of how well or poorly the model is performing.
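
As a minimal sketch of this idea, assuming NumPy and made-up numbers, the snippet below computes a mean squared error cost for two sets of predictions against the same ground truth; the predictions that are closer to reality yield the lower cost.

```python
import numpy as np

def mse_cost(y_true, y_pred):
    """Mean squared error: average squared difference between
    predictions and actual values."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

# Hypothetical ground truth and two candidate models' predictions.
y_true     = [3.0, -0.5, 2.0, 7.0]
good_preds = [2.8, -0.4, 2.1, 7.2]   # close to reality  -> low cost
bad_preds  = [5.0,  1.0, 0.0, 10.0]  # large errors      -> high cost

print(mse_cost(y_true, good_preds))  # ~0.025
print(mse_cost(y_true, bad_preds))   # ~4.81
```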

Key idea

  • Low cost → model predictions are close to reality.
  • High cost → model is making significant errors.

Common examples

  • Mean squared error (MSE) – regression tasks with continuous outputs.
  • Cross-entropy – classification tasks that compare predicted probabilities with true labels.

Applications
Training an AI model essentially means minimizing the cost function. Optimization algorithms such as gradient descent iteratively adjust the model's parameters to reduce the cost and improve generalization.
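
A simplified sketch of that loop, assuming NumPy, toy data, and an arbitrarily chosen learning rate (not any particular library's training routine), fits a one-parameter linear model by gradient descent on an MSE cost:

```python
import numpy as np

# Toy data: y is roughly 2 * x, so the ideal weight is near 2.0.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

w = 0.0    # initial parameter
lr = 0.01  # learning rate (chosen arbitrarily for the example)

for step in range(200):
    y_pred = w * x
    cost = np.mean((y - y_pred) ** 2)        # MSE cost
    grad = -2 * np.mean((y - y_pred) * x)    # d(cost)/dw
    w -= lr * grad                           # step against the gradient

print(w)  # converges near 2.0 as the cost decreases
```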

The cost function is often described as the compass of machine learning. It tells the model whether it is moving in the right direction during training. By repeatedly adjusting weights to minimize the cost, models gradually improve their ability to generalize to new data.

Different tasks require different cost functions. For regression, mean squared error (MSE) penalizes large mistakes heavily, making it suitable for continuous predictions. For classification, cross-entropy measures the divergence between predicted probabilities and true labels, pushing the model to be more confident when correct and less confident when wrong.
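
In standard notation, with n samples, true values y_i, predictions ŷ_i, and (for binary classification) a predicted probability p̂_i of the positive class, these two cost functions can be written as:

```latex
% Mean squared error over n samples (regression)
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2

% Binary cross-entropy, with y_i \in \{0, 1\} and predicted
% probability \hat{p}_i of the positive class
\mathrm{CE} = -\frac{1}{n} \sum_{i=1}^{n}
  \left[ y_i \log \hat{p}_i + (1 - y_i) \log\left(1 - \hat{p}_i\right) \right]
```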

The choice of cost function is not neutral: it encodes which kinds of errors matter most. In medicine, false negatives may be far more costly than false positives, so specialized loss functions are designed to reflect these priorities. In deep learning, researchers even design custom loss functions tailored to specific problems, such as perceptual loss for image generation or triplet loss for embeddings.
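
As an illustration of how such priorities can be encoded (a sketch with an arbitrary weight, not a clinically validated loss), the weighted binary cross-entropy below penalizes missed positives (false negatives) more heavily than false alarms:

```python
import numpy as np

def weighted_binary_cross_entropy(y_true, p_pred, fn_weight=5.0, eps=1e-12):
    """Binary cross-entropy in which missed positives are weighted
    `fn_weight` times more than false positives.
    The weight of 5.0 is illustrative, not a recommended value."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    positive_term = fn_weight * y_true * np.log(p_pred)   # punishes missed positives
    negative_term = (1 - y_true) * np.log(1 - p_pred)     # punishes false alarms
    return -np.mean(positive_term + negative_term)

# The same size of probability error costs more when the true label is positive.
print(weighted_binary_cross_entropy([1], [0.2]))  # missed positive -> higher cost
print(weighted_binary_cross_entropy([0], [0.8]))  # false alarm    -> lower cost
```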
