Variance
What is variance in Machine Learning?
Variance measures how much a model’s predictions fluctuate if we train it on different subsets of the same dataset. High variance means the model is too sensitive, capturing noise instead of the underlying patterns.
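One way to see this concretely is to refit the same model class on random subsets of a single dataset and watch how its prediction at one fixed input moves around. The sketch below is purely illustrative (made-up data, NumPy polynomial fits standing in for "simple" vs "flexible" models):

```python
import numpy as np

rng = np.random.default_rng(0)

# One fixed dataset; we refit on random subsets of it and watch predictions move.
x = np.sort(rng.uniform(-1, 1, 60))
y = np.sin(3 * x) + rng.normal(0, 0.3, 60)

x0 = 0.5                  # probe input where we measure prediction spread
preds = {1: [], 9: []}    # polynomial degree -> predictions across subsets
for _ in range(200):
    idx = rng.choice(60, size=40, replace=False)   # random training subset
    for deg in preds:
        coefs = np.polyfit(x[idx], y[idx], deg)
        preds[deg].append(np.polyval(coefs, x0))

for deg, p in preds.items():
    print(f"degree {deg}: variance of predictions at x=0.5 -> {np.var(p):.5f}")
```

The degree-9 fit's predictions scatter far more across subsets than the degree-1 fit's: that spread is exactly the variance being described here.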
Variance vs Bias: the classic trade-off
- High variance, low bias → The model memorizes the training set but fails to generalize (overfitting). Example: a decision tree grown too deep.
- Low variance, high bias → The model is too simplistic, missing important patterns (underfitting). Example: using a linear model for a highly non-linear problem.
The goal is not to eliminate variance entirely but to control it so that the model remains flexible yet stable.
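The trade-off can also be measured directly: refitting a model on many freshly drawn training sets, the expected squared error at a test point decomposes into squared bias (how far the average prediction sits from the truth) plus variance (how much individual predictions scatter around that average). A minimal NumPy sketch, assuming a known toy target function:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(3 * x)            # toy "true" function, assumed for illustration
x_test = np.linspace(-0.9, 0.9, 50)    # held-out inputs

def fit_predict(deg):
    """Draw a fresh noisy training set and return predictions on x_test."""
    x = rng.uniform(-1, 1, 30)
    y = f(x) + rng.normal(0, 0.3, 30)
    return np.polyval(np.polyfit(x, y, deg), x_test)

results = {}
for deg in (1, 9):
    P = np.stack([fit_predict(deg) for _ in range(300)])  # (runs, test points)
    bias2 = np.mean((P.mean(axis=0) - f(x_test)) ** 2)    # squared bias
    var = np.mean(P.var(axis=0))                          # average variance
    results[deg] = (bias2, var)
    print(f"degree {deg}: bias^2={bias2:.3f}  variance={var:.3f}")
```

Under these assumptions the flexible model shows near-zero bias but much larger variance, and the linear model shows the reverse; the complexity that minimizes test error usually sits somewhere in between.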
Real-world implications
- Healthcare: a high-variance diagnostic model may perform well on data from one hospital yet fail on data from another.

- Finance: a trading algorithm might show excellent backtesting results but collapse in live markets.
Mitigation techniques
- Cross-validation to detect variance early.
- Regularization and pruning to simplify complex models.
- Ensemble methods (bagging, random forests) that reduce variance by combining multiple learners.
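Bagging in particular can be illustrated without any ML library: fit many unstable models on bootstrap resamples and average their predictions. A small NumPy sketch (toy data; degree-9 polynomial fits standing in for deep trees):

```python
import numpy as np

rng = np.random.default_rng(2)

x0 = 0.5                    # probe input where we measure prediction spread
members, bagged = [], []
for _ in range(150):                        # fresh noisy dataset each run
    x = rng.uniform(-1, 1, 40)
    y = np.sin(3 * x) + rng.normal(0, 0.3, 40)
    votes = []
    for _ in range(25):                     # bagging: 25 bootstrap fits
        idx = rng.integers(0, 40, 40)       # bootstrap resample (with replacement)
        votes.append(np.polyval(np.polyfit(x[idx], y[idx], 9), x0))
    members.extend(votes)                   # individual high-variance learners
    bagged.append(np.mean(votes))           # their average: the bagged model

print(f"variance of individual fits: {np.var(members):.5f}")
print(f"variance of bagged average:  {np.var(bagged):.5f}")
```

Averaging cancels much of the learners' individual instability, which is the mechanism behind random forests as well.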
A closer look at variance
In machine learning, variance reflects how sensitive a model is to the specific training data it has seen. A high-variance model essentially “chases” the training set, fitting small idiosyncrasies that are unlikely to reappear in future data. The result is overfitting: strong training performance but poor generalization.
Variance is usually studied together with bias in the bias–variance trade-off. While high variance causes overfitting, low variance combined with high bias leads to underfitting. Effective models strike a balance: flexible enough to capture real patterns, yet constrained enough to ignore noise.
In practice, variance can be reduced with techniques such as cross-validation, ensemble methods (e.g., bagging, random forests), and regularization. Monitoring it is especially important in domains like healthcare or autonomous driving, where an overfitted model can fail under real-world variability, with dangerous consequences.
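As a concrete instance of the cross-validation point, k-fold CV exposes high variance as a gap between training error and validation error. A hand-rolled 5-fold sketch in NumPy (toy data; `kfold_gap` is a hypothetical helper for this illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.3, 60)

def kfold_gap(deg, k=5):
    """Return (mean train MSE, mean validation MSE) over k folds."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    tr_mse, va_mse = [], []
    for i in range(k):
        va = folds[i]                                        # held-out fold
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        c = np.polyfit(x[tr], y[tr], deg)
        tr_mse.append(np.mean((np.polyval(c, x[tr]) - y[tr]) ** 2))
        va_mse.append(np.mean((np.polyval(c, x[va]) - y[va]) ** 2))
    return np.mean(tr_mse), np.mean(va_mse)

results = {}
for deg in (1, 9):
    results[deg] = kfold_gap(deg)
    tr, va = results[deg]
    print(f"degree {deg}: train MSE={tr:.3f}  CV MSE={va:.3f}  gap={va - tr:.3f}")
```

A flexible model that fits its training folds much better than its validation folds is the classic high-variance signature; a model whose two scores are close but both poor points to high bias instead.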
📖 References
- Bias–variance tradeoff (Wikipedia)
- Domingos, P. (2012). A few useful things to know about machine learning. CACM.