Underfitting
Underfitting occurs when a machine learning model is too simple to capture the underlying structure of the data. Unlike overfitting, where the model memorizes training data without generalizing well, underfitting reflects a failure to learn even the basic patterns. As a result, the model performs poorly both on the training set and on unseen test data.
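A minimal sketch of this failure mode, using scikit-learn on synthetic data (the dataset and all numbers here are illustrative, not from a real application): a straight line fitted to a quadratic relationship scores poorly on the training set and the test set alike.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with a strong nonlinear (quadratic) relationship.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A straight line cannot capture the quadratic structure, so it scores
# poorly on the training data as well as on the held-out test data.
model = LinearRegression().fit(X_train, y_train)
print(f"train R^2: {model.score(X_train, y_train):.2f}")
print(f"test  R^2: {model.score(X_test, y_test):.2f}")
```

Both R² values should come out near zero, which is the signature of underfitting: the model fails even on the data it was trained on.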
Main causes
- Overly simplistic model: for example, applying linear regression to a dataset with strong nonlinear relationships.
- Insufficient training: too few epochs or iterations prevent the model from converging.
- Excessive regularization: penalties such as L1, L2, or dropout applied too strongly can oversimplify the model (see the sketch after this list).
- Poor feature representation: missing or irrelevant features limit the ability to detect meaningful patterns.
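To make the regularization cause concrete, here is a sketch (again scikit-learn on synthetic data; the alpha values are arbitrary, chosen only to contrast the two regimes) in which an extreme L2 penalty drives a Ridge model into underfitting while a moderate penalty does not:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic linear data: five informative features plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 1.5, 0.5, -1.0]) + rng.normal(0, 0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# An extreme L2 penalty shrinks all coefficients toward zero,
# leaving a model too simple to fit even the training data.
for alpha in (1.0, 1e6):
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:g}: train R^2={model.score(X_train, y_train):.2f}, "
          f"test R^2={model.score(X_test, y_test):.2f}")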
Examples
- A sales prediction model that only uses one variable (e.g., advertising spend) and ignores seasonality, competitor activity, or pricing effects.
- A speech recognition model with too few parameters that struggles to distinguish even common words.
- An image classifier that cannot differentiate between cats and dogs because its filters are too coarse.
Consequences
- Low accuracy across datasets: poor performance on both training and test data.
- Weak predictions: results are too general, failing to detect important signals.
- Business impact: in areas like fraud detection or predictive maintenance, underfitting can mean missing critical anomalies.
How to address underfitting
- Increase model complexity (e.g., moving from linear to polynomial regression, decision trees, or deep neural networks; see the sketch after this list).
- Train longer with more iterations or adjust the learning rate.
- Reduce regularization strength if it constrains the model too much.
- Add more relevant features through feature engineering.
- Gather higher-quality or larger datasets to provide richer learning signals.
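A sketch of the first remedy, increasing model complexity (same synthetic quadratic data as in the opening example; assumes scikit-learn): adding polynomial features gives an otherwise linear model the capacity to represent the pattern it previously missed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The plain linear model underfits; expanding the input with
# degree-2 polynomial features lets it capture the quadratic shape.
underfit = LinearRegression().fit(X_train, y_train)
fixed = make_pipeline(PolynomialFeatures(degree=2),
                      LinearRegression()).fit(X_train, y_train)

print(f"linear       test R^2: {underfit.score(X_test, y_test):.2f}")
print(f"poly deg=2   test R^2: {fixed.score(X_test, y_test):.2f}")
```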
Relation to overfitting
Underfitting and overfitting are two extremes of the bias-variance tradeoff. Effective machine learning aims to strike a balance: models should be complex enough to capture patterns but not so complex that they memorize noise.
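This tradeoff can be made visible with a small sweep over model capacity (a sketch with scikit-learn; exact scores vary with the random seed): a very low polynomial degree underfits, a very high degree begins to chase noise, and an intermediate degree balances the two.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=40)

# Degree 1 underfits (high bias), a very high degree fits noise
# (high variance), and a moderate degree sits between the extremes.
for degree in (1, 5, 15):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"degree={degree:2d}: mean CV R^2 = {score:.2f}")
```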
📚 Further Reading
- Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O’Reilly.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.