Underfitting
Underfitting occurs when a machine learning model is too simple to capture the underlying structure of the data. Unlike overfitting, where the model memorizes training data without generalizing well, underfitting reflects a failure to learn even the basic patterns. As a result, the model performs poorly both on the training set and on unseen test data.
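A minimal sketch of this failure mode, using scikit-learn on synthetic data (the dataset and all numbers here are illustrative, not from a real application): a straight line fitted to a quadratic relationship scores poorly on the training set and the test set alike.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with a strong nonlinear (quadratic) relationship.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A straight line cannot capture the quadratic structure, so it scores
# poorly on the training data as well as on the held-out test data.
model = LinearRegression().fit(X_train, y_train)
print(f"train R^2: {model.score(X_train, y_train):.2f}")
print(f"test  R^2: {model.score(X_test, y_test):.2f}")
```

Both R² values should come out near zero, which is the signature of underfitting: the model fails even on the data it was trained on.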
Main causes
- Overly simplistic model: for example, applying linear regression to a dataset with strong nonlinear relationships.
- Insufficient training: too few epochs or iterations prevent the model from converging.
- Excessive regularization: penalties such as L1, L2, or dropout applied too strongly can oversimplify the model (see the sketch after this list).
- Poor feature representation: missing or irrelevant features limit the ability to detect meaningful patterns.
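To make the regularization cause concrete, here is a sketch (again scikit-learn on synthetic data; the alpha values are arbitrary, chosen only to contrast the two regimes) in which an extreme L2 penalty drives a Ridge model into underfitting while a moderate penalty does not:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic linear data: five informative features plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 1.5, 0.5, -1.0]) + rng.normal(0, 0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# An extreme L2 penalty shrinks all coefficients toward zero,
# leaving a model too simple to fit even the training data.
for alpha in (1.0, 1e6):
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:g}: train R^2={model.score(X_train, y_train):.2f}, "
          f"test R^2={model.score(X_test, y_test):.2f}")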
Examples
- A sales prediction model that only uses one variable (e.g., advertising spend) and ignores seasonality, competitor activity, or pricing effects.
- A speech recognition model with too few parameters that struggles to distinguish even common words.
- An image classifier that cannot differentiate between cats and dogs because its filters are too coarse.
Consequences
- Low accuracy across datasets: poor performance on both training and test data.
- Weak predictions: results are too general, failing to detect important signals.
- Business impact: in areas like fraud detection or predictive maintenance, underfitting can mean missing critical anomalies.
How to address underfitting
- Increase model complexity (e.g., moving from linear to polynomial regression, decision trees, or deep neural networks; see the sketch after this list).
- Train longer with more iterations or adjust the learning rate.
- Reduce regularization strength if it constrains the model too much.
- Add more relevant features through feature engineering.
- Gather higher-quality or larger datasets to provide richer learning signals.
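A sketch of the first remedy, increasing model complexity (same synthetic quadratic data as in the opening example; assumes scikit-learn): adding polynomial features gives an otherwise linear model the capacity to represent the pattern it previously missed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The plain linear model underfits; expanding the input with
# degree-2 polynomial features lets it capture the quadratic shape.
underfit = LinearRegression().fit(X_train, y_train)
fixed = make_pipeline(PolynomialFeatures(degree=2),
                      LinearRegression()).fit(X_train, y_train)

print(f"linear       test R^2: {underfit.score(X_test, y_test):.2f}")
print(f"poly deg=2   test R^2: {fixed.score(X_test, y_test):.2f}")
```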
Relation to overfitting
Underfitting and overfitting are two extremes of the bias-variance tradeoff. Effective machine learning aims to strike a balance: models should be complex enough to capture patterns but not so complex that they memorize noise.
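This tradeoff can be made visible with a small sweep over model capacity (a sketch with scikit-learn; exact scores vary with the random seed): a very low polynomial degree underfits, a very high degree begins to chase noise, and an intermediate degree balances the two.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=40)

# Degree 1 underfits (high bias), a very high degree fits noise
# (high variance), and a moderate degree sits between the extremes.
for degree in (1, 5, 15):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"degree={degree:2d}: mean CV R^2 = {score:.2f}")
```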
📚 Further Reading
- Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O’Reilly.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.