Overfitting vs Underfitting
An overfit model memorizes the training data and fails on new data; an underfit model is too simple to capture the pattern even on the training data. The diagnostic is the gap between training and validation error: large gap = overfitting, both errors high = underfitting. The fixes are opposite directions of the same dial.
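Here is a minimal sketch of that check, assuming scikit-learn; the synthetic dataset and GradientBoostingRegressor are placeholders for your own data and model.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Placeholder data; swap in your own X and y.
X, y = make_regression(n_samples=500, n_features=10, noise=15.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

model = GradientBoostingRegressor().fit(X_train, y_train)
train_mse = mean_squared_error(y_train, model.predict(X_train))
val_mse = mean_squared_error(y_val, model.predict(X_val))
print(f"train MSE = {train_mse:.2f}, validation MSE = {val_mse:.2f}")
# Large gap (validation >> train): overfitting. Both high: underfitting.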
Definitions in One Line Each
- Underfitting: the model is too simple and misses real structure. High bias, low variance.
- Good fit: the model captures the signal but ignores the noise.
- Overfitting: the model is too flexible and memorizes noise as if it were signal. Low bias, high variance.
The Polynomial Example Everyone Uses
Imagine you fit a polynomial to 30 noisy points sampled from a sine curve.
Degree 1 (linear): train MSE = 0.45, test MSE = 0.48 → UNDERFIT
Degree 3 (cubic): train MSE = 0.04, test MSE = 0.05 → GOOD FIT
Degree 15: train MSE = 0.001, test MSE = 0.62 → OVERFIT
The degree-15 model wiggles through every training point, including the noise, and is wildly wrong between them. The cubic captures the curve and ignores the wiggles. The line is too rigid to bend at all.
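A rough sketch of this experiment, assuming scikit-learn and a synthetic sine dataset; the exact MSE values depend on the noise and random seed, so they won't match the numbers above exactly.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(30, 1))          # 30 noisy training points
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=30)
X_test = rng.uniform(0, 2 * np.pi, size=(200, 1))     # fresh points from the same curve
y_test = np.sin(X_test).ravel() + rng.normal(scale=0.2, size=200)

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    train_mse = mean_squared_error(y, model.predict(X))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")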
How to Tell Which One You Have
train_score   test_score   diagnosis
-----------   ----------   -----------------------------------------
low           low          Underfitting (model too simple)
high          high         Good fit
high          low          Overfitting (model memorized training set)
low           high         Bug: leakage, wrong split, or tiny test set
"Low" means the metric is bad: high error or low accuracy. The fourth row is a red flag, not a model problem; investigate before tuning anything.
Read the Learning Curve
Plot training and validation error as the training set grows:
- Both curves high and close together → underfitting. More data won't help; you need a more flexible model.
- Big gap that doesn't close → overfitting. More data will help, and so will regularization.
- Both curves converge low → good fit; you're done.
How to Fix Underfitting
- Use a more flexible model (linear → tree, shallow tree → deep tree, linear → polynomial features, small NN → bigger NN); a sketch of the polynomial-features route follows this list.
- Add features: interactions, transformations, domain-specific signals.
- Reduce regularization (lower L1/L2 penalty, deeper trees, lower dropout rate).
- Train longer if loss is still going down.
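A minimal sketch of the polynomial-features fix, assuming scikit-learn and synthetic data with a quadratic signal; your own features and degree will differ.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)   # quadratic signal plus noise

plain = LinearRegression().fit(X, y)
flexible = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print("linear R^2:", round(plain.score(X, y), 3))       # underfits: poor even on training data
print("quadratic R^2:", round(flexible.score(X, y), 3)) # captures the curve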
How to Fix Overfitting
- Get more training data: often the single most effective fix when it's available.
- Regularize: add L1/L2 penalty, dropout, max-depth on trees, early stopping (see the sketch after this list).
- Simplify the model: fewer features, fewer parameters, shallower trees.
- Cross-validate hyperparameters: tune with k-fold CV on the training data, never on the test set.
- Use ensembles: bagging (random forest) reduces variance; boosting with early stopping is a common production choice.
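A minimal sketch of two of these fixes in scikit-learn, shallower trees plus early stopping in gradient boosting; the dataset and parameters are illustrative, not tuned.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=20, noise=20.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

unregularized = GradientBoostingRegressor(max_depth=8, n_estimators=500)
regularized = GradientBoostingRegressor(
    max_depth=3,                 # shallower trees = less variance
    n_estimators=500,
    validation_fraction=0.2,     # early stopping on an internal validation split
    n_iter_no_change=10,
)
for name, model in [("unregularized", unregularized), ("regularized", regularized)]:
    model.fit(X_train, y_train)
    print(name, "train R^2:", round(model.score(X_train, y_train), 3),
          "val R^2:", round(model.score(X_val, y_val), 3))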
The Bias-Variance Tradeoff
Total prediction error decomposes into three parts:
Total error = Bias² + Variance + Irreducible noise
- High bias → underfitting (assumptions are wrong)
- High variance → overfitting (sensitive to training sample)
- Irreducible noise → can't be fixed by any model
Tuning the model is mostly a tradeoff between bias and variance. Regularization moves the dial toward bias; flexibility moves it toward variance. The minimum total error sits between the extremes.
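A rough simulation of the decomposition, reusing the sine setup from the polynomial example: refit each model on many fresh training sets and split its error at one test point into bias² and variance.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x0 = np.array([[2.0]])               # evaluate bias and variance at a single input
true_y0 = np.sin(2.0)

for degree in (1, 3, 15):
    preds = []
    for _ in range(200):             # 200 independent training sets
        X = rng.uniform(0, 2 * np.pi, size=(30, 1))
        y = np.sin(X).ravel() + rng.normal(scale=0.2, size=30)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
        preds.append(model.predict(x0)[0])
    preds = np.array(preds)
    bias_sq = (preds.mean() - true_y0) ** 2
    variance = preds.var()
    print(f"degree {degree}: bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")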
Code Sketch in scikit-learn
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import learning_curve
import numpy as np

# Example dataset so the sketch runs end to end; substitute your own X and y.
X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    GradientBoostingRegressor(max_depth=5, n_estimators=200),
    X, y, cv=5, scoring="neg_mean_squared_error",
    train_sizes=np.linspace(0.1, 1.0, 8),
)
# Scores are negative MSE, so negate them before plotting.
# Plot mean of -train_scores and -val_scores against sizes.
# Big persistent gap = overfitting; both flat-and-high = underfitting.
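# Optional continuation, assuming matplotlib is installed: turn the arrays
# above into the learning-curve plot described in the comments.
import matplotlib.pyplot as plt

plt.plot(sizes, -train_scores.mean(axis=1), label="train MSE")
plt.plot(sizes, -val_scores.mean(axis=1), label="validation MSE")
plt.xlabel("training set size")
plt.ylabel("MSE")
plt.legend()
plt.show()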
Common Pitfalls
- Tuning on the test set: if you pick the model with the best test score across many trials, you've overfit to the test set. Use a separate validation split or cross-validation.
- Data leakage: features computed from the future, target encoding done before splitting, or scaling fit on the whole dataset all produce "great" training scores that collapse in production (see the sketch after this list).
- Tiny test sets: a 50-row test set has high variance; a single lucky split looks like a great model.
- Calling deep neural nets "always overfitting": with enough data, regularization, and early stopping, large models often underfit until trained for longer.
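A minimal sketch of the leakage pitfall, assuming scikit-learn: fitting the scaler on all rows before cross-validation leaks validation-fold statistics into training, while putting it inside a Pipeline keeps every fold clean.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Leaky: the scaler sees the full dataset, including rows later used for validation.
X_leaky = StandardScaler().fit_transform(X)
leaky_scores = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5)

# Clean: the scaler is refit inside each training fold only.
clean_scores = cross_val_score(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)), X, y, cv=5)

print("leaky CV accuracy:", leaky_scores.mean(), "clean CV accuracy:", clean_scores.mean())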
Pro Tip: Always look at both training and validation metrics, not just validation. A model with 95% validation accuracy and 95% training accuracy is healthy. A model with 95% validation accuracy and 99.9% training accuracy is overfitting and will degrade as the data drifts.