Decision Boundary
A decision boundary in machine learning is the line or curve (in 2D) or the hypersurface (in higher dimensions) that separates the classes a classifier predicts. It marks the threshold at which the model switches from assigning one label to assigning another.
Examples
- In logistic regression (on the raw input features), the decision boundary is linear: the set of points where the predicted probability equals 0.5.
- In non-linear classifiers such as kernel SVMs or neural networks, the decision boundary can be highly curved and irregular.
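As a concrete sketch of the linear case (the weights and bias below are made-up illustrative values, not fitted parameters): a 2D logistic regression predicts class 1 on one side of the line w·x + b = 0 and class 0 on the other, with probability exactly 0.5 for points on the line itself.

```python
import math

# Hypothetical 2D logistic regression parameters (illustrative values only).
w = (2.0, -1.0)  # weights
b = 0.5          # bias

def predict_proba(x):
    """Sigmoid of the linear score w.x + b."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    """Class 1 on one side of the line w.x + b = 0, class 0 on the other."""
    return 1 if predict_proba(x) >= 0.5 else 0

# A point exactly on the line w.x + b = 0 gets probability 0.5:
# (0.0, 0.5) satisfies 2*0.0 - 1*0.5 + 0.5 = 0.
print(predict_proba((0.0, 0.5)))           # 0.5
print(predict((1.0, 0.0)), predict((-1.0, 0.0)))  # opposite sides: 1 0
```

Moving the point slightly off the line flips the probability above or below 0.5, which is exactly the thresholding behavior the definition describes.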
Applications
- Image classification: dog vs. cat recognition.
- Spam detection: boundary between spam and non-spam emails.
- Medical diagnostics: predicting disease vs. healthy state.
A decision boundary can be thought of as the invisible frontier that a model learns to draw in order to separate different categories. While in two dimensions this frontier is easy to visualize, in real-world tasks it usually exists in hundreds or even thousands of dimensions, making it abstract but still critical.
The shape of the boundary reflects the model's flexibility. Simple linear models produce straight, flat boundaries, which are easy to interpret but may miss complex patterns. More expressive models, such as deep neural networks, can carve intricate, highly non-linear boundaries that capture subtle distinctions but also risk overfitting if not properly regularized.
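To make the contrast between flat and curved boundaries concrete, here is a minimal sketch on synthetic data (the circle radius, sample size, and the two hand-picked decision rules are arbitrary illustrative choices, not trained models): no straight line can separate one class nested inside another, but a simple curved boundary separates them exactly.

```python
import math
import random

random.seed(0)

# Toy data: label 1 inside the unit circle, label 0 in the ring outside it.
def sample(n):
    pts, labels = [], []
    for _ in range(n):
        r = random.uniform(0.0, 2.0)
        theta = random.uniform(0.0, 2.0 * math.pi)
        pts.append((r * math.cos(theta), r * math.sin(theta)))
        labels.append(1 if r < 1.0 else 0)
    return pts, labels

def accuracy(rule, pts, labels):
    return sum(rule(p) == y for p, y in zip(pts, labels)) / len(labels)

pts, labels = sample(2000)

# A flat boundary (the vertical line x = 0) versus a curved one (the unit circle).
linear_rule = lambda p: 1 if p[0] > 0 else 0
circular_rule = lambda p: 1 if p[0] ** 2 + p[1] ** 2 < 1.0 else 0

print(accuracy(linear_rule, pts, labels))    # near chance, roughly 0.5
print(accuracy(circular_rule, pts, labels))  # 1.0 by construction
```

The linear rule hovers around chance because every line through concentric classes puts roughly equal mass of both classes on each side, while the quadratic rule matches how the labels were generated.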
Decision boundaries are also a way to understand model uncertainty. Near the boundary, the model is less confident, and small changes in input can flip the predicted class. This insight is used in active learning, where uncertain points are selected for labeling to improve the dataset and sharpen the model’s boundary.
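The uncertainty-sampling idea can be sketched as follows; the logistic model, its parameters `w` and `b`, and the candidate pool are all hypothetical values chosen for illustration. The point whose predicted probability lies closest to 0.5 is, by this criterion, the point nearest the decision boundary and the one selected for labeling.

```python
import math

# Hypothetical logistic model (illustrative parameters only).
w, b = (2.0, -1.0), 0.5

def predict_proba(x):
    """Predicted probability of class 1 under the toy model."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))

# Uncertainty sampling: from an unlabeled pool, pick the point whose predicted
# probability is closest to 0.5, i.e. the point nearest the decision boundary.
pool = [(3.0, 0.0), (0.1, 0.6), (-2.0, -1.0), (0.0, 0.3)]
most_uncertain = min(pool, key=lambda x: abs(predict_proba(x) - 0.5))
print(most_uncertain)  # (0.1, 0.6), the pool point closest to the boundary
```

In a real active-learning loop this selection step would alternate with labeling the chosen point and refitting the model, sharpening the boundary where it matters most.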
Key insight
A well-calibrated decision boundary ensures the model balances accuracy and generalization, avoiding both underfitting and overfitting.