Accuracy
In machine learning, accuracy is one of the most straightforward performance metrics: the proportion of predictions a model gets right out of the total number of predictions it makes.
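This definition can be sketched in a few lines: count the predictions that match the true labels and divide by the total. The labels below are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy example: 8 of 10 predictions match the true labels.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(accuracy(y_true, y_pred))  # 0.8
```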
Why it matters
If a model achieves 90% accuracy, it means that, on average, 9 out of every 10 predictions were correct. This simplicity makes accuracy an appealing first measure of model performance.
Limitations
Accuracy can, however, give a false sense of reliability when the data is imbalanced. For example, if 95% of medical test results are negative and only 5% are positive, a model that always predicts “negative” will still achieve 95% accuracy — but completely fail at detecting actual positives. That’s why complementary metrics like precision, recall, and F1-score are essential for a complete evaluation.
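The pitfall above can be reproduced directly. The sketch below builds an illustrative 95/5 dataset and an “always negative” model, then computes both accuracy and recall to show how they diverge:

```python
# 95 negatives (0) and 5 positives (1), mirroring the 95%/5% example.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a degenerate model that always predicts "negative"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_pos / sum(y_true)  # fraction of actual positives detected

print(accuracy)  # 0.95
print(recall)    # 0.0 -- the model never finds a single positive case
```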
Applications
- Email spam filters.
- Fraud detection in finance.
- Medical diagnostics and image recognition.
Accuracy is often described as the most intuitive metric: it simply asks, “How often did the model get it right?” For quick sanity checks, it is invaluable, especially during the early stages of model development when one needs a baseline reference. However, accuracy alone rarely tells the whole story. In practice, practitioners often distinguish between overall accuracy and balanced accuracy, the latter correcting for imbalanced datasets by averaging performance across classes.
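The distinction between overall and balanced accuracy can be made concrete: balanced accuracy is the average of per-class recall, so a class-blind model no longer looks strong just because one class dominates. The data below is illustrative, reusing the always-negative model from the 95/5 example.

```python
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # always predicts the majority class

def recall_for(cls, y_true, y_pred):
    """Recall for a single class: correct predictions of cls / actual cls."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    return hits / sum(1 for t in y_true if t == cls)

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = (recall_for(0, y_true, y_pred) + recall_for(1, y_true, y_pred)) / 2

print(overall)   # 0.95
print(balanced)  # 0.5 -- the imbalance no longer hides the failure
```

Scikit-learn exposes the same idea as `sklearn.metrics.balanced_accuracy_score`.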
In real-world systems, accuracy must be interpreted in context. A 99% accurate handwriting recognition model may be exceptional, but a 99% accurate self-driving car vision system would still be catastrophically unreliable if the 1% includes pedestrians or stop signs. This is why industries under strict safety or regulatory standards often prioritize other metrics—such as recall in medical tests or precision in fraud detection—over plain accuracy.
Another subtlety is that accuracy assumes all errors carry the same cost, which is rarely true. Missing a fraudulent transaction is far more damaging than wrongly flagging a legitimate one. Because of this, practitioners often use confusion matrices to break down true positives, true negatives, false positives, and false negatives, offering a more nuanced understanding than accuracy alone.
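The four confusion-matrix counts mentioned above can be sketched as follows; accuracy collapses them into a single number, while keeping them separate lets error costs be weighed individually. The labels are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)          # 3 3 1 1
print((tp + tn) / len(y_true)) # 0.75 -- accuracy, recovered from the counts
```

In a fraud setting, the single FN (a missed fraud) would typically be costed far more heavily than the single FP (a wrongly flagged transaction), which plain accuracy cannot express.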
📚 Further Reading
- Sokolova, M. & Lapalme, G. (2009). A systematic analysis of performance measures for classification.
- Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow.