Question 1

What is the difference between precision and recall?

Accepted Answer

Precision, TP / (TP + FP), answers 'of all the positive predictions, how many were correct?' Recall, TP / (TP + FN), answers 'of all the actual positives, how many did the model find?' High precision means few false positives (the model is careful about labelling something positive). High recall means few false negatives (the model rarely misses a true positive). There is typically a trade-off between them, controlled by the decision threshold: raising the threshold usually increases precision and lowers recall, and lowering it does the opposite.
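
The threshold trade-off can be illustrated with a minimal sketch. The labels and scores are made-up toy data, and `precision_recall` is a hypothetical helper written for this example, not a library function.

```python
# Toy data (hypothetical): true labels and the model's predicted scores.
labels = [1, 1, 1, 0, 1, 0, 1, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]


def precision_recall(labels, scores, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    # Convention assumed here: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


strict = precision_recall(labels, scores, threshold=0.75)   # few positives predicted
lenient = precision_recall(labels, scores, threshold=0.25)  # many positives predicted
print(f"threshold 0.75: precision={strict[0]:.2f}, recall={strict[1]:.2f}")
print(f"threshold 0.25: precision={lenient[0]:.2f}, recall={lenient[1]:.2f}")
```

On this toy data the stricter threshold gives higher precision and lower recall than the lenient one. Real data need not behave as cleanly.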

Question 2

Why is accuracy not always the best metric?

Accepted Answer

Accuracy can be misleading on imbalanced datasets. For example, if only 1% of transactions are fraudulent, a model that always predicts 'not fraud' achieves 99% accuracy but detects zero fraud cases. In such scenarios, precision, recall, F1-score, or MCC provide a much more informative picture of model performance.
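
The fraud example can be checked with a few lines of arithmetic. The dataset size of 10,000 transactions is an assumption chosen to match the 1% fraud rate.

```python
# Hypothetical dataset matching the 1% example: 10,000 transactions, 100 fraudulent.
n_fraud, n_legit = 100, 9_900

# A model that always predicts 'not fraud' never makes a positive prediction.
tp, fp = 0, 0
tn, fn = n_legit, n_fraud

accuracy = (tp + tn) / (tp + fp + tn + fn)
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.2%}")
print(f"recall   = {recall:.2%}")
```

Precision is left out on purpose: with no positive predictions, TP + FP is zero and precision is undefined.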

Question 3

What is the Matthews Correlation Coefficient (MCC)?

Accepted Answer

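MCC is a single balanced metric that considers all four confusion matrix cells (TP, FP, TN, FN): MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)). It ranges from −1 to +1, where +1 is perfect prediction, 0 is no better than random guessing, and −1 is total disagreement between predictions and actual labels. MCC is considered one of the most informative metrics for binary classification, especially on imbalanced datasets, because a high score requires good performance on both the positive and the negative class, so it is not inflated by large class imbalances.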
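
A minimal sketch of the MCC calculation follows. The function name `mcc` is hypothetical, and returning 0.0 when the denominator is zero is an assumed convention, not the only possible one.

```python
from math import sqrt


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from the four confusion matrix cells."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: 0.0 when any row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


print(mcc(tp=50, fp=0, tn=50, fn=0))     # every prediction correct
print(mcc(tp=0, fp=50, tn=0, fn=50))     # every prediction wrong
print(mcc(tp=0, fp=0, tn=9900, fn=100))  # always predicts the negative class
```

The last call shows the contrast with accuracy: a model that always predicts the majority class scores 99% accuracy on that data, but its MCC under this convention is 0.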

Question 4

What does specificity measure and when is it important?

Accepted Answer

Specificity (True Negative Rate) = TN / (TN + FP) measures how well the model avoids false positives among actual negatives. It is especially important in medical screening: a high-specificity test minimises the number of healthy people incorrectly flagged as sick, reducing unnecessary follow-up tests and patient anxiety. The ROC curve plots sensitivity (recall) against 1 − specificity (the false positive rate) as the decision threshold varies.
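
To see why specificity matters at screening scale, here is a rough sketch. The cohort size and both specificity values are hypothetical, and `expected_false_positives` is an illustrative helper.

```python
def expected_false_positives(specificity, n_healthy):
    """Expected number of healthy people flagged as sick: (1 - specificity) * n_healthy."""
    return (1 - specificity) * n_healthy


# Hypothetical screening programme with 10,000 healthy participants.
for specificity in (0.90, 0.99):
    flagged = expected_false_positives(specificity, n_healthy=10_000)
    print(f"specificity {specificity:.0%}: about {flagged:.0f} false positives")
```

In this sketch, moving from 90% to 99% specificity cuts the expected number of false alarms by a factor of ten.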

Question 5

How is F1-score related to precision and recall?

Accepted Answer

F1-score is the harmonic mean of precision and recall: F1 = 2 × (P × R) / (P + R), which is equivalent to 2TP / (2TP + FP + FN). Using the harmonic mean ensures that F1 is low if either precision or recall is low: you cannot achieve a high F1 by sacrificing one for the other. F1 ranges from 0 (worst) to 1 (best) and is a widely used single metric when you need to balance both false positives and false negatives.
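
The effect of the harmonic mean is easiest to see next to the arithmetic mean. The precision and recall values are hypothetical, and `f1_score` here is a local helper, not a library function.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical model that predicts positive almost everywhere:
# recall is perfect, precision is poor.
p, r = 0.10, 1.00
print(f"arithmetic mean = {(p + r) / 2:.3f}")
print(f"F1              = {f1_score(p, r):.3f}")
```

The arithmetic mean would rate this model at about 0.55, while F1 stays close to the weaker of the two values.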

Question 6

What is the difference between sensitivity and specificity in medical tests?

Accepted Answer

Sensitivity (recall) is the probability that the test correctly identifies a diseased patient: TP / (TP + FN). A highly sensitive test misses very few sick patients, making it good for ruling out disease. Specificity is the probability that the test correctly identifies a healthy patient: TN / (TN + FP). A highly specific test produces few false positives, making it good for confirming disease. Most diagnostic tests involve a trade-off between the two, represented by the ROC curve.
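
The trade-off can be traced by sweeping the cut-off over a set of test scores. This is a sketch on made-up data: `roc_points` is a hypothetical helper, and it assumes both diseased and healthy patients are present.

```python
# Toy data (hypothetical): 1 = diseased, 0 = healthy, with a test score per patient.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]


def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) pairs, one per candidate cut-off.

    Assumes both classes are present in `labels`.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cut-off above every score: nobody is flagged
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


for fpr, tpr in roc_points(labels, scores):
    print(f"sensitivity={tpr:.2f}  specificity={1 - fpr:.2f}")
```

As the cut-off drops, sensitivity can only rise or stay the same and specificity can only fall or stay the same. The ROC curve summarises exactly this pattern.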

Examples

Confusion Matrix Counts	Key Metrics	Model Type
TP=92, FP=8, TN=88, FN=12	Accuracy=90%, F1=0.9020, MCC=0.801	Well-balanced, high-performance model
TP=45, FP=5, TN=95, FN=25	Accuracy=82.35%, Precision=90%, Recall=64.29%	High precision, conservative predictions
TP=85, FP=30, TN=70, FN=10	Accuracy=79.5%, Recall=89.5%, Specificity=70%	High recall, sensitive model
TP=48, FP=12, TN=188, FN=2	Accuracy=94.4%, Sensitivity=96%, Specificity=94%	Medical diagnostic test with high sensitivity
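
The metrics in the rows above can be recomputed from the raw counts. This is a minimal sketch: `classification_metrics` is a hypothetical helper, and it assumes every denominator is non-zero, which holds for these four examples.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Compute common metrics from confusion matrix counts.

    Assumes every denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# (TP, FP, TN, FN) for each example row.
rows = [(92, 8, 88, 12), (45, 5, 95, 25), (85, 30, 70, 10), (48, 12, 188, 2)]
for tp, fp, tn, fn in rows:
    m = classification_metrics(tp, fp, tn, fn)
    summary = ", ".join(f"{name}={value:.4f}" for name, value in m.items())
    print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}: {summary}")
```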
