Confusion Matrix Calculator - Classification Metrics
Analyze classification performance with accuracy, precision, recall, F1, and MCC
Input your confusion matrix values to calculate accuracy, precision, recall, specificity, F1-score, and other performance metrics for binary classification analysis.
About the Confusion Matrix Calculator
A confusion matrix is a 2×2 table that summarises the performance of a binary classification model by tabulating the counts of four outcomes: True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). From these four numbers, a rich set of performance metrics can be derived, each measuring a different aspect of classifier behaviour.
True Positives (TP) are cases where the model correctly predicted the positive class. False Positives (FP) are cases where the model predicted positive but the true label is negative — also called Type I errors. True Negatives (TN) are correct negative predictions. False Negatives (FN) are cases where the model missed a positive — also called Type II errors.
Key metrics derived from the confusion matrix include:
• Accuracy = (TP + TN) / Total — the fraction of all predictions that are correct. Simple to interpret but misleading on imbalanced datasets.
• Precision (Positive Predictive Value) = TP / (TP + FP) — of all positive predictions, how many are truly positive. High precision means few false alarms.
• Recall (Sensitivity, True Positive Rate) = TP / (TP + FN) — of all actual positives, how many were detected. High recall means few missed positives.
• Specificity (True Negative Rate) = TN / (TN + FP) — of all actual negatives, how many were correctly identified. Important in medical screening.
• F1-Score = 2 × (Precision × Recall) / (Precision + Recall) — the harmonic mean of precision and recall, balancing both metrics. Useful when classes are imbalanced.
• Matthews Correlation Coefficient (MCC) = (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)) — a balanced metric that accounts for all four cells of the confusion matrix. Ranges from −1 (perfect disagreement) to +1 (perfect agreement), with 0 indicating random performance.
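The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function name `confusion_metrics` and the choice to return `None` when a denominator is zero are assumptions made for this example. F1 is computed as 2TP / (2TP + FP + FN), which is algebraically the same as the harmonic mean of precision and recall.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Compute binary-classification metrics from the four confusion-matrix counts."""
    total = tp + fp + tn + fn

    def ratio(num, den):
        # Assumption for this sketch: report None when a metric is undefined
        # (zero denominator) instead of raising ZeroDivisionError.
        return num / den if den else None

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, total),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        # Equivalent to 2 * P * R / (P + R)
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }

metrics = confusion_metrics(tp=92, fp=8, tn=88, fn=12)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these inputs the sketch prints an accuracy of 0.9000, an F1 of 0.9020, and an MCC of 0.8006. Note that MCC is undefined whenever any of the four sums in its denominator is zero, for example when the model never predicts the positive class.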
Choosing the right metric depends on your problem. For fraud detection, recall is critical (never miss a fraud). For spam filters, precision is more important (do not block legitimate emails). For rare-disease screening, both recall and specificity matter. The MCC and F1-score are generally more informative than accuracy alone when classes are imbalanced.
Examples
The table shows confusion matrix inputs and their resulting performance metrics.
| Inputs (TP, FP, TN, FN) | Key Metrics | Interpretation |
|---|---|---|
| TP=92, FP=8, TN=88, FN=12 | Accuracy=90%, F1=0.9020, MCC=0.801 | Well-balanced, high-performance model |
| TP=45, FP=5, TN=95, FN=25 | Accuracy=82.35%, Precision=90%, Recall=64.29% | High precision, conservative predictions |
| TP=85, FP=30, TN=70, FN=10 | Accuracy=79.5%, Recall=89.5%, Specificity=70% | High recall, sensitive model |
| TP=48, FP=12, TN=188, FN=2 | Accuracy=94.4%, Sensitivity=96%, Specificity=94% | Medical diagnostic test with high sensitivity |
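As a worked check, the figures for the first scenario (TP=92, FP=8, TN=88, FN=12) follow directly from the metric definitions. Precision and recall are not listed for that scenario; they are derived here only as intermediate steps toward F1.

```latex
\begin{align*}
\text{Accuracy}  &= \frac{92 + 88}{92 + 8 + 88 + 12} = \frac{180}{200} = 0.90 \\
\text{Precision} &= \frac{92}{92 + 8} = 0.92 \\
\text{Recall}    &= \frac{92}{92 + 12} \approx 0.8846 \\
F_1              &= \frac{2 \times 0.92 \times 0.8846}{0.92 + 0.8846} \approx 0.9020 \\
\text{MCC}       &= \frac{92 \times 88 - 8 \times 12}{\sqrt{100 \times 104 \times 96 \times 100}}
                  = \frac{8000}{\sqrt{99\,840\,000}} \approx 0.801
\end{align*}
```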
How to Use the Confusion Matrix Calculator
- Enter the number of True Positives (TP): cases where the model correctly predicted the positive class.
- Enter False Positives (FP): the model predicted positive but the true label was negative (Type I error).
- Enter True Negatives (TN), the correct negative predictions, and False Negatives (FN), the positives the model missed (Type II error).
- Click 'Calculate Metrics' to instantly compute accuracy, precision, recall, specificity, F1-score, MCC, NPV, FPR, and FNR.
- Use the quick-load example buttons to explore pre-configured scenarios such as balanced models, high-precision models, and medical tests.
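The steps above mention NPV, FPR, and FNR without defining them. The sketch below uses their standard definitions, which are assumed here to match what the calculator reports; the function name `rate_metrics` is hypothetical.

```python
def rate_metrics(tp, fp, tn, fn):
    """Return NPV, FPR, and FNR; this sketch assumes every denominator is non-zero."""
    return {
        "npv": tn / (tn + fn),  # negative predictive value: correct share of negative predictions
        "fpr": fp / (fp + tn),  # false positive rate = 1 - specificity
        "fnr": fn / (fn + tp),  # false negative rate = 1 - recall
    }

# Counts taken from the medical diagnostic example
rates = rate_metrics(tp=48, fp=12, tn=188, fn=2)
print({name: round(value, 4) for name, value in rates.items()})
```

For the medical diagnostic example this gives an NPV of about 0.9895, an FPR of 0.06, and an FNR of 0.04.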
Frequently Asked Questions
What is the difference between precision and recall?
Precision answers 'of all the positive predictions, how many were correct?' while recall answers 'of all the actual positives, how many did the model find?' High precision means few false positives (the model is careful about labelling something positive). High recall means few false negatives (the model rarely misses a true positive). There is typically a trade-off between them, controlled by the decision threshold.
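To illustrate the threshold trade-off, here is a small sketch that recomputes precision and recall at three decision thresholds. The scores and labels are made-up values chosen for illustration, and `precision_recall_at` is a hypothetical helper.

```python
# Hypothetical model scores (confidence in the positive class) and true labels
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 1, 0, 1, 0]

def precision_recall_at(threshold):
    """Classify as positive when score >= threshold, then compute precision and recall."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else None  # undefined with no positive predictions
    recall = tp / (tp + fn) if tp + fn else None     # undefined with no actual positives
    return precision, recall

for threshold in (0.85, 0.50, 0.25):
    precision, recall = precision_recall_at(threshold)
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

On this toy data, lowering the threshold from 0.85 to 0.25 raises recall from 0.40 to 1.00 while precision falls from 1.00 to about 0.71.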
Why is accuracy not always the best metric?
Accuracy can be misleading on imbalanced datasets. For example, if only 1% of transactions are fraudulent, a model that always predicts 'not fraud' achieves 99% accuracy but detects zero fraud cases. In such scenarios, precision, recall, F1-score, or MCC provide a much more informative picture of model performance.
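The fraud example can be reproduced with a few lines of arithmetic. The dataset size of 10,000 transactions is an assumption chosen to make the 1% rate concrete.

```python
# Hypothetical dataset: 10,000 transactions, of which 1% are fraudulent
n_total = 10_000
n_fraud = 100

# A model that always predicts "not fraud" never produces a positive prediction
tp, fp = 0, 0
fn = n_fraud            # every fraud case is missed
tn = n_total - n_fraud  # every legitimate transaction is labelled correctly

accuracy = (tp + tn) / n_total
recall = tp / (tp + fn)
print(f"accuracy = {accuracy:.2%}, recall = {recall:.2%}")
```

This prints an accuracy of 99.00% alongside a recall of 0.00%. Precision is undefined for this model because it makes no positive predictions at all.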
What is the Matthews Correlation Coefficient (MCC)?
MCC is a single balanced metric that considers all four confusion matrix cells (TP, FP, TN, FN). It ranges from −1 to +1, where +1 is perfect prediction, 0 is no better than random, and −1 is total disagreement. MCC is considered one of the most informative metrics for binary classification, especially on imbalanced datasets, because it is not inflated by large class imbalances.
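A short sketch shows how MCC can diverge from accuracy and F1 when classes are imbalanced. The counts below are hypothetical: 95 actual positives and only 5 actual negatives, of which the model gets just 1 negative right.

```python
import math

# Hypothetical imbalanced confusion matrix
tp, fp, tn, fn = 90, 4, 1, 5

accuracy = (tp + tn) / (tp + fp + tn + fn)
f1 = 2 * tp / (2 * tp + fp + fn)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"accuracy={accuracy:.3f}  f1={f1:.3f}  mcc={mcc:.3f}")
```

Accuracy (0.910) and F1 (0.952) both look strong here, while MCC (0.135) stays close to 0 because the model performs poorly on the negative class.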
What does specificity measure and when is it important?
Specificity (True Negative Rate) = TN / (TN + FP) measures how well the model avoids false positives among actual negatives. It is especially important in medical screening: a high-specificity test minimises the number of healthy people incorrectly flagged as sick, reducing unnecessary follow-up tests and patient anxiety. The ROC curve plots sensitivity (recall) against 1 − specificity (the false positive rate) as the decision threshold varies.
How is F1-score related to precision and recall?
F1-score is the harmonic mean of precision and recall: F1 = 2 × (P × R) / (P + R). Because the harmonic mean is pulled toward the smaller of its two inputs, F1 is low if either precision or recall is low, so you cannot achieve a high F1 by sacrificing one for the other. F1 ranges from 0 (worst) to 1 (best) and is a widely used single metric when you need to balance false positives against false negatives. Note that F1 ignores true negatives entirely.
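The effect of the harmonic mean is easiest to see next to a plain arithmetic mean. In this sketch the helper `f1_score` is a hypothetical name, and returning 0.0 when precision and recall are both zero is a common convention assumed here, not something the formula itself dictates.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the undefined 0/0 case
    return 2 * precision * recall / (precision + recall)

# A model with perfect precision but very poor recall
precision, recall = 1.0, 0.1

arithmetic_mean = (precision + recall) / 2
f1 = f1_score(precision, recall)
print(f"arithmetic mean = {arithmetic_mean:.4f}, F1 = {f1:.4f}")
```

The arithmetic mean comes out at 0.5500, but F1 is only 0.1818, reflecting the weak recall.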
What is the difference between sensitivity and specificity in medical tests?
Sensitivity (recall) is the probability that the test correctly identifies a diseased patient: TP / (TP + FN). A highly sensitive test misses very few sick patients, so a negative result is good evidence for ruling out disease. Specificity is the probability that the test correctly identifies a healthy patient: TN / (TN + FP). A highly specific test produces few false positives, so a positive result is good evidence for confirming disease. Most diagnostic tests involve a trade-off between the two, which the ROC curve traces as the decision cutoff changes.
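The trade-off can be sketched by moving the cutoff of a hypothetical diagnostic test and recomputing both rates. The readings below are invented for illustration, and a reading at or above the cutoff is assumed to count as a positive result.

```python
# Hypothetical test readings; higher values suggest disease
diseased = [8.1, 7.4, 6.9, 5.2, 4.8]
healthy = [6.0, 4.5, 3.9, 3.1, 2.7, 2.2]

def sensitivity_specificity(cutoff):
    """A reading >= cutoff is treated as a positive test result."""
    tp = sum(x >= cutoff for x in diseased)  # diseased patients correctly flagged
    tn = sum(x < cutoff for x in healthy)    # healthy patients correctly cleared
    return tp / len(diseased), tn / len(healthy)

for cutoff in (7.0, 5.0, 3.0):
    sens, spec = sensitivity_specificity(cutoff)
    # Each cutoff yields one ROC point: (1 - specificity, sensitivity)
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}, "
          f"ROC point=({1 - spec:.2f}, {sens:.2f})")
```

On this toy data, lowering the cutoff from 7.0 to 3.0 raises sensitivity from 0.40 to 1.00 while specificity drops from 1.00 to about 0.33.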