ROC Curve & AUC Calculator - Binary Classifier Evaluation

Advanced Statistical Tests

Input your model's prediction scores and true labels below to generate an ROC curve and calculate the Area Under the Curve (AUC).

Enter one observation per line as 'score,label'. Labels must be 0 or 1. Example: 0.9,1
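The input rules above can be parsed and validated in a few lines. The following is a minimal Python sketch that assumes the rules the calculator states: one `score,label` pair per line, labels 0 or 1, and both classes present. The function name `parse_observations` is hypothetical and not part of the tool. Skipping blank lines and rejecting non-finite scores are assumptions about sensible behaviour, not documented features.

```python
import math


def parse_observations(text: str) -> tuple[list[float], list[int]]:
    """Parse 'score,label' lines into parallel lists of scores and labels.

    Hypothetical helper: blank lines are skipped, and ValueError is raised for
    malformed lines, non-finite scores, labels other than 0/1, or a single class.
    """
    scores: list[float] = []
    labels: list[int] = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines (an assumption)
        parts = line.split(",")
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 'score,label', got {raw!r}")
        score = float(parts[0])  # float() itself rejects non-numeric text
        if not math.isfinite(score):
            raise ValueError(f"line {line_no}: score must be finite, got {score}")
        label_text = parts[1].strip()
        if label_text not in ("0", "1"):
            raise ValueError(f"line {line_no}: label must be 0 or 1, got {label_text!r}")
        scores.append(score)
        labels.append(int(label_text))
    if 0 not in labels or 1 not in labels:
        raise ValueError("need at least one positive (1) and one negative (0) label")
    return scores, labels


scores, labels = parse_observations("0.9,1\n0.3,0\n0.75,1\n")
print(scores, labels)  # [0.9, 0.3, 0.75] [1, 0, 1]
```

The final class check mirrors step 2 of the instructions below. AUC is undefined when either class is missing.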

About the ROC Curve & AUC Calculator

A Receiver Operating Characteristic (ROC) curve is a graphical tool for evaluating how well a binary classification model discriminates between the two classes across all possible decision thresholds. It plots the True Positive Rate (TPR, or sensitivity) on the y-axis against the False Positive Rate (FPR, or 1 − specificity) on the x-axis as the decision threshold is lowered from high to low.

Sensitivity (TPR) is the proportion of actual positives correctly identified: TPR = TP / (TP + FN). Specificity is the proportion of actual negatives correctly identified: Specificity = TN / (TN + FP). The FPR is therefore 1 − Specificity = FP / (TN + FP). A perfect classifier passes through the top-left corner (FPR = 0, TPR = 1). A classifier that guesses at random has an ROC curve lying, in expectation, along the diagonal from (0,0) to (1,1).

The Area Under the ROC Curve (AUC) summarizes overall discrimination as a single number. An AUC of 1.0 represents perfect discrimination, and 0.5 represents no discrimination (equivalent to random guessing). A common rule of thumb: AUC ≥ 0.9 is excellent, 0.8–0.9 is good, 0.7–0.8 is fair, and below 0.7 is poor.

This calculator computes the AUC using the trapezoidal rule. It connects consecutive ROC operating points with straight lines and sums the areas of the trapezoids beneath them. When a positive and a negative share the same score, the corresponding segment is diagonal rather than a step. The calculator also identifies the optimal decision threshold using Youden's J statistic (J = sensitivity + specificity − 1). The threshold with the largest J maximizes the sum of sensitivity and specificity and provides a balanced operating point.

ROC curves and AUC are standard evaluation metrics in several fields:
medical diagnostics, where classifiers separate diseased from healthy patients;
machine learning, for evaluating binary classification models;
credit scoring.

Unlike accuracy, AUC does not depend on the ratio of positives to negatives, because TPR and FPR are each computed within a single class. This makes it useful when positive cases are rare. Under severe imbalance, however, a high AUC can still coexist with poor precision.

This tool accepts any list of score-label pairs. Scores can be probabilities, logit values, or any continuous ranking, with higher scores meaning "more likely positive". Labels must be 0 (negative class) or 1 (positive class).
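The ROC construction and trapezoidal rule described above can be sketched as follows. This is a minimal Python illustration, not the calculator's actual code. It assumes three conventions:
- a sample is predicted positive when its score is at least the threshold;
- tied scores are grouped into a single operating point;
- the curve starts at (0, 0).

The function names are hypothetical.

```python
def roc_points(scores: list[float], labels: list[int]):
    """Return (fpr, tpr, thresholds), sweeping the threshold from high to low.

    A sample counts as predicted positive when score >= threshold (an assumption).
    All samples sharing a score are added together, so ties give one point.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr, thresholds = [0.0], [0.0], [float("inf")]  # start at (0, 0)
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # consume every sample tied at this score before emitting a point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
        thresholds.append(threshold)
    return fpr, tpr, thresholds


def trapezoidal_auc(fpr: list[float], tpr: list[float]) -> float:
    """Sum the trapezoids between consecutive ROC points."""
    area = 0.0
    for k in range(1, len(fpr)):
        area += (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
    return area


scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.45, 0.4, 0.35]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
fpr, tpr, thresholds = roc_points(scores, labels)
print(trapezoidal_auc(fpr, tpr))  # 0.9375
```

Without ties every segment is horizontal or vertical, so the trapezoids reduce to rectangles. A tied positive/negative pair produces one diagonal segment, which contributes half a cell of area.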
The results table shows all ROC operating points, with the optimal threshold row highlighted for easy reference.
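Choosing the optimal row by Youden's J amounts to scoring every candidate threshold and keeping the best one. The sketch below is a minimal Python version. It assumes candidate thresholds are the distinct observed scores and that a score at or above the threshold counts as positive. The calculator does not say how it breaks ties in J; here the higher threshold wins, which is an assumption. `youden_threshold` is a hypothetical name.

```python
def youden_threshold(scores: list[float], labels: list[int]):
    """Return (threshold, J, sensitivity, specificity) for the largest Youden's J.

    Assumptions: candidate thresholds are the distinct observed scores, a sample is
    predicted positive when score >= threshold, and ties in J keep the higher threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1
        # strict '>' keeps the first (higher) threshold when J ties
        if best is None or j > best[1]:
            best = (t, j, sensitivity, specificity)
    return best


scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.45, 0.4, 0.35]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(youden_threshold(scores, labels))  # (0.75, 0.75, 0.75, 1.0)
```

In this data the thresholds 0.75 and 0.55 both reach J = 0.75 (sensitivity 0.75 with specificity 1.0, versus sensitivity 1.0 with specificity 0.75). The tie-breaking rule therefore decides which row is highlighted.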

ROC Curve Examples

These examples show how AUC values correspond to different levels of classifier performance.

Score,label pairs | AUC | Interpretation
0.9,1 / 0.8,1 / 0.3,0 / 0.2,0 | 1.0 | Perfect classifier
0.9,1 / 0.8,1 / 0.75,1 / 0.6,0 / 0.55,1 / 0.45,0 / 0.4,0 / 0.35,0 | 0.9375 (15/16) | Excellent discrimination
0.9,0 / 0.8,1 / 0.7,0 / 0.6,1 / 0.5,0 / 0.4,1 | ≈ 0.33 (exactly 1/3) | Mostly inverted ranking; worse than random
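The AUC values in the table can be checked by hand using the equivalent pairwise definition. AUC is the fraction of (positive, negative) pairs in which the positive has the higher score, with tied pairs counting one half. This equivalence with the trapezoidal ROC area is a standard result. The Python sketch below uses exact fractions so the results are not blurred by rounding; `pairwise_auc` is a hypothetical helper name.

```python
from fractions import Fraction


def pairwise_auc(pairs: list[tuple[float, int]]) -> Fraction:
    """Share of (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    pos = [s for s, y in pairs if y == 1]
    neg = [s for s, y in pairs if y == 0]
    wins = Fraction(0)
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += Fraction(1, 2)
    return wins / (len(pos) * len(neg))


examples = [
    [(0.9, 1), (0.8, 1), (0.3, 0), (0.2, 0)],
    [(0.9, 1), (0.8, 1), (0.75, 1), (0.6, 0), (0.55, 1), (0.45, 0), (0.4, 0), (0.35, 0)],
    [(0.9, 0), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0), (0.4, 1)],
]
for pairs in examples:
    auc = pairwise_auc(pairs)
    print(auc, float(auc))
# 1 1.0
# 15/16 0.9375
# 1/3 0.3333333333333333
```

In the second example, only the pair (positive 0.55, negative 0.6) is misordered, giving 15 of 16. In the third, 6 of the 9 pairs are misordered, giving 3/9 = 1/3.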

How to Use This Calculator

  1. Enter one observation per line in the format 'score,label' where score is a numeric prediction and label is 0 or 1.
  2. Ensure both positive (label=1) and negative (label=0) examples are present in your data.
  3. Click 'Calculate' to compute the AUC and generate the ROC curve points.
  4. Review the AUC value and its qualitative interpretation (excellent, good, fair, or poor).
  5. Find the optimal threshold row (highlighted in the table) for the best balanced sensitivity/specificity trade-off.
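Step 4 maps the AUC onto the qualitative bands described earlier on this page. A minimal Python sketch of that mapping follows. The ranges "0.8–0.9" and "0.7–0.8" do not say which side includes the boundary, so including the lower bound in each band is an assumption. `interpret_auc` is a hypothetical name.

```python
def interpret_auc(auc: float) -> str:
    """Map an AUC to the rule-of-thumb bands used on this page.

    Assumption: each band includes its lower bound, e.g. exactly 0.8 counts as "good".
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError(f"AUC must lie in [0, 1], got {auc}")
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "fair"
    return "poor"


for value in (1.0, 0.9375, 0.85, 0.7, 1 / 3):
    print(value, interpret_auc(value))
```

The "poor" band also covers AUC values below 0.5. Such values indicate a ranking worse than random, which is worth flagging separately.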

Frequently Asked Questions

What is AUC and why is it important?
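AUC (Area Under the ROC Curve) measures a classifier's ability to rank positive instances above negative instances across all thresholds. Equivalently, it is the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counting as one half. AUC is threshold-independent and does not depend on the class ratio. This makes it a standard benchmark for binary classification models in medicine, machine learning, and finance.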
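The ranking interpretation also allows AUC to be computed without building the curve. AUC equals the Mann–Whitney U statistic for the positive class divided by the number of (positive, negative) pairs. The following is a minimal Python sketch. It assumes 0/1 labels and gives tied scores their average rank (midrank), so ties contribute one half. `rank_auc` is a hypothetical name.

```python
def rank_auc(scores: list[float], labels: list[int]) -> float:
    """AUC via the Mann-Whitney U statistic, using midranks for tied scores."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices by ascending score
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # extend the block while the next score ties with this one
        while end + 1 < n and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        midrank = (start + end) / 2 + 1  # average of 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = midrank
        start = end + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2  # U statistic for the positives
    return u_pos / (n_pos * n_neg)


scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.45, 0.4, 0.35]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(rank_auc(scores, labels))  # 0.9375
```

Sorting dominates the cost, so this runs in O(n log n). That is the same order as building the ROC curve and avoids comparing every pair.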
What does an AUC of 0.5 mean?
An AUC of 0.5 means the classifier ranks positive and negative instances no better than random guessing would. An AUC clearly below 0.5 suggests the classifier is systematically wrong: inverting its scores turns an AUC of A into 1 − A, which is above chance. On small samples, an AUC only slightly below 0.5 may simply reflect noise.
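The effect of inverting predictions can be checked on the third example in the table. This Python sketch uses the pairwise definition of AUC with exact fractions. Negating every score reverses the ranking while keeping ties tied, so the AUC becomes 1 − AUC. `auc` is a hypothetical helper name.

```python
from fractions import Fraction


def auc(scores: list[float], labels: list[int]) -> Fraction:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly, ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = sum(
        Fraction(1) if p > n else Fraction(1, 2) if p == n else Fraction(0)
        for p in pos
        for n in neg
    )
    return total / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [0, 1, 0, 1, 0, 1]
original = auc(scores, labels)
inverted = auc([-s for s in scores], labels)  # flip the ranking
print(original, inverted)  # 1/3 2/3
```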
How is the optimal threshold selected?
This calculator uses Youden's J statistic (J = sensitivity + specificity − 1) to select the optimal threshold. It maximizes the sum of sensitivity and specificity, providing a balanced operating point. Alternative criteria such as minimizing cost or maximizing F1-score may yield different optimal thresholds depending on the application.
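Youden's J and the F1-score can disagree because F1 ignores true negatives. With many negatives, each extra false positive barely moves specificity, but it does lower precision. The Python sketch below compares the two criteria on a small, made-up imbalanced dataset with 2 positives and 8 negatives. It assumes a score at or above the threshold counts as positive and tries each observed score as a candidate threshold. `threshold_metrics` is a hypothetical name.

```python
def threshold_metrics(scores: list[float], labels: list[int], t: float) -> tuple[float, float]:
    """Return (Youden's J, F1) when samples with score >= t are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
    youden_j = tp / (tp + fn) + tn / (tn + fp) - 1
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
    return youden_j, f1


# hypothetical data: positives at 0.9 and 0.5, three negatives in between
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
candidates = sorted(set(scores), reverse=True)
best_j = max(candidates, key=lambda t: threshold_metrics(scores, labels, t)[0])
best_f1 = max(candidates, key=lambda t: threshold_metrics(scores, labels, t)[1])
print(best_j, best_f1)  # 0.5 0.9
```

Youden's J prefers 0.5. That threshold catches both positives at the cost of 3 false positives (J = 1 + 5/8 − 1 = 0.625). F1 prefers 0.9, which gives 1 true positive, no false positives and F1 = 2/3.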
Can AUC be used for multi-class classification?
The standard AUC is defined for binary classification. For multi-class problems, a one-vs-rest AUC can be computed for each class (that class against all others). These per-class AUCs can then be summarized as a macro-average (unweighted mean) or a weighted average (weighted by class frequency). This calculator supports only binary classification (labels 0 and 1).
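A minimal Python sketch of the one-vs-rest approach follows. It makes three assumptions:
- classes are numbered 0 to K−1;
- each sample has one score per class;
- the weighted average weights each class by its number of samples (support).

The function names and data are hypothetical.

```python
def binary_auc(scores: list[float], labels: list[int]) -> float:
    """Pairwise AUC for 0/1 labels; tied scores count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(class_scores: list[list[float]], y_true: list[int], n_classes: int):
    """Return (macro_auc, weighted_auc, per_class_auc).

    class_scores[i][c] is sample i's score for class c; each class in turn is
    treated as positive (1) and all other classes as negative (0).
    """
    per_class = []
    supports = []
    for c in range(n_classes):
        scores = [row[c] for row in class_scores]
        labels = [1 if y == c else 0 for y in y_true]
        per_class.append(binary_auc(scores, labels))
        supports.append(sum(labels))
    macro = sum(per_class) / n_classes
    weighted = sum(a * w for a, w in zip(per_class, supports)) / sum(supports)
    return macro, weighted, per_class


y_true = [0, 0, 1, 1, 2, 2]
class_scores = [
    [0.7, 0.2, 0.1],
    [0.5, 0.45, 0.05],
    [0.2, 0.6, 0.2],
    [0.4, 0.4, 0.2],
    [0.1, 0.3, 0.6],
    [0.3, 0.2, 0.5],
]
macro, weighted, per_class = one_vs_rest_auc(class_scores, y_true, n_classes=3)
print(per_class)  # [1.0, 0.875, 1.0]
```

Every class has the same support in this toy data, so the macro and weighted averages coincide (about 0.958). They diverge only when class frequencies differ.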
What is the difference between sensitivity and specificity?
Sensitivity (recall or TPR) measures how well the classifier detects true positives: TP / (TP + FN). Specificity measures how well it avoids false alarms: TN / (TN + FP). High sensitivity is crucial when missing a positive case is costly (e.g., disease screening). High specificity is important when false positives are costly (e.g., confirmatory tests).
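As a worked example, take the second dataset in the examples table and assume a score of 0.6 or higher is classified as positive. The positives scoring 0.9, 0.8 and 0.75 are detected (TP = 3), but the one at 0.55 is missed (FN = 1). The negative at 0.6 is a false alarm (FP = 1), and the negatives at 0.45, 0.4 and 0.35 are correctly rejected (TN = 3).

```latex
\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{3}{3 + 1} = 0.75,
\qquad
\text{Specificity} = \frac{TN}{TN + FP} = \frac{3}{3 + 1} = 0.75
```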
Is AUC always the best metric for model evaluation?
AUC is well suited to comparing models across thresholds and is unaffected by the class ratio, but it is not always the best choice. For highly imbalanced data, the Precision-Recall AUC (PR-AUC) is often more informative, because precision shows how many of the flagged cases are actually positive. When a specific decision threshold will be used in practice, metrics such as F1-score, accuracy, or the Matthews correlation coefficient may be more relevant.
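The gap between the two summaries is easiest to see on a heavily imbalanced example. The Python sketch below computes ROC AUC alongside a step-wise PR-AUC (average precision) for a hypothetical dataset with one positive and 100 negatives, five of which outscore the positive. Using average precision as the PR-AUC estimate is an assumption. Other interpolation schemes exist and give somewhat different numbers.

```python
def average_precision(scores: list[float], labels: list[int]) -> float:
    """Step-wise PR-AUC: sum over thresholds of (recall increase) * precision.

    Thresholds run from high to low; tied scores are processed as one block.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(ranked):
        t = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == t:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Pairwise ROC AUC; tied scores count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# hypothetical data: 5 negatives above the single positive, 95 below it
scores = [0.99, 0.98, 0.97, 0.96, 0.95, 0.9] + [0.5] * 95
labels = [0, 0, 0, 0, 0, 1] + [0] * 95
print(round(roc_auc(scores, labels), 4), round(average_precision(scores, labels), 4))  # 0.95 0.1667
```

ROC AUC looks excellent because only 5 of 100 negatives outrank the positive. Average precision reports that, at the point where the positive is found, only 1 in 6 flagged cases is actually positive.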