
About

Classification errors propagate silently. A model reporting 95% accuracy on an imbalanced dataset where 95% of samples belong to one class is no better than a constant predictor. Relying on accuracy alone masks critical failure modes: missed fraud transactions, undetected tumors, ignored intrusions. This tool computes the full diagnostic profile from a 2×2 confusion matrix: accuracy, precision, recall, F1 score, Matthews Correlation Coefficient (MCC), Cohen’s κ, balanced accuracy, and Youden’s J. Each metric exposes a different dimension of classifier performance.

The tool assumes a binary classification context with known ground-truth labels. Results are exact given integer counts. For multi-class problems, apply one-vs-rest decomposition externally and enter each binary sub-problem here. Note: MCC is undefined when any row or column of the confusion matrix sums to zero. The calculator flags such edge cases explicitly rather than returning misleading values.
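The one-vs-rest decomposition mentioned above can be done with a few lines of code. The sketch below (the function name and example matrix are illustrative, not part of the tool) collapses one class of a square multi-class confusion matrix into the binary TP/FP/FN/TN counts this calculator accepts:

```python
def one_vs_rest_counts(matrix, cls):
    """Collapse a square multi-class confusion matrix (rows = actual,
    columns = predicted) into binary TP/FP/FN/TN counts for one class."""
    n = len(matrix)
    tp = matrix[cls][cls]
    fp = sum(matrix[r][cls] for r in range(n) if r != cls)   # predicted cls, actually other
    fn = sum(matrix[cls][c] for c in range(n) if c != cls)   # actually cls, predicted other
    tn = sum(matrix[r][c] for r in range(n) for c in range(n)
             if r != cls and c != cls)
    return tp, fp, fn, tn

# 3-class example; treat class 0 as "positive", everything else as "negative"
m = [[5, 1, 0],
     [2, 7, 1],
     [0, 1, 8]]
print(one_vs_rest_counts(m, 0))  # → (5, 2, 1, 17)
```

Repeating this for each class index yields one binary sub-problem per class, each of which can be entered into the calculator separately.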


Formulas

All metrics derive from a 2Γ—2 confusion matrix with counts TP, TN, FP, FN. Total sample size N = TP + TN + FP + FN.

Accuracy = (TP + TN) / N
F1 = 2 · Precision · Recall / (Precision + Recall) = 2·TP / (2·TP + FP + FN)
MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))

MCC is undefined when any factor in the denominator is zero. The calculator reports it as undefined in that case.

ΞΊ = po βˆ’ pe1 βˆ’ pe

Where po = observed agreement (accuracy), and pe = expected agreement by chance = [(TP+FP)(TP+FN) + (TN+FN)(TN+FP)] / N².

Variable legend: TP = True Positives (correctly predicted positive). TN = True Negatives (correctly predicted negative). FP = False Positives (Type I error). FN = False Negatives (Type II error). N = total sample count. PPV = Positive Predictive Value (Precision). TPR = True Positive Rate (Recall / Sensitivity). TNR = True Negative Rate (Specificity). FPR = False Positive Rate = 1 βˆ’ TNR. FNR = False Negative Rate = 1 βˆ’ TPR.
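The formulas above translate directly into code. The following minimal sketch (not the calculator's own implementation; the function name and example counts are illustrative) computes the core metrics from the four counts, returning None wherever a denominator is zero, in the same spirit as the tool's "undefined" flag:

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Diagnostic metrics from 2x2 confusion-matrix counts.
    Ratios return None when their denominator is zero."""
    def div(a, b):
        return a / b if b else None

    n    = tp + tn + fp + fn
    acc  = div(tp + tn, n)
    prec = div(tp, tp + fp)                 # PPV
    rec  = div(tp, tp + fn)                 # TPR / sensitivity
    spec = div(tn, tn + fp)                 # TNR / specificity
    f1   = div(2 * tp, 2 * tp + fp + fn)

    # MCC: undefined when any marginal sum is zero
    mcc_den = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(mcc_den) if mcc_den else None

    # Cohen's kappa: po = accuracy, pe = chance agreement
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2 if n else None
    kappa = div(acc - pe, 1 - pe) if acc is not None and pe is not None else None

    bal = (rec + spec) / 2 if rec is not None and spec is not None else None
    j   = rec + spec - 1 if bal is not None else None
    return {"accuracy": acc, "precision": prec, "recall": rec,
            "specificity": spec, "f1": f1, "mcc": mcc,
            "kappa": kappa, "balanced_accuracy": bal, "youden_j": j}

m = binary_metrics(tp=40, tn=45, fp=5, fn=10)
print(m["accuracy"], m["kappa"])  # 0.85 accuracy, kappa = 0.7
```

For these example counts, pe = (45·50 + 55·50) / 100² = 0.5, so κ = (0.85 − 0.5) / (1 − 0.5) = 0.7.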

Reference Data

Metric | Formula | Range | Best Value | Use When
Accuracy | (TP + TN) ÷ N | 0 to 1 | 1 | Balanced classes
Precision (PPV) | TP ÷ (TP + FP) | 0 to 1 | 1 | Cost of false alarms is high
Recall (Sensitivity / TPR) | TP ÷ (TP + FN) | 0 to 1 | 1 | Missing positives is costly
Specificity (TNR) | TN ÷ (TN + FP) | 0 to 1 | 1 | Negative class matters
F1 Score | 2·P·R ÷ (P + R) | 0 to 1 | 1 | Imbalanced datasets
MCC | See Formulas section | −1 to 1 | 1 | Gold standard for binary
Cohen’s Kappa (κ) | See Formulas section | −1 to 1 | 1 | Agreement beyond chance
Balanced Accuracy | (TPR + TNR) ÷ 2 | 0 to 1 | 1 | Imbalanced classes
Youden’s J | TPR + TNR − 1 | −1 to 1 | 1 | Optimal threshold selection
Prevalence | (TP + FN) ÷ N | 0 to 1 | – | Dataset composition check
Negative Predictive Value | TN ÷ (TN + FN) | 0 to 1 | 1 | Trust in negative predictions
False Discovery Rate | FP ÷ (FP + TP) | 0 to 1 | 0 | Complement of precision
False Omission Rate | FN ÷ (FN + TN) | 0 to 1 | 0 | Missed positives among predicted negatives
Positive Likelihood Ratio | TPR ÷ FPR | 0 to ∞ | ∞ | Diagnostic utility
Negative Likelihood Ratio | FNR ÷ TNR | 0 to ∞ | 0 | Rule-out power
Diagnostic Odds Ratio | LR+ ÷ LR− | 0 to ∞ | ∞ | Single discriminative power metric
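The likelihood-ratio rows of the table compose as shown in this short sketch (function name and example counts are illustrative). Note that DOR = LR+ ÷ LR− simplifies to (TP·TN) ÷ (FP·FN):

```python
def likelihood_ratios(tp, tn, fp, fn):
    """LR+, LR-, and diagnostic odds ratio from 2x2 counts.
    Returns inf where a zero denominator makes the ratio unbounded."""
    tpr = tp / (tp + fn)           # sensitivity
    tnr = tn / (tn + fp)           # specificity
    fpr, fnr = 1 - tnr, 1 - tpr
    lr_pos = tpr / fpr if fpr else float("inf")
    lr_neg = fnr / tnr if tnr else float("inf")
    dor = lr_pos / lr_neg if lr_neg else float("inf")
    return lr_pos, lr_neg, dor

# 90% sensitivity, 80% specificity:
print(likelihood_ratios(tp=90, tn=80, fp=20, fn=10))
# LR+ = 0.9/0.2 = 4.5, LR- = 0.1/0.8 = 0.125, DOR = 36
```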

Frequently Asked Questions

Why is accuracy misleading on imbalanced datasets?

When one class dominates (e.g., 99% negative), a trivial classifier that always predicts "negative" achieves 99% accuracy while detecting zero positive cases. In such scenarios, precision, recall, F1, and especially MCC provide a more truthful assessment. MCC ranges from −1 to +1 and is conventionally treated as 0 for a constant predictor, regardless of class distribution.
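This failure mode takes three lines to demonstrate. Under assumed counts of 99 actual negatives and 1 actual positive, an always-"negative" classifier gives:

```python
# 100 samples: 99 actual negatives, 1 actual positive.
# A classifier that always predicts "negative" produces these counts:
tp, fp, fn, tn = 0, 0, 1, 99

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(accuracy, recall)  # → 0.99 0.0 — 99% accurate, finds no positives
```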
What is the difference between F1 score and MCC?

F1 score is the harmonic mean of precision and recall and focuses exclusively on the positive class. It ignores true negatives entirely. MCC (Matthews Correlation Coefficient) uses all four quadrants of the confusion matrix and is considered the single best metric for binary classification quality by Chicco & Jurman (2020). Use F1 when only the positive class matters (e.g., information retrieval). Use MCC for a balanced, overall quality measure.
What does a negative MCC or Cohen's Kappa mean?

A negative MCC indicates the classifier performs worse than random chance: its predictions are inversely correlated with the truth. A negative Cohen's Kappa similarly means agreement is below what random guessing would achieve. Both suggest the model's decision boundary is inverted or fundamentally flawed. Values near −1 mean systematic misclassification.
Why does precision depend on prevalence?

Positive Predictive Value (Precision) depends directly on disease or event prevalence via Bayes' theorem. Even a test with 99% sensitivity and 99% specificity yields only ~50% PPV when prevalence is 1%. This is the base-rate fallacy. Always check prevalence before trusting precision in medical or rare-event screening contexts.
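The ~50% figure follows directly from Bayes' theorem, as this sketch shows (the function is illustrative): at 1% prevalence, the mass of true positives (0.99 × 0.01) exactly matches the mass of false positives (0.01 × 0.99), so half of all positive results are false alarms.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    tp_mass = sensitivity * prevalence               # P(test+ and diseased)
    fp_mass = (1 - specificity) * (1 - prevalence)   # P(test+ and healthy)
    return tp_mass / (tp_mass + fp_mass)

print(ppv(0.99, 0.99, 0.01))   # ~0.5 despite a 99%/99% test
print(ppv(0.99, 0.99, 0.50))   # ~0.99 at 50% prevalence
```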
What is Youden's J statistic?

Youden's J statistic equals Sensitivity + Specificity − 1. Geometrically, it represents the maximum vertical distance from the ROC curve to the diagonal chance line. A J of 0 means the classifier operates at chance level. A J of 1 means perfect separation. It is commonly used to select the optimal probability threshold that maximizes discriminative ability.
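Threshold selection by Youden's J can be sketched with a brute-force scan over candidate thresholds (function name and toy scores are illustrative; real pipelines would use an ROC routine instead):

```python
def best_threshold(scores, labels):
    """Pick the score threshold maximizing Youden's J = TPR + TNR - 1.
    Labels are 0/1; a sample is predicted positive when score >= t."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos + (neg - fp) / neg - 1   # TPR + TNR - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

scores = [0.1, 0.3, 0.35, 0.8, 0.7, 0.9]
labels = [0,   0,   1,    1,   1,   1]
print(best_threshold(scores, labels))  # → (0.35, 1.0): perfect separation
```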
Can MCC always be computed?

No. If any of the four marginal sums (TP+FP), (TP+FN), (TN+FP), or (TN+FN) equals zero, the MCC denominator becomes zero. This occurs when the classifier never predicts one class, or one class is entirely absent from the test set. The metric is mathematically undefined in these degenerate cases. This calculator reports it as undefined rather than returning a misleading zero.