
About

Classification errors propagate silently. A model reporting 95% accuracy on an imbalanced dataset where 95% of samples belong to one class is no better than a constant predictor. Relying on accuracy alone masks critical failure modes: missed fraud transactions, undetected tumors, ignored intrusions. This tool computes the full diagnostic profile from a 2×2 confusion matrix: accuracy, precision, recall, F1 score, Matthews Correlation Coefficient (MCC), Cohen’s κ, balanced accuracy, and Youden’s J. Each metric exposes a different dimension of classifier performance.

The tool assumes a binary classification context with known ground-truth labels. Results are exact given integer counts. For multi-class problems, apply one-vs-rest decomposition externally and enter each binary sub-problem here. Note: MCC is undefined when any row or column of the confusion matrix sums to zero. The calculator flags such edge cases explicitly rather than returning misleading values.
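The one-vs-rest decomposition mentioned above can be done with a few lines of code. The sketch below (the function name and example matrix are illustrative, not part of the tool) collapses one class of a square multi-class confusion matrix into the binary TP/FP/FN/TN counts this calculator accepts:

```python
def one_vs_rest_counts(matrix, cls):
    """Collapse a square multi-class confusion matrix (rows = actual,
    columns = predicted) into binary TP/FP/FN/TN counts for one class."""
    n = len(matrix)
    tp = matrix[cls][cls]
    fp = sum(matrix[r][cls] for r in range(n) if r != cls)   # predicted cls, actually other
    fn = sum(matrix[cls][c] for c in range(n) if c != cls)   # actually cls, predicted other
    tn = sum(matrix[r][c] for r in range(n) for c in range(n)
             if r != cls and c != cls)
    return tp, fp, fn, tn

# 3-class example; treat class 0 as "positive", everything else as "negative"
m = [[5, 1, 0],
     [2, 7, 1],
     [0, 1, 8]]
print(one_vs_rest_counts(m, 0))  # → (5, 2, 1, 17)
```

Repeating this for each class index yields one binary sub-problem per class, each of which can be entered into the calculator separately.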


Formulas

All metrics derive from a 2Γ—2 confusion matrix with counts TP, TN, FP, FN. Total sample size N = TP + TN + FP + FN.

Accuracy = (TP + TN) / N
F1 = 2 · Precision · Recall / (Precision + Recall) = 2·TP / (2·TP + FP + FN)
MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))

MCC is undefined when any factor in the denominator is zero. The calculator reports it as undefined in that case.

ΞΊ = po βˆ’ pe1 βˆ’ pe

Where po = observed agreement (accuracy), and pe = expected agreement by chance = [(TP+FP)(TP+FN) + (TN+FN)(TN+FP)] / N².

Variable legend: TP = True Positives (correctly predicted positive). TN = True Negatives (correctly predicted negative). FP = False Positives (Type I error). FN = False Negatives (Type II error). N = total sample count. PPV = Positive Predictive Value (Precision). TPR = True Positive Rate (Recall / Sensitivity). TNR = True Negative Rate (Specificity). FPR = False Positive Rate = 1 βˆ’ TNR. FNR = False Negative Rate = 1 βˆ’ TPR.
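The formulas above translate directly into code. The following minimal sketch (not the calculator's own implementation; the function name and example counts are illustrative) computes the core metrics from the four counts, returning None wherever a denominator is zero, in the same spirit as the tool's "undefined" flag:

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Diagnostic metrics from 2x2 confusion-matrix counts.
    Ratios return None when their denominator is zero."""
    def div(a, b):
        return a / b if b else None

    n    = tp + tn + fp + fn
    acc  = div(tp + tn, n)
    prec = div(tp, tp + fp)                 # PPV
    rec  = div(tp, tp + fn)                 # TPR / sensitivity
    spec = div(tn, tn + fp)                 # TNR / specificity
    f1   = div(2 * tp, 2 * tp + fp + fn)

    # MCC: undefined when any marginal sum is zero
    mcc_den = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(mcc_den) if mcc_den else None

    # Cohen's kappa: po = accuracy, pe = chance agreement
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2 if n else None
    kappa = div(acc - pe, 1 - pe) if acc is not None and pe is not None else None

    bal = (rec + spec) / 2 if rec is not None and spec is not None else None
    j   = rec + spec - 1 if bal is not None else None
    return {"accuracy": acc, "precision": prec, "recall": rec,
            "specificity": spec, "f1": f1, "mcc": mcc,
            "kappa": kappa, "balanced_accuracy": bal, "youden_j": j}

m = binary_metrics(tp=40, tn=45, fp=5, fn=10)
print(m["accuracy"], m["kappa"])  # 0.85 accuracy, kappa = 0.7
```

For these example counts, pe = (45·50 + 55·50) / 100² = 0.5, so κ = (0.85 − 0.5) / (1 − 0.5) = 0.7.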

Reference Data

Metric | Formula | Range | Best Value | Use When
Accuracy | (TP + TN) ÷ N | 0 to 1 | 1 | Balanced classes
Precision (PPV) | TP ÷ (TP + FP) | 0 to 1 | 1 | Cost of false alarms is high
Recall (Sensitivity / TPR) | TP ÷ (TP + FN) | 0 to 1 | 1 | Missing positives is costly
Specificity (TNR) | TN ÷ (TN + FP) | 0 to 1 | 1 | Negative class matters
F1 Score | 2·P·R ÷ (P + R) | 0 to 1 | 1 | Imbalanced datasets
MCC | See Formulas section | −1 to 1 | 1 | Gold standard for binary
Cohen’s Kappa (κ) | See Formulas section | −1 to 1 | 1 | Agreement beyond chance
Balanced Accuracy | (TPR + TNR) ÷ 2 | 0 to 1 | 1 | Imbalanced classes
Youden’s J | TPR + TNR − 1 | −1 to 1 | 1 | Optimal threshold selection
Prevalence | (TP + FN) ÷ N | 0 to 1 | – | Dataset composition check
Negative Predictive Value | TN ÷ (TN + FN) | 0 to 1 | 1 | Trust in negative predictions
False Discovery Rate | FP ÷ (FP + TP) | 0 to 1 | 0 | Complement of precision
False Omission Rate | FN ÷ (FN + TN) | 0 to 1 | 0 | Missed positives among predicted negatives
Positive Likelihood Ratio | TPR ÷ FPR | 0 to ∞ | ∞ | Diagnostic utility
Negative Likelihood Ratio | FNR ÷ TNR | 0 to ∞ | 0 | Rule-out power
Diagnostic Odds Ratio | LR+ ÷ LR− | 0 to ∞ | ∞ | Single discriminative power metric
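The likelihood-ratio rows of the table compose as shown in this short sketch (function name and example counts are illustrative). Note that DOR = LR+ ÷ LR− simplifies to (TP·TN) ÷ (FP·FN):

```python
def likelihood_ratios(tp, tn, fp, fn):
    """LR+, LR-, and diagnostic odds ratio from 2x2 counts.
    Returns inf where a zero denominator makes the ratio unbounded."""
    tpr = tp / (tp + fn)           # sensitivity
    tnr = tn / (tn + fp)           # specificity
    fpr, fnr = 1 - tnr, 1 - tpr
    lr_pos = tpr / fpr if fpr else float("inf")
    lr_neg = fnr / tnr if tnr else float("inf")
    dor = lr_pos / lr_neg if lr_neg else float("inf")
    return lr_pos, lr_neg, dor

# 90% sensitivity, 80% specificity:
print(likelihood_ratios(tp=90, tn=80, fp=20, fn=10))
# LR+ = 0.9/0.2 = 4.5, LR- = 0.1/0.8 = 0.125, DOR = 36
```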

Frequently Asked Questions

Why is accuracy misleading on imbalanced datasets?

When one class dominates (e.g., 99% negative), a trivial classifier that always predicts "negative" achieves 99% accuracy while detecting zero positive cases. In such scenarios, precision, recall, F1, and especially MCC provide a more truthful assessment. MCC ranges from −1 to +1 and is conventionally treated as 0 for a constant predictor, regardless of class distribution.
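This failure mode takes three lines to demonstrate. Under assumed counts of 99 actual negatives and 1 actual positive, an always-"negative" classifier gives:

```python
# 100 samples: 99 actual negatives, 1 actual positive.
# A classifier that always predicts "negative" produces these counts:
tp, fp, fn, tn = 0, 0, 1, 99

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(accuracy, recall)  # → 0.99 0.0 — 99% accurate, finds no positives
```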
What is the difference between F1 score and MCC?

F1 score is the harmonic mean of precision and recall and focuses exclusively on the positive class. It ignores true negatives entirely. MCC (Matthews Correlation Coefficient) uses all four quadrants of the confusion matrix and is considered the single best metric for binary classification quality by Chicco & Jurman (2020). Use F1 when only the positive class matters (e.g., information retrieval). Use MCC for a balanced, overall quality measure.
What does a negative MCC or Cohen's Kappa mean?

A negative MCC indicates the classifier performs worse than random chance: its predictions are inversely correlated with the truth. A negative Cohen's Kappa similarly means agreement is below what random guessing would achieve. Both suggest the model's decision boundary is inverted or fundamentally flawed. Values near −1 mean systematic misclassification.
Why does precision depend on prevalence?

Positive Predictive Value (Precision) depends directly on disease or event prevalence via Bayes' theorem. Even a test with 99% sensitivity and 99% specificity yields only ~50% PPV when prevalence is 1%. This is the base-rate fallacy. Always check prevalence before trusting precision in medical or rare-event screening contexts.
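The ~50% figure follows directly from Bayes' theorem, as this sketch shows (the function is illustrative): at 1% prevalence, the mass of true positives (0.99 × 0.01) exactly matches the mass of false positives (0.01 × 0.99), so half of all positive results are false alarms.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    tp_mass = sensitivity * prevalence               # P(test+ and diseased)
    fp_mass = (1 - specificity) * (1 - prevalence)   # P(test+ and healthy)
    return tp_mass / (tp_mass + fp_mass)

print(ppv(0.99, 0.99, 0.01))   # ~0.5 despite a 99%/99% test
print(ppv(0.99, 0.99, 0.50))   # ~0.99 at 50% prevalence
```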
What is Youden's J statistic?

Youden's J statistic equals Sensitivity + Specificity − 1. Geometrically, it represents the maximum vertical distance from the ROC curve to the diagonal chance line. A J of 0 means the classifier operates at chance level. A J of 1 means perfect separation. It is commonly used to select the optimal probability threshold that maximizes discriminative ability.
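Threshold selection by Youden's J can be sketched with a brute-force scan over candidate thresholds (function name and toy scores are illustrative; real pipelines would use an ROC routine instead):

```python
def best_threshold(scores, labels):
    """Pick the score threshold maximizing Youden's J = TPR + TNR - 1.
    Labels are 0/1; a sample is predicted positive when score >= t."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos + (neg - fp) / neg - 1   # TPR + TNR - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

scores = [0.1, 0.3, 0.35, 0.8, 0.7, 0.9]
labels = [0,   0,   1,    1,   1,   1]
print(best_threshold(scores, labels))  # → (0.35, 1.0): perfect separation
```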
Can MCC always be computed?

No. If any of the four marginal sums (TP+FP), (TP+FN), (TN+FP), or (TN+FN) equals zero, the MCC denominator becomes zero. This occurs when the classifier never predicts one class, or one class is entirely absent from the test set. The metric is mathematically undefined in these degenerate cases. This calculator reports it as undefined rather than returning a misleading zero.