Accuracy Calculator: Machine Learning & Diagnostic Confusion Matrix

Accuracy Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)

True Positives (TP): Correctly identified as positive

True Negatives (TN): Correctly identified as negative

False Positives (FP): Predicted positive, actually negative (Type I error)

False Negatives (FN): Predicted negative, actually positive (Type II error)

Key Formulas

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision = TP / (TP + FP)

Recall (Sensitivity) = TP / (TP + FN)

Specificity = TN / (TN + FP)

F1 Score = 2 × (P × R) / (P + R)

Balanced Accuracy = (Sensitivity + Specificity) / 2

Accuracy Score Guide

95–100%: Excellent — strong model or test performance.
90–94%: Very Good — suitable for most production uses.
80–89%: Good — acceptable with further tuning.
70–79%: Fair — inspect class balance and recall.
Below 70%: Poor — revisit features and model choice.

What is Accuracy?

Accuracy is a classification metric that measures the proportion of correct predictions out of all predictions made. It answers one direct question: out of every case evaluated, how many did the model or test classify correctly? Formally, accuracy is defined as the number of true positives plus true negatives divided by the total number of observations.

In both machine learning and diagnostic testing, accuracy is computed from a confusion matrix, a two-by-two table that records how a classifier's predictions compare to the actual ground truth labels. The four cells of that table (TP, TN, FP, and FN) are the raw material from which accuracy and every related metric are derived.
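As a rough illustration, the four cells can be tallied from paired actual and predicted labels. This is a minimal sketch: the function name confusion_counts, the positive=1 convention, and the sample labels are all hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally TP, TN, FP, FN for a binary problem from paired labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)
```

The four counts always sum to the number of observations, which makes a quick sanity check on any confusion matrix.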

Accuracy Formula Library

The formulas below cover accuracy, error rate, precision, recall, specificity, F1 score, and balanced accuracy. Accuracy alone does not tell the full story; each formula targets a different aspect of predictive quality. Understanding when each applies is as important as knowing how to compute it.

Basic Accuracy Formula

Accuracy = (TP + TN) / (TP + TN + FP + FN)

In percentage:
Accuracy (%) = Accuracy × 100

Error Rate Formula

Error Rate = 1 − Accuracy

Equivalently:
Error Rate = (FP + FN) / Total

Precision Formula

Precision = TP / (TP + FP)

Answers: Of all positive predictions, how many were actually correct?

Recall (Sensitivity) Formula

Recall = TP / (TP + FN)

Answers: Of all actual positives, how many did the model catch?

Specificity Formula

Specificity = TN / (TN + FP)

Answers: Of all actual negatives, how many were correctly ruled out?

F1 Score & Balanced Accuracy

F1 = 2 × (P × R) / (P + R), where P = Precision and R = Recall

Balanced Accuracy = (Sensitivity + Specificity) / 2

These formulas are standard across machine learning, clinical epidemiology, and quality assurance. The scikit-learn documentation on model evaluation covers their Python implementations, while the BMJ's Statistics at Square One addresses their use in clinical research.
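The formulas in this library can be collected into one helper. The sketch below uses plain Python rather than any particular library. The names safe_div and classification_metrics, the choice to return 0.0 when a denominator is zero, and the sample counts are assumptions made for illustration.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 if the denominator is zero.

    Returning 0.0 is an assumed convention for undefined ratios.
    """
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Compute the formulas listed above from the four confusion matrix cells."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)        # also called sensitivity
    specificity = safe_div(tn, tn + fp)
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical counts, chosen only to exercise the function
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```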

How to Calculate Accuracy from a Confusion Matrix — Step by Step

To calculate accuracy: build your confusion matrix, sum the four cells to get the total, then divide the number of correct predictions (TP + TN) by that total. Here is the complete method with a worked numerical example.

Identify the four confusion matrix values

From your model's predictions versus actual labels, record TP (predicted positive, actually positive), TN (predicted negative, actually negative), FP (predicted positive, actually negative), and FN (predicted negative, actually positive). Example: a disease screening test on 200 patients yields TP = 85, TN = 90, FP = 10, FN = 15.

Sum the total number of observations

Total = TP + TN + FP + FN = 85 + 90 + 10 + 15 = 200. This is the denominator for accuracy and error rate; precision, recall, and specificity each use a subset of the cells.

Apply the accuracy formula

Accuracy = (TP + TN) / Total = (85 + 90) / 200 = 175 / 200 = 0.875.

Convert to a percentage

0.875 × 100 = 87.5%. The test correctly classifies 87.5% of all patients.

Compute additional metrics and interpret in context

Precision = 85 / (85 + 10) = 89.5%. Recall = 85 / (85 + 15) = 85.0%. F1 Score = 2 × (0.895 × 0.850) / (0.895 + 0.850) = 87.2%. Because the classes are balanced here (100 actual positives vs. 100 actual negatives), accuracy is a fair summary. On imbalanced datasets, F1 score and balanced accuracy carry more weight.

Result: TP = 85, TN = 90, FP = 10, FN = 15, Total = 200. Accuracy = 87.5%, Error Rate = 12.5%, Precision = 89.5%, Recall = 85.0%, Specificity = 90.0%, F1 = 87.2%. Verify all six values using the calculator above.
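The steps above can be replayed in a few lines of Python to check the arithmetic. The variable names are illustrative; the four counts are the ones from the screening example.

```python
# Step 1: the four confusion matrix values from the worked example
tp, tn, fp, fn = 85, 90, 10, 15

# Step 2: total number of observations
total = tp + tn + fp + fn

# Step 3: accuracy = correct predictions / total
correct = tp + tn
accuracy = correct / total

# Step 4: convert to a percentage
accuracy_pct = accuracy * 100

# Step 5: the additional metrics
error_rate = (fp + fn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)

print(f"Total = {total}, Correct = {correct}")
print(f"Accuracy = {accuracy:.3f} ({accuracy_pct:.1f}%)")
print(f"Error rate = {error_rate:.1%}")
print(f"Precision = {precision:.1%}, Recall = {recall:.1%}")
print(f"Specificity = {specificity:.1%}, F1 = {f1:.1%}")
```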

Worked Examples Across Three Domains

Example 1 — Disease Screening Test

Scenario: A rapid diagnostic test is evaluated on 10,000 patients. Physicians need to know whether the test's 97% accuracy is sufficient for routine screening, and how often the test misses true cases.

Confusion matrix values

TP = 850 (sick patients correctly identified), TN = 8,850 (healthy patients correctly cleared), FP = 150 (healthy patients falsely flagged), FN = 150 (sick patients missed). Total = 10,000.

Accuracy calculation

Accuracy = (850 + 8,850) / 10,000 = 9,700 / 10,000 = 97.0%.

Recall (critical for screening)

Recall = 850 / (850 + 150) = 850 / 1,000 = 85.0%. This means 15% of truly sick patients are missed by the test, which is clinically significant.

Specificity

Specificity = 8,850 / (8,850 + 150) = 8,850 / 9,000 = 98.3%. Very few healthy patients are falsely flagged.

Interpretation: 97% accuracy sounds impressive, but 15% of sick patients go undetected (recall = 85%). For a serious disease, this miss rate could be clinically unacceptable. This is why evaluations of medical tests, such as those described in the WHO's diagnostic test guidance, report sensitivity and specificity alongside accuracy.
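To make the gap between accuracy and recall concrete, the sketch below recomputes Example 1 and adds the miss rate (1 minus recall) and the share of patients who are actually sick. The variable names, and the decision to print prevalence, are illustrative choices rather than part of the example itself.

```python
# Confusion matrix from Example 1 (10,000 patients)
tp, tn, fp, fn = 850, 8_850, 150, 150
total = tp + tn + fp + fn

accuracy = (tp + tn) / total
recall = tp / (tp + fn)           # sensitivity
miss_rate = fn / (tp + fn)        # share of sick patients the test misses
specificity = tn / (tn + fp)
prevalence = (tp + fn) / total    # share of patients who are actually sick

print(f"Accuracy    = {accuracy:.1%}")
print(f"Recall      = {recall:.1%}")
print(f"Miss rate   = {miss_rate:.1%} ({fn} of {tp + fn} sick patients)")
print(f"Specificity = {specificity:.1%}")
print(f"Prevalence  = {prevalence:.1%}")
```

Because only 10% of the patients are sick, the 150 missed cases barely move overall accuracy, yet they account for 15% of everyone the test was meant to find.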

Example 2 — Spam Detection System (NLP)

Scenario: A natural language processing (NLP) binary classifier is trained to detect spam emails. Evaluation on a 5,000-email test set with balanced classes produces the following confusion matrix.

Confusion matrix

TP = 2,300 (spam correctly flagged), TN = 2,400 (legitimate emails correctly passed), FP = 100 (legitimate emails wrongly flagged as spam), FN = 200 (spam emails missed). Total = 5,000.

Accuracy

(2,300 + 2,400) / 5,000 = 4,700 / 5,000 = 94.0%.

Precision

2,300 / (2,300 + 100) = 2,300 / 2,400 = 95.8%. Very few legitimate emails are lost to the spam folder.

F1 Score

Recall = 2,300 / (2,300 + 200) = 92.0%. F1 = 2 × (0.958 × 0.920) / (0.958 + 0.920) = 93.9%. Precision and recall are close, so the model is well balanced between the two.

Interpretation: The model achieves 94% accuracy with precision prioritized over recall — the correct trade-off for spam filtering, where sending legitimate email to spam (false positive) costs more than letting occasional spam through (false negative).
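The spam example can be checked the same way. The sketch below also computes F1 directly from the counts as 2·TP / (2·TP + FP + FN). That form does not appear in the text above; it is an algebraically equivalent rewriting of the harmonic mean, included here as a cross-check.

```python
# Confusion matrix from Example 2 (5,000 emails)
tp, tn, fp, fn = 2_300, 2_400, 100, 200

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# F1 as the harmonic mean of precision and recall
f1_harmonic = 2 * precision * recall / (precision + recall)

# Equivalent count-based form (note that TN never appears)
f1_counts = 2 * tp / (2 * tp + fp + fn)

print(f"Accuracy  = {accuracy:.1%}")
print(f"Precision = {precision:.1%}")
print(f"Recall    = {recall:.1%}")
print(f"F1        = {f1_harmonic:.1%}")
```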

Example 3 — Credit Risk Prediction (Financial Modeling)

Scenario: A logistic regression model predicts whether loan applicants will default. The dataset is imbalanced: 90% of applicants repay (negative class) and 10% default (positive class). The model is evaluated on 2,000 applicants.

Confusion matrix

TP = 120 (defaulters correctly flagged), TN = 1,750 (repayers correctly approved), FP = 50 (repayers wrongly rejected), FN = 80 (defaulters missed and approved). Total = 2,000.

Accuracy

(120 + 1,750) / 2,000 = 1,870 / 2,000 = 93.5%.

Why accuracy misleads here

A model that approves everyone would achieve (0 + 1,800) / 2,000 = 90% accuracy without catching a single defaulter. The difference in accuracy between our model (93.5%) and this naive strategy (90%) understates the real improvement in predictive value.

Balanced Accuracy

Recall = 120 / 200 = 60.0%. Specificity = 1,750 / 1,800 = 97.2%. Balanced Accuracy = (0.600 + 0.972) / 2 = 78.6%. This is the more honest summary on imbalanced data.

Interpretation: Despite 93.5% accuracy, the model catches only 60% of defaulters. Balanced accuracy (78.6%) and recall (60%) are the metrics that lenders and regulators actually need to evaluate. This illustrates the accuracy paradox directly.
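A side-by-side comparison with the approve-everyone strategy shows how much more the two approaches differ on balanced accuracy than on accuracy. This is a minimal sketch; the function names and the dictionary layout are illustrative.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Example 3: the logistic regression model
model = dict(tp=120, tn=1_750, fp=50, fn=80)

# Naive strategy: approve everyone, so every applicant is predicted negative
approve_all = dict(tp=0, tn=1_800, fp=0, fn=200)

for name, cells in [("model", model), ("approve everyone", approve_all)]:
    print(
        f"{name}: accuracy = {accuracy(**cells):.1%}, "
        f"balanced accuracy = {balanced_accuracy(**cells):.1%}"
    )
```

On accuracy the two are 3.5 percentage points apart (93.5% vs. 90%). On balanced accuracy the gap is about 28.6 points (78.6% vs. 50%), which better reflects that the naive strategy detects no defaulters at all.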

Accuracy Score Interpretation: Quick Reference

What counts as a "good" accuracy score depends entirely on the application, class balance, and the cost of errors. The table below provides general benchmarks, but always pair them with recall and precision before drawing conclusions.

Table: Accuracy Score Ranges and Interpretation

Accuracy Score	Interpretation	Common Context	Watch Out For
95–100%	Excellent	Image classification, OCR, medical imaging AI	Overfitting; check on unseen test data
90–94%	Very Good	NLP classifiers, diagnostic tests	Still inspect recall on minority class
80–89%	Good	Fraud detection, churn prediction	May need tuning; check F1 score
70–79%	Fair	Early-stage models, noisy data	Review feature engineering
Below 70%	Poor	Near-random performance	Check for label noise, weak features, or class imbalance
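The bands in the table above could be encoded as a small lookup. The function name interpret_accuracy is hypothetical, and so is the treatment of values that fall between the listed bands (for example 94.5%), which this sketch assigns to the lower band.

```python
def interpret_accuracy(accuracy_pct):
    """Map an accuracy percentage to the benchmark labels in the table.

    Assumption: each band starts at its lower bound, so 94.5 counts as
    "Very Good". The table itself does not say how to treat such values.
    """
    if not 0 <= accuracy_pct <= 100:
        raise ValueError("accuracy_pct must be between 0 and 100")
    if accuracy_pct >= 95:
        return "Excellent"
    if accuracy_pct >= 90:
        return "Very Good"
    if accuracy_pct >= 80:
        return "Good"
    if accuracy_pct >= 70:
        return "Fair"
    return "Poor"


# The three worked examples from this page, plus one low score
for score in (87.5, 97.0, 93.5, 65.0):
    print(f"{score}% -> {interpret_accuracy(score)}")
```

The label is only a starting point: Example 3 lands in "Very Good" at 93.5% even though the model catches only 60% of defaulters.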

Accuracy vs. Precision vs. Recall vs. F1 Score

Accuracy measures overall correctness. Precision targets the quality of positive predictions. Recall targets the completeness of positive detection. F1 Score balances precision and recall into one number for imbalanced datasets. Choosing the right metric depends on what type of error is more costly in your application.

Table: Classification Metrics Compared

Metric	Formula	Best Used When	Limitation
Accuracy	(TP+TN) / Total	Balanced classes, general overview	Misleading on imbalanced data (accuracy paradox)
Precision	TP / (TP+FP)	Cost of FP is high (spam, fraud alerts)	Ignores false negatives entirely
Recall	TP / (TP+FN)	Cost of FN is high (cancer screening, safety)	Can be maximized by predicting everything positive
F1 Score	2PR / (P+R)	Imbalanced classes, when both FP and FN matter	Does not include TN in the calculation
Balanced Accuracy	(Sensitivity+Specificity)/2	Highly imbalanced binary classification	Less interpretable than F1 in some contexts
Specificity	TN / (TN+FP)	Ruling out conditions (clinical screening)	Says nothing about positive prediction quality
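Two of the limitations in the table are easy to demonstrate numerically: recall reaches 100% when everything is predicted positive, and F1 does not change when true negatives are added. The counts below are hypothetical and chosen only to make the effect visible.

```python
def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


# Limitation of recall: predict "positive" for all 1,000 cases
# (hypothetical data with 100 actual positives and 900 actual negatives)
all_pos_recall = recall(tp=100, fn=0)
all_pos_precision = precision(tp=100, fp=900)
print(f"Predict all positive: recall = {all_pos_recall:.0%}, "
      f"precision = {all_pos_precision:.0%}")

# Limitation of F1: same TP, FP, FN, very different numbers of TN
f1_value = f1_score(tp=80, fp=20, fn=20)
acc_few_tn = accuracy(tp=80, tn=80, fp=20, fn=20)
acc_many_tn = accuracy(tp=80, tn=8_000, fp=20, fn=20)
print(f"F1 = {f1_value:.1%} in both cases; accuracy = {acc_few_tn:.1%} "
      f"with 80 TN and {acc_many_tn:.1%} with 8,000 TN")
```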

⚠ The Accuracy Paradox: On a dataset where 99% of cases are negative, a classifier that always predicts "negative" achieves 99% accuracy but catches zero true positives. This is the accuracy paradox. As a rule of thumb, when the positive class makes up less than about 20% of the dataset, treat accuracy as a secondary metric and prioritize recall, precision, and F1 score. The Google Machine Learning crash course covers this distinction in depth.
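The paradox can be reproduced with a synthetic label list. The sketch below builds a hypothetical dataset of 1,000 cases with 1% positives and scores a "classifier" that always answers negative.

```python
# Hypothetical dataset: 10 positives (1) and 990 negatives (0)
y_true = [1] * 10 + [0] * 990

# A classifier that always predicts the majority (negative) class
y_pred = [0] * len(y_true)

correct = sum(actual == predicted for actual, predicted in zip(y_true, y_pred))
accuracy = correct / len(y_true)

tp = sum(a == 1 and p == 1 for a, p in zip(y_true, y_pred))
fn = sum(a == 1 and p == 0 for a, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(f"Accuracy = {accuracy:.0%}")  # high, despite no useful predictions
print(f"Recall   = {recall:.0%}")    # no positive case is ever caught
```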

Confusion Matrix and Accuracy: Complete Formula Reference

The table below lists every key term and formula related to accuracy and confusion matrix evaluation. It is structured for direct reference by students, researchers, and practitioners.

Table: Accuracy Metric Glossary — 10 Key Entities

Term	Symbol / Formula	Plain-English Definition	Primary Use Case
Accuracy	(TP+TN) / Total	Proportion of all predictions that are correct	General model evaluation on balanced data
Error Rate	(FP+FN) / Total	Proportion of all predictions that are wrong; equals 1 minus accuracy	Communicating failure rate to non-technical audiences
Precision	TP / (TP+FP)	Of all positive predictions, the share that were actually positive	When false positives are costly: spam detection, fraud alerts
Recall (Sensitivity)	TP / (TP+FN)	Of all actual positives, the share the model correctly identified	When false negatives are costly: disease detection, safety systems
Specificity	TN / (TN+FP)	Of all actual negatives, the share correctly ruled out	Clinical screening; ruling out conditions in diagnostic testing
F1 Score	2×(P×R)/(P+R)	Harmonic mean of precision and recall; penalizes extreme imbalance between them	Imbalanced classification: fraud, rare disease, defect detection
Balanced Accuracy	(Sensitivity+Specificity)/2	Average of sensitivity and specificity; treats both classes equally	Binary classification with severe class imbalance
True Positive (TP)	—	Predicted positive and actually positive; a correct hit	Counts correct detections in the positive class
False Positive (FP)	—	Predicted positive but actually negative; a Type I error	The "false alarm" cell in the confusion matrix
False Negative (FN)	—	Predicted negative but actually positive; a Type II error	The "missed detection" cell; high FN = low recall

Diagnostic Accuracy in Healthcare and Epidemiology

In healthcare, diagnostic accuracy describes how well a test separates people who have a condition from those who do not. The terminology differs slightly from machine learning: sensitivity is the clinical equivalent of recall, positive predictive value (PPV) is the clinical equivalent of precision, and specificity has the same name and formula in both fields.

The STARD (Standards for Reporting Diagnostic Accuracy) guidelines ask clinical studies to report estimates of diagnostic accuracy, such as sensitivity and specificity, together with their precision (for example, 95% confidence intervals), rather than overall accuracy alone. A test with 95% accuracy but only 60% sensitivity misses 40% of sick patients, which is often too many to be clinically useful.
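One common way to attach a confidence interval to a proportion such as sensitivity or specificity is the Wilson score interval. The sketch below is an illustration only: the choice of the Wilson method, the function name wilson_interval, and the default z = 1.96 (an approximate 95% level) are assumptions, not something STARD prescribes. The counts come from the 200-patient worked example earlier on this page.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Worked example counts: TP = 85, FN = 15, TN = 90, FP = 10
sens_lo, sens_hi = wilson_interval(85, 85 + 15)   # sensitivity = 85 / 100
spec_lo, spec_hi = wilson_interval(90, 90 + 10)   # specificity = 90 / 100

print(f"Sensitivity 85.0%, 95% CI {sens_lo:.1%} to {sens_hi:.1%}")
print(f"Specificity 90.0%, 95% CI {spec_lo:.1%} to {spec_hi:.1%}")
```

With only 100 patients in each class, the intervals span roughly 12 to 14 percentage points, which is why point estimates alone can overstate how well a test is characterized.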

Table: ML Terminology vs. Clinical Testing Terminology

Concept	ML / Data Science Term	Clinical / Epidemiology Term
Correctly identified positive	True Positive (TP)	True Positive
Correctly identified negative	True Negative (TN)	True Negative
Type I Error	False Positive (FP)	False Positive
Type II Error	False Negative (FN)	False Negative
Overall correctness	Accuracy	Diagnostic Accuracy
TP / (TP + FN)	Recall	Sensitivity
TN / (TN + FP)	Specificity	Specificity
TP / (TP + FP)	Precision	Positive Predictive Value (PPV)

Frequently Asked Questions

What is accuracy in machine learning?

In machine learning, accuracy is the fraction of predictions a model gets right out of all predictions made. It equals (TP + TN) / (TP + TN + FP + FN), where TP and TN are correct predictions and FP and FN are errors. It is the most intuitive metric but can be misleading on imbalanced datasets where one class dominates the data. On balanced data with roughly equal class sizes, accuracy is a reliable headline metric for model quality.

What is the formula for accuracy?

The accuracy formula is: Accuracy = (TP + TN) / (TP + TN + FP + FN). In words: the number of correct predictions (true positives plus true negatives) divided by the total number of predictions. Multiplying by 100 gives the accuracy percentage. The error rate is the complement: Error Rate = 1 − Accuracy = (FP + FN) / Total.

What is a good accuracy percentage?

A good accuracy percentage depends on the problem. In general: 95–100% is excellent, 90–94% is very good, 80–89% is good, 70–79% is fair, and below 70% is poor. But these thresholds shift with context. When the fraud rate is 0.5%, a detector that flags nothing is already 99.5% accurate, so a 99% accurate fraud detector may be catching little or no fraud. On imbalanced datasets, always evaluate precision, recall, and F1 score alongside accuracy.

What is the difference between accuracy and precision?

Accuracy measures overall correctness across all classes: (TP + TN) / Total. Precision measures the correctness of positive predictions only: TP / (TP + FP). A model with high accuracy can have low precision if it correctly handles many negatives but frequently produces false alarms on positive predictions. Precision is the metric to optimize when false positives carry high costs, for example flagging a legitimate credit card transaction as fraudulent or sending a legitimate email to the spam folder.

What is the difference between accuracy and recall?

Accuracy covers all four cells of the confusion matrix. Recall (also called sensitivity) focuses solely on the positive class: TP / (TP + FN). It measures what fraction of actual positives the model correctly captured. High recall is critical when missing a true positive carries a severe consequence: a cancer screening test that misses 20% of tumors has a recall of only 80%, regardless of its overall accuracy score. In clinical testing, recall is called sensitivity and is typically reported alongside specificity in diagnostic accuracy studies.

Can accuracy be misleading?

Yes. This is the well-documented accuracy paradox. On a dataset where 99% of cases are negative, a classifier that predicts negative for every observation achieves 99% accuracy while being completely useless. The model catches zero true positives, yet the accuracy metric flatters it. Whenever class distributions are skewed, which is common in fraud detection, medical diagnosis, and defect detection, use balanced accuracy, F1 score, or the area under the ROC curve (AUC-ROC) as your primary evaluation metrics instead.

What is balanced accuracy?

Balanced accuracy is the arithmetic mean of sensitivity and specificity: (Sensitivity + Specificity) / 2. It gives equal weight to both classes regardless of their relative size, making it appropriate whenever the positive and negative classes appear in very different proportions. If a test correctly identifies 80% of sick patients (sensitivity) and 90% of healthy patients (specificity), its balanced accuracy is (80% + 90%) / 2 = 85%. The scikit-learn library provides balanced_accuracy_score() for computing this directly from predicted and true labels.

How do you calculate accuracy step by step?

Step 1: Add all four confusion matrix values to get the total: Total = TP + TN + FP + FN. Step 2: Add the two correct prediction cells: Correct = TP + TN. Step 3: Divide: Accuracy = Correct / Total. Step 4: Multiply by 100 for percentage. Example: TP = 85, TN = 90, FP = 10, FN = 15. Total = 200. Correct = 175. Accuracy = 175 / 200 = 0.875 = 87.5%. The calculator at the top of this page automates all steps and also computes precision, recall, specificity, F1 score, and balanced accuracy from the same four inputs.

Is diagnostic accuracy the same as accuracy in machine learning?

The mathematics are identical. Diagnostic accuracy in medicine and classification accuracy in machine learning both use the same formula: (TP + TN) / Total. The terminology differs: clinicians say "sensitivity" for recall and "positive predictive value (PPV)" for precision, but the calculations are equivalent. Clinical studies additionally report 95% confidence intervals around accuracy, sensitivity, and specificity, which you can compute using the confidence interval calculator.

What does 95% accuracy mean?

95% accuracy means the model or test correctly classified 95 out of every 100 cases. Equivalently, the error rate is 5%. Whether this is good depends on context: 95% accuracy in image recognition for a photo app is fine; 95% accuracy in an autonomous vehicle's obstacle detection system may be dangerously insufficient. Always ask: what does the 5% error look like? Are the errors false positives, false negatives, or a mix? The answer determines whether 95% accuracy is acceptable for your specific application.
