BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

McNemar's Test Calculator

Calculate McNemar's test for paired categorical data instantly. Enter your 2×2 contingency table counts, choose a significance level, and get the chi-square statistic, continuity-corrected statistic, asymptotic p-value, exact binomial p-value, odds ratio, and a full step-by-step solution — all in your browser with no signup required.

What Is McNemar's Test?

McNemar's test is a nonparametric statistical test used to determine whether there is a significant change in proportions across two related, dependent measurements of the same binary outcome. It was introduced by psychologist Quinn McNemar in 1947 and has since become the standard method for analyzing before-and-after studies, matched-pair designs, and repeated measures on binary (yes/no, pass/fail, present/absent) data.

The test works by shifting analytical focus away from pairs where the outcome remained the same and onto the discordant pairs — subjects whose binary response changed between the two measurement points. Because concordant pairs (cells a and d in the 2×2 table) carry no information about directional change, they do not contribute to the test statistic. This design makes McNemar's test fundamentally different from the standard chi-square test of independence, which is appropriate only for independent samples. McNemar's test is the correct tool whenever the same subjects are measured twice.

According to the British Medical Journal's Statistics at Square One, McNemar's test is one of the most commonly applied tests in clinical medicine for evaluating treatment effects measured as paired binary outcomes.

The 2×2 Paired Contingency Table

McNemar's test organizes paired binary data into a 2×2 contingency table where rows represent the state at Time 1 (Before) and columns represent the state at Time 2 (After). The four cells track every possible outcome path for each matched subject.

| | After: Yes (+) | After: No (−) | Row Total |
|---|---|---|---|
| Before: Yes (+) | a: Yes→Yes (Concordant) | b: Yes→No (Discordant) | a + b |
| Before: No (−) | c: No→Yes (Discordant) | d: No→No (Concordant) | c + d |
| Column Total | a + c | b + d | N = a + b + c + d |

Cells b and c are the discordant pairs. Cell b contains subjects who changed from positive to negative (e.g., patients who were symptomatic before treatment but asymptomatic afterward). Cell c contains subjects who changed from negative to positive (e.g., patients who were asymptomatic but became symptomatic). If the treatment or intervention had no effect on the population, you would expect b and c to be roughly equal. A substantial imbalance between b and c is the evidence McNemar's test quantifies.
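As a minimal sketch of how the four cells are filled, the snippet below tallies raw paired observations into a, b, c, and d. The function name `tabulate_pairs` and the sample data are hypothetical, and each subject is assumed to be recorded as a (before, after) pair of booleans where True means "Yes".

```python
from collections import Counter

def tabulate_pairs(pairs):
    """Count paired (before, after) boolean outcomes into cells a, b, c, d."""
    counts = Counter(pairs)
    a = counts[(True, True)]    # Yes -> Yes (concordant)
    b = counts[(True, False)]   # Yes -> No  (discordant)
    c = counts[(False, True)]   # No  -> Yes (discordant)
    d = counts[(False, False)]  # No  -> No  (concordant)
    return a, b, c, d

# Hypothetical data: one (before, after) tuple per subject.
pairs = [
    (True, True), (True, False), (True, False),
    (False, True), (False, False), (True, False),
]
a, b, c, d = tabulate_pairs(pairs)
print(f"a={a}, b={b}, c={c}, d={d}, discordant total={b + c}")
```

Every subject lands in exactly one cell, so a + b + c + d always equals the number of pairs.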

McNemar's Test Formulas

There are three versions of McNemar's test. The standard formula uses the chi-square approximation, the continuity-corrected version adjusts that approximation for the discreteness of the counts, and the exact binomial version is used when the total discordant pair count is small (b + c < 25).

Standard McNemar Chi-Square

X² = (b − c)² / (b + c)
df = 1
Use when: b + c ≥ 25

Edwards' Continuity Correction

X² = (|b − c| − 1)² / (b + c)
df = 1
Use when: b + c ≈ 20–25

Exact Binomial McNemar

b ~ Binomial(b + c, p = 0.5)
Two-tailed p-value from the cumulative binomial distribution
Use when: b + c < 25

Odds Ratio (Effect Size)

ψ = b / c
Measures magnitude of shift
ψ > 1: more shifted Yes→No
ψ < 1: more shifted No→Yes

Under the null hypothesis, the statistic approximately follows a chi-square distribution with 1 degree of freedom. The p-value answers this question: if there were genuinely no difference in the proportion of "Yes" responses before and after, how likely would we be to observe a discordant pair imbalance this large by chance alone? Penn State's STAT 415 course materials cover the derivation of the McNemar statistic from a paired proportion framework.
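The standard and continuity-corrected formulas can be sketched in a few lines of standard-library Python. This is an illustration, not the calculator's own code: the helper names are hypothetical, the p-value uses the identity that a chi-square upper tail with df = 1 equals erfc(√(x/2)), and clamping the corrected difference at zero when b = c is a design choice made here.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

def mcnemar_chi2(b, c, correction=False):
    """Return (statistic, p_value) computed from the discordant counts b and c."""
    if b + c == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    diff = abs(b - c)
    if correction:
        diff = max(diff - 1, 0)  # Edwards' correction, clamped at 0 (assumption)
    stat = diff ** 2 / (b + c)
    return stat, chi2_sf_df1(stat)

# Discordant counts taken from Case Study 2 (b = 8, c = 22).
stat, p = mcnemar_chi2(8, 22)
stat_c, p_c = mcnemar_chi2(8, 22, correction=True)
print(f"standard:  X2 = {stat:.2f}, p = {p:.3f}")
print(f"corrected: X2 = {stat_c:.2f}, p = {p_c:.3f}")
```

The corrected statistic is never larger than the standard one, so the correction can only make the test more conservative.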

How to Perform McNemar's Test — Step by Step

To perform McNemar's test: organize your paired data into a 2×2 table, identify the discordant pair counts b and c, check whether to use the chi-square approximation or the exact binomial test, compute the statistic, find the p-value, and compare it to your significance threshold.

1. Build the 2×2 paired contingency table

Assign each subject (or matched pair) to one of the four cells based on their outcomes at Time 1 (rows) and Time 2 (columns). Every subject must appear in exactly one cell. Count totals for cells a, b, c, and d.

2. Identify the discordant pairs (b and c)

Cell b = subjects who changed from Yes to No. Cell c = subjects who changed from No to Yes. Calculate b + c. Concordant pairs (a and d) do not affect the test statistic.

3. Choose the correct test version

If b + c ≥ 25, use the standard chi-square formula. If b + c < 25, use the exact binomial test. When b + c is borderline (roughly 20 to 25), Edwards' continuity correction is an alternative that keeps the chi-square approximation.

4. Compute the test statistic

Standard: X² = (b − c)² / (b + c). Corrected: X² = (|b − c| − 1)² / (b + c). Under H₀, both approximately follow a chi-square distribution with df = 1.

5. Determine the p-value

Look up your X² value in the chi-square distribution table with df = 1, or use the calculator above. For the exact test, compute the two-tailed probability from the binomial distribution with n = b + c and p = 0.5.

6. Make the statistical decision

If p-value < α (e.g., 0.05): reject H₀ and conclude that a statistically significant shift occurred in the binary outcome between the two conditions. If p-value ≥ α: fail to reject H₀; there is insufficient evidence of a change.
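The six steps can be strung together as a short sketch. The function names, the `exact_below=25` cutoff parameter, and the small-sample counts (b = 2, c = 10) are assumptions made for illustration. The cutoff mirrors the b + c < 25 rule given in this guide, and the two-tailed exact p-value is taken as twice the smaller binomial tail, capped at 1.

```python
import math

def exact_binomial_p(b, c):
    """Two-sided exact p-value with b ~ Binomial(b + c, 0.5) under H0."""
    n = b + c
    k = min(b, c)
    # Lower tail P(X <= k); the null distribution is symmetric, so double it.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def mcnemar_decision(b, c, alpha=0.05, exact_below=25):
    """Pick a test version from b + c, then compare the p-value with alpha."""
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs")
    if n < exact_below:
        method, p = "exact binomial", exact_binomial_p(b, c)
    else:
        stat = (b - c) ** 2 / n
        # Chi-square upper tail with df = 1.
        method, p = "chi-square", math.erfc(math.sqrt(stat / 2.0))
    return method, p, p < alpha

print(mcnemar_decision(2, 10))  # hypothetical small-sample counts
print(mcnemar_decision(40, 3))  # Case Study 1 counts
```

The returned tuple holds the method used, the p-value, and whether H₀ is rejected at the chosen α.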

Assumptions of McNemar's Test

McNemar's test requires four conditions to hold. When these assumptions are violated, the test results are unreliable.

1. Paired / dependent observations: Each row of data must represent a matched pair — either the same subject measured twice, or two subjects matched on relevant characteristics. Using McNemar's test on independent samples is incorrect; use the standard chi-square test instead.
2. Binary (dichotomous) outcome: The response variable must have exactly two categories (Yes/No, Pass/Fail, Present/Absent, Success/Failure). For categorical variables with three or more levels, use the Stuart-Maxwell test, which generalizes McNemar's test to larger tables.
3. Random sampling of matched pairs: The pairs should be drawn from the population using a probability-based sampling scheme. Convenience samples can produce biased estimates of the true proportion change.
4. Mutually exclusive outcomes: Each subject can be in only one cell of the 2×2 table. A subject cannot simultaneously be classified as both "Yes" and "No" at either measurement point.

📊 Worked Case Studies

Case Study 1 — Pharmaceutical Efficacy (Sleep Drug Trial)

Scenario: Researchers evaluate 100 patients for insomnia before and after a targeted sleep medication. Each patient is classified as symptomatic or asymptomatic at baseline and again at follow-up.
| | After: Symptomatic | After: Asymptomatic |
|---|---|---|
| Before: Symptomatic | a = 45 | b = 40 |
| Before: Asymptomatic | c = 3 | d = 12 |

Discordant pairs

b = 40 (symptomatic → asymptomatic), c = 3 (asymptomatic → symptomatic). Total b + c = 43 ≥ 25, so use the standard formula.

Chi-square statistic

X² = (40 − 3)² / (40 + 3) = 1369 / 43 = 31.84

P-value

With df = 1, X² = 31.84 gives p < 0.0001. The critical value at α = 0.05 is 3.841.

Odds ratio

ψ = b / c = 40 / 3 = 13.33. The odds of shifting from symptomatic to asymptomatic are over 13 times the odds of the reverse.

Decision: Reject H₀ (p < 0.0001). There is a statistically significant reduction in insomnia symptoms after treatment. This design mirrors standard before-after clinical trial analysis.

Case Study 2 — AI Diagnostic vs. Standard Lab Assay

Scenario: A hospital compares a new AI diagnostic classifier against an established blood biomarker assay for the same 200 patients. Both produce binary positive/negative results. Researchers want to know whether classification performance differs significantly between methods.
| | Bio-Assay: Positive | Bio-Assay: Negative |
|---|---|---|
| AI Classifier: Positive | a = 110 | b = 8 |
| AI Classifier: Negative | c = 22 | d = 60 |

Discordant pairs

b = 8, c = 22. Total b + c = 30 ≥ 25, so the standard formula applies. Continuity correction is also reasonable here.

Standard X²

X² = (8 − 22)² / (8 + 22) = 196 / 30 = 6.53

Corrected X²

X² = (|8 − 22| − 1)² / 30 = (13)² / 30 = 169 / 30 = 5.63

P-value

Standard: p ≈ 0.011. Corrected: p ≈ 0.018. Both are below α = 0.05.

Decision: Reject H₀. The two diagnostic methods classify patients differently at the population level. The bio-assay labels significantly more patients positive than the AI classifier. This application of McNemar's test in diagnostic medicine is described in Fagerland, Lydersen & Laake (2013) in BMC Medical Research Methodology.

Case Study 3 — Political Science: Impact of a Televised Debate

Scenario: 80 voters are surveyed on candidate support (Support vs. Oppose) before and after watching a televised policy debate. The researcher asks whether the debate shifted opinion significantly.
| | Post-Debate: Support | Post-Debate: Oppose |
|---|---|---|
| Pre-Debate: Support | a = 30 | b = 15 |
| Pre-Debate: Oppose | c = 20 | d = 15 |

Discordant pairs

b = 15, c = 20. Total b + c = 35 ≥ 25. Standard formula applies.

Chi-square statistic

X² = (15 − 20)² / (15 + 20) = 25 / 35 = 0.714

P-value

p ≈ 0.398. This is well above α = 0.05.

Decision: Fail to reject H₀ (p = 0.398). The debate did not produce a statistically significant shift in overall voter support. 15 voters changed to oppose and 20 changed to support, a difference that could easily arise by chance.

McNemar's Test vs. Related Statistical Tests

Choosing McNemar's test over a similar test depends on sample dependency, data type, and the number of outcome categories. The table below clarifies the decision for four commonly confused tests.

| Parameter | McNemar's Test | Chi-Square Test (2×2) | Paired T-Test | Fisher's Exact Test |
|---|---|---|---|---|
| Sample Dependency | Dependent / Paired | Independent | Dependent / Paired | Independent |
| Data Type | Categorical (Binary) | Categorical (Nominal) | Continuous (Interval/Ratio) | Categorical (Binary) |
| Core Target | Marginal homogeneity (proportion change) | Independence / Association | Mean difference | Independence (small N) |
| Table Structure | 2×2 paired | Any r×c independent | Difference scores | 2×2 independent |
| N Requirements | b + c ≥ 25 for χ² approximation | Expected counts ≥ 5 in each cell | Normality of differences (or large n) | No minimum; exact for any n |

Common error: Applying the standard chi-square test to paired before-and-after data is a methodological mistake. It treats the two measurements on each subject as independent observations, ignores the within-pair correlation, and tests association rather than change, so its p-values can be misleading. If your data come from the same subjects measured at two time points, McNemar's test is the appropriate choice.
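A small sketch can make the difference concrete. It computes the uncorrected Pearson chi-square and the McNemar statistic for the Case Study 3 counts, then again after replacing the concordant cells with hypothetical values (a = 300, d = 150). The function names are illustrative only.

```python
def pearson_chi2(a, b, c, d):
    """Uncorrected Pearson chi-square for a 2x2 table of independent samples."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def mcnemar_chi2(b, c):
    """McNemar chi-square: depends only on the discordant cells."""
    return (b - c) ** 2 / (b + c)

# Case Study 3 counts, then the same b and c with hypothetical concordant cells.
for a, b, c, d in [(30, 15, 20, 15), (300, 15, 20, 150)]:
    print(f"a={a}, d={d}: "
          f"Pearson = {pearson_chi2(a, b, c, d):.3f}, "
          f"McNemar = {mcnemar_chi2(b, c):.3f}")
```

The Pearson statistic changes sharply with the concordant cells because it measures how strongly the before and after outcomes are associated. The McNemar statistic stays the same because it measures only the imbalance between the two directions of change.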

How to Interpret McNemar's Test Results

Interpreting a McNemar's test result requires reading both the p-value and the direction of the discordant pairs. The p-value tells you whether a significant shift occurred; the ratio b/c (or c/b) tells you the direction and magnitude.

When p < α (reject H₀): Conclude that the proportion of "Yes" responses differs significantly between the two time points. The intervention produced a measurable, systematic change in the population. Report the odds ratio (ψ = b/c) as the effect size. If b > c, more subjects shifted from positive to negative; if c > b, the reverse is true.

When p ≥ α (fail to reject H₀): The data do not provide sufficient evidence to conclude that a systematic change occurred. Note that this is not proof of no effect; it could reflect a real but small effect that this sample size lacks the power to detect.

Reporting convention: State the chi-square statistic, degrees of freedom, and exact p-value: e.g., "McNemar's test indicated a significant change in symptom status (χ² = 31.84, df = 1, p < .001, OR = 13.33)."
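When reporting the odds ratio, an interval estimate is often useful alongside the point value. The sketch below is an addition that goes beyond what this guide's calculator is described as reporting: it assumes the usual large-sample (Wald) interval on the log scale, where the standard error of ln(b/c) is √(1/b + 1/c). The function name and the default z = 1.96 for a 95% interval are illustrative.

```python
import math

def paired_odds_ratio(b, c, z=1.96):
    """Return (odds_ratio, lower, upper) for psi = b / c with a Wald interval."""
    if b == 0 or c == 0:
        raise ValueError("b and c must both be positive for this interval")
    psi = b / c
    se_log = math.sqrt(1 / b + 1 / c)  # standard error of ln(b / c)
    lower = math.exp(math.log(psi) - z * se_log)
    upper = math.exp(math.log(psi) + z * se_log)
    return psi, lower, upper

psi, lower, upper = paired_odds_ratio(40, 3)  # Case Study 1 counts
print(f"OR = {psi:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

With a discordant count as small as c = 3 the interval is wide, which is worth stating when the odds ratio is quoted as an effect size.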

McNemar's Test in R and Python

R provides mcnemar.test in its base stats package, and Python provides a mcnemar function through the statsmodels library. The examples below reproduce Case Study 1 (the sleep drug trial).

R: Base Function

    # McNemar's test in R (Case Study 1)
    # byrow = TRUE fills the table row by row, so the cells read a, b / c, d
    data_matrix <- matrix(c(45, 40, 3, 12), nrow = 2, byrow = TRUE,
                          dimnames = list(
                            "Before" = c("Symptomatic", "Asymptomatic"),
                            "After"  = c("Symptomatic", "Asymptomatic")
                          ))

    # With continuity correction (the default)
    mcnemar.test(data_matrix, correct = TRUE)

    # Without continuity correction
    mcnemar.test(data_matrix, correct = FALSE)
    # Uncorrected output: McNemar's chi-squared = 31.837, df = 1, p-value of about 1.7e-08

Python: statsmodels

    # Requires statsmodels (SciPy does not provide a McNemar function)
    from statsmodels.stats.contingency_tables import mcnemar

    # 2x2 contingency table [[a, b], [c, d]] from Case Study 1
    table = [[45, 40], [3, 12]]

    # Exact binomial test (recommended for small b + c)
    result_exact = mcnemar(table, exact=True)
    print(f"Exact p-value: {result_exact.pvalue:.6f}")

    # Chi-square approximation with continuity correction
    result_approx = mcnemar(table, exact=False, correction=True)
    print(f"Corrected X^2: {result_approx.statistic:.4f}")
    print(f"Asymptotic p:  {result_approx.pvalue:.6f}")

McNemar's Test: Complete Formula and Entity Reference

The table below covers the key formulas, symbols, and concepts associated with McNemar's test, structured for quick reference.

| Term / Symbol | Formula / Value | Definition | Context |
|---|---|---|---|
| McNemar's Test | X² = (b − c)² / (b + c) | Nonparametric test for marginal homogeneity in paired binary data. | Before-after studies, matched pairs |
| Discordant pairs | b + c | Count of subjects whose outcome changed between measurement points. The only cells that drive the statistic. | Core of the McNemar calculation |
| Concordant pairs | a + d | Count of subjects whose outcome did not change. These cells do not affect the test statistic. | 2×2 table structure |
| H₀ (null hypothesis) | P(b) = P(c) | Marginal homogeneity: the probability of a positive outcome is the same at both time points. | Hypothesis testing framework |
| Edwards' Correction | (\|b − c\| − 1)² / (b + c) | Continuity correction that subtracts 1 from the absolute discordant difference to reduce type I error inflation with small samples. | When b + c is between 20 and 25 |
| Exact McNemar | Binomial(b + c, 0.5) | Exact test based on the binomial distribution; the definitive method when b + c < 25. | Small sample sizes |
| Odds Ratio (ψ) | ψ = b / c | Effect size measure. Values above 1 indicate more subjects shifted from positive to negative than the reverse. | Effect size reporting |
| Degrees of Freedom | df = 1 | McNemar's test always uses 1 degree of freedom, regardless of sample size. | Chi-square distribution lookup |
| Critical X² (α = 0.05) | 3.841 | The threshold value for df = 1 at a 5% significance level. Reject H₀ if X² > 3.841. | Decision rule |
| Marginal homogeneity | P(row margin) = P(col margin) | The property tested by McNemar's test: whether the proportion of positive outcomes is the same before and after. | Null hypothesis definition |

Sources and Further Reading

Authority sources cited in this guide:

  • McNemar, Q. (1947). "Note on the sampling error of the difference between correlated proportions or percentages." Psychometrika, 12(2), 153–157. link.springer.com
  • Fagerland, M.W., Lydersen, S., & Laake, P. (2013). "The McNemar test for binary matched-pairs data: mid-p and asymptotic are better than exact conditional." BMC Medical Research Methodology, 13, 91. ncbi.nlm.nih.gov
  • Campbell, M.J. & Swinscow, T.D.V. Statistics at Square One, 11th ed. BMJ Publishing Group. bmj.com
  • Penn State STAT 415. Introduction to Mathematical Statistics. online.stat.psu.edu
  • Agresti, A. (2013). Categorical Data Analysis, 3rd ed. Wiley. — Chapter 10: Analyzing Dependent Proportions.
  • statsmodels Documentation. statsmodels.stats.contingency_tables.mcnemar
  • R Documentation. mcnemar.test {stats}

Frequently Asked Questions

What is McNemar's test used for?

McNemar's test is used to determine whether there is a statistically significant change in a binary (yes/no) outcome measured at two time points or under two conditions on the same subjects or matched pairs. Common applications include clinical before-and-after studies, diagnostic test comparisons, electoral opinion tracking after debates, and screening program evaluations. It is the standard nonparametric test for paired binary data, introduced by Quinn McNemar in 1947.

What is the formula for McNemar's test?

The standard McNemar chi-square formula is X² = (b − c)² / (b + c), where b is the count of subjects who changed from Yes to No and c is the count who changed from No to Yes. Under the null hypothesis, this statistic approximately follows a chi-square distribution with 1 degree of freedom. For small samples (b + c < 25), the exact binomial test is used instead. The continuity-corrected version is X² = (|b − c| − 1)² / (b + c).

What are discordant pairs in McNemar's test?

Discordant pairs are subjects whose binary outcome changed between the two measurement points. In the 2×2 contingency table, they occupy cells b (changed from Yes to No) and c (changed from No to Yes). Only these cells contribute to the McNemar test statistic. Concordant pairs, in cells a (Yes to Yes) and d (No to No), carry no information about directional change and do not affect the calculation.

What is McNemar's exact test and when should it be used?

McNemar's exact test uses the binomial distribution rather than the chi-square approximation to compute the p-value. It is the correct method when the total number of discordant pairs (b + c) is less than 25, because the chi-square approximation becomes unreliable with sparse discordant pair counts. The exact test models b as a binomial random variable with n = b + c trials and a null probability p = 0.5, then computes a two-tailed probability from the cumulative binomial distribution.

What is the null hypothesis in McNemar's test?

The null hypothesis (H₀) in McNemar's test is marginal homogeneity: the probability of a "Yes" outcome is identical at both time points. Equivalently, P(b) = P(c), meaning the expected number of subjects shifting from Yes to No equals the expected number shifting from No to Yes. Rejecting H₀ means the proportion of positive responses changed significantly between measurements, which is evidence that the intervention or passage of time produced a real shift in the population.
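The link between marginal homogeneity and the discordant cells is simple arithmetic: the "Yes" proportion before is (a + b)/N, the proportion after is (a + c)/N, and their difference is (b − c)/N, so the concordant count a cancels. A minimal sketch using the Case Study 3 counts, with a hypothetical function name:

```python
def marginal_change(a, b, c, d):
    """Return the 'Yes' proportion before, after, and their difference."""
    n = a + b + c + d
    p_before = (a + b) / n  # row margin: Yes at Time 1
    p_after = (a + c) / n   # column margin: Yes at Time 2
    return p_before, p_after, p_before - p_after

p_before, p_after, diff = marginal_change(30, 15, 20, 15)  # Case Study 3
print(f"before = {p_before:.4f}, after = {p_after:.4f}, difference = {diff:.4f}")
print(f"(b - c) / N = {(15 - 20) / 80:.4f}")
```

Both printed differences agree, which is why testing whether b and c are balanced is the same as testing whether the two margins are equal.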

What is the difference between McNemar's test and the chi-square test?

McNemar's test is for dependent (paired) samples where the same subjects are measured twice or subjects are matched in pairs. The standard chi-square test of independence is for independent samples from separate groups. Applying the chi-square test to paired data is a methodological error: it ignores the dependency structure, tests association rather than change, and can produce misleading significance conclusions. If the same individuals appear in both rows and columns of your table, McNemar's test is the correct choice.

When should Edwards' continuity correction be used?

Edwards' continuity correction, which subtracts 1 from the absolute difference |b − c| before squaring, is typically applied when the total discordant pair count (b + c) is borderline, roughly 20 to 25, and the chi-square approximation is still being used. For larger samples, the chi-square approximation is reliable without correction. For b + c below 25, the exact binomial test is the preferred approach. Many statisticians and textbooks recommend always using the exact test when possible, as it provides exact probability statements rather than approximations.

Can McNemar's test be used with more than two outcome categories?

No. The standard McNemar's test is strictly limited to 2×2 binary outcomes. When the categorical variable has three or more levels (for example, Improved / Same / Worsened), the appropriate generalization is the Stuart-Maxwell test, which tests for marginal homogeneity in larger square contingency tables. The Bowker test of symmetry is another alternative for multi-level paired categorical data.
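For illustration, here is a minimal sketch of the Bowker symmetry statistic under its usual definition: the sum, over cell pairs above the diagonal, of (n_ij − n_ji)² / (n_ij + n_ji), with k(k − 1)/2 degrees of freedom for a k×k table. The function name and the 3×3 counts are hypothetical, and skipping cell pairs that are both zero is an assumption made here to avoid division by zero.

```python
def bowker_statistic(table):
    """Return (statistic, df) of the Bowker symmetry test for a square table."""
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    stat = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            if total > 0:  # skip empty off-diagonal pairs (assumption)
                stat += (table[i][j] - table[j][i]) ** 2 / total
    df = k * (k - 1) // 2
    return stat, df

# Hypothetical 3x3 paired table (rows = before, columns = after).
three_level = [[20, 5, 3], [10, 30, 4], [2, 8, 25]]
print(bowker_statistic(three_level))

# On a 2x2 table the statistic reduces to McNemar's (b - c)^2 / (b + c).
print(bowker_statistic([[45, 40], [3, 12]]))
```

The second call uses the Case Study 1 counts and gives the same value as the standard McNemar formula with df = 1.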