What is the Bonferroni correction formula?

The Bonferroni correction divides the family-wise significance level by the number of comparisons: α_adj = α_family / m. For example, with α = 0.05 and m = 10 comparisons, each individual test uses α_adj = 0.005. The adjusted two-tailed z critical value is then InvNorm(1 − 0.005/2) = InvNorm(0.9975) = 2.807.

How do you use the Bonferroni critical value table?

Step 1: Decide on the family-wise significance level (α, typically 0.05).
Step 2: Count the total number of planned comparisons (m).
Step 3: Locate the row for m in the table.
Step 4: Read the adjusted alpha and the two-tailed z critical value.
Step 5: For each comparison, reject H₀ only if |z| ≥ z_crit (or, equivalently, p_raw ≤ α_adj).

Why is the Bonferroni correction considered conservative?

The Bonferroni correction relies on Boole's inequality, which bounds FWER ≤ m × α_adj regardless of the dependence among tests. The bound is rarely tight. Even for independent tests the actual FWER is slightly below α (with m = 10, it is 1 − (1 − 0.005)^10 ≈ 0.049), and when tests are positively correlated it can be far below. The correction therefore over-penalises individual tests and reduces statistical power, the ability to detect true effects, more than necessary. Sequential alternatives such as the Holm-Bonferroni method keep the same FWER guarantee with at least as much power by processing tests from the smallest to the largest p-value.
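A small Monte Carlo sketch can make this conservativeness concrete. It assumes every null hypothesis is true and draws the m test statistics from an equicorrelated multivariate normal distribution with pairwise correlation rho, an illustrative assumption rather than a model of any particular study. The function name `simulate_bonferroni_fwer` is hypothetical, and the code needs NumPy and SciPy.

```python
import numpy as np
from scipy.stats import norm


def simulate_bonferroni_fwer(m=10, rho=0.0, alpha=0.05, n_sims=20_000, seed=0):
    """Estimate the FWER of a two-tailed Bonferroni correction when all H0 are true.

    Assumption (for illustration only): the m z statistics are standard normals
    with a common pairwise correlation rho, where 0 <= rho <= 1.
    """
    rng = np.random.default_rng(seed)
    z_crit = norm.ppf(1 - (alpha / m) / 2)  # Bonferroni two-tailed cutoff
    # Z_i = sqrt(rho) * S + sqrt(1 - rho) * E_i gives corr(Z_i, Z_j) = rho
    shared = rng.standard_normal((n_sims, 1))
    own = rng.standard_normal((n_sims, m))
    z = np.sqrt(rho) * shared + np.sqrt(1 - rho) * own
    # A family-wise error occurs when any |Z_i| reaches the cutoff
    return float((np.abs(z) >= z_crit).any(axis=1).mean())


for rho in (0.0, 0.5, 0.9):
    print(f"rho = {rho}: estimated FWER = {simulate_bonferroni_fwer(rho=rho):.4f}")
```

With rho = 0 the estimate should land near the theoretical 1 − (1 − 0.005)^10 ≈ 0.049; as rho grows, the estimated FWER falls further below the nominal 0.05 even though the per-test threshold has not changed.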

What is family-wise error rate (FWER)?

The family-wise error rate (FWER) is the probability of making at least one Type I error (false positive) among a set of m hypothesis tests. When m independent tests are each run at α = 0.05, FWER = 1 − (1 − 0.05)^m. With m = 10, FWER ≈ 0.40. The Bonferroni correction holds FWER ≤ α by requiring each test to pass a stricter threshold α_adj = α/m.

When should I use Bonferroni correction instead of FDR?

Use Bonferroni (or Holm) when strict control of the probability of any false positive is required — such as in confirmatory clinical trials with pre-specified primary endpoints, or small-scale experiments with m < 10 comparisons. Use FDR (Benjamini-Hochberg) for exploratory, high-throughput analyses like genomic screening or neuroimaging, where some false positives are acceptable and preserving discovery power matters more than guaranteeing zero false positives.

How do you adjust a p-value for Bonferroni?

To Bonferroni-adjust a raw p-value, multiply it by m: p_adj = p_raw × m. Cap the result at 1. Then compare p_adj against the original α. This is equivalent to comparing p_raw against α_adj = α/m. For example, with m = 6 and p_raw = 0.012: p_adj = 0.012 × 6 = 0.072. Since 0.072 > 0.05, the result is not significant after correction.

Can I use Bonferroni correction after ANOVA?

Yes. After a significant omnibus ANOVA result, post-hoc pairwise t-tests are conducted. With k groups, there are m = k(k−1)/2 pairwise comparisons. Apply Bonferroni by setting α_adj = 0.05/m and comparing each pairwise p-value against this threshold. For k = 4 groups, m = 6, so α_adj = 0.0083. Note that Tukey's HSD is specifically optimised for all-pairwise ANOVA comparisons and typically has more power than Bonferroni in this context.

What is the difference between Bonferroni adjusted alpha and Bonferroni adjusted p-value?

Bonferroni adjusted alpha: α_adj = α/m (you divide the threshold, then compare raw p-values against it). Bonferroni adjusted p-value: p_adj = p_raw × m (you multiply each p-value, then compare against original α). Both approaches lead to exactly the same decision but from opposite directions. R's p.adjust(..., method='bonferroni') produces adjusted p-values; comparing them against 0.05 gives the same result as comparing raw p-values against α/m.

Does sample size affect which m value to use in Bonferroni correction?

No. m is the number of planned hypothesis tests, not the sample size. A study with n = 500 participants running 10 tests still uses m = 10. Sample size affects the power of each individual test (through the standard error), not the Bonferroni adjustment factor. For a pooled two-sample t-test, first set α_adj = α/m, then look up the t critical value at α_adj with df = n₁ + n₂ − 2.

Bonferroni-Adjusted Critical Value Table & Alpha Correction Guide


What Is a Bonferroni-Adjusted Critical Value?

When you run m hypothesis tests using the same dataset or within the same study, the probability of getting at least one false positive grows with each additional test — even when all null hypotheses are true. The Bonferroni correction addresses this by lowering the significance threshold for each individual test.

The adjusted alpha is simply α_adj = α_family / m. The adjusted critical value is the two-tailed z score corresponding to this stricter threshold. A test statistic must exceed this higher bar to be counted as significant under the corrected framework.

Key distinction: α_adj is what you compare each test's p-value against. The Bonferroni z_crit is what you compare each test statistic against. Both lead to identical decisions. The choice between them is a matter of what your software reports.

Bonferroni-Adjusted Critical Value Table

Each row gives α_adj = α_family/m and the two-tailed z critical value for that adjusted threshold at the chosen family-wise α.

Two-tailed z_crit = InvNorm(1 − α_adj/2). Reject H₀ for comparison i if |z_i| ≥ z_crit, or equivalently if p_i,raw ≤ α_adj. Values verified against the standard normal distribution. m = total number of simultaneous hypothesis tests in the family.

How to Use the Bonferroni Critical Value Table

The procedure is direct. Each step has a single statistical decision attached to it.

Step 1 — Set Family-Wise α Before Data Collection

Decide on α_family as part of your study design, not after seeing the results. The standard in most fields is α = 0.05. Clinical trials often use α = 0.01 for secondary endpoints. Genome-wide association studies conventionally apply a per-test threshold of 0.05 / 1,000,000 = 5 × 10⁻⁸, treating roughly one million independent common variants as the family.

Step 2 — Count Every Planned Comparison (m)

List each null hypothesis you will test. Count them. If you have k groups in a one-way ANOVA and want all pairwise tests, m = k(k−1)/2:

k=3 groups → m = 3(3−1)/2 = 3 pairs
k=4 groups → m = 4(4−1)/2 = 6 pairs
k=5 groups → m = 5(5−1)/2 = 10 pairs
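The pair count can be checked by enumerating the comparisons directly. This is a minimal sketch using only the Python standard library; `pairwise_family` is a hypothetical helper, and the group labels are borrowed from the sleep-deprivation example on this page purely for illustration.

```python
from itertools import combinations


def pairwise_family(groups, alpha_family=0.05):
    """Return all pairwise comparisons, their count m, and the Bonferroni alpha.

    Assumes at least two groups, so that m >= 1.
    """
    pairs = list(combinations(groups, 2))  # k(k-1)/2 unordered pairs
    m = len(pairs)
    return pairs, m, alpha_family / m


pairs, m, alpha_adj = pairwise_family(["0 h", "12 h", "24 h", "48 h"])
print(m, round(alpha_adj, 5))  # 6 0.00833
for a, b in pairs:
    print(f"{a} vs {b}")
```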

Step 3 — Read α_adj and z_crit from the Table

Find the row for your m in the table above. The second column gives α_adj = α/m; the third gives the two-tailed z critical value for that adjusted threshold. If your exact m is not listed, use the next larger m (conservative) or compute z_crit = InvNorm(1 − α_adj/2) directly for your exact m.

Step 4 — Apply the Decision Rule

If |z_obs| ≥ z_crit → Reject H₀ → Significant after Bonferroni correction
If |z_obs| < z_crit → Fail to reject H₀ → Not significant after correction

Equivalently: reject H₀ if p_raw ≤ α_adj. The two rules are identical because p_raw = P(|Z| ≥ |z_obs|) under H₀.
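The equivalence of the two rules can be checked numerically. The sketch below assumes SciPy and uses a hypothetical helper, `bonferroni_decisions`; the z values 2.79 and 2.31 come from the reporting examples on this page, and −3.10 is made up for illustration.

```python
from scipy.stats import norm


def bonferroni_decisions(z_obs, m, alpha_family=0.05):
    """Apply both forms of the two-tailed Bonferroni rule to observed z statistics."""
    alpha_adj = alpha_family / m
    z_crit = norm.ppf(1 - alpha_adj / 2)
    results = []
    for z in z_obs:
        p_raw = 2 * norm.sf(abs(z))  # two-tailed p = P(|Z| >= |z_obs|) under H0
        results.append((z, p_raw, abs(z) >= z_crit, p_raw <= alpha_adj))
    return z_crit, results


z_crit, rows = bonferroni_decisions([2.79, 2.31, -3.10], m=6)
print(f"z_crit = {z_crit:.3f}")
for z, p, by_z, by_p in rows:
    print(f"z = {z:+.2f}  p = {p:.4f}  reject by z: {by_z}  reject by p: {by_p}")
```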

Step 5 — Report Results with Correction Stated

Always state m, α_family, and α_adj in your methods section. APA format example: "To control the family-wise error rate across m = 6 pairwise comparisons, a Bonferroni correction was applied (α_adj = 0.05/6 ≈ 0.0083). Comparison 2 reached the corrected threshold: z = 2.79, p = 0.005 < 0.0083."

The Multiple Comparisons Problem & Bonferroni Formula

Running m independent tests at the same α level means the chance of at least one false positive climbs sharply. The Bonferroni correction counteracts this inflation by tightening the per-test threshold.

Type I Error Inflation Without Correction

For m independent tests each at nominal α = 0.05, the family-wise error rate is:

FWER = 1 − (1 − α)^m

FWER at α = 0.05 by number of tests:

m	FWER
1	5.0%
3	14.3%
5	22.6%
10	40.1%
20	64.2%
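These percentages follow directly from the formula. This plain-Python sketch (the function name `fwer_independent` is hypothetical) reproduces them and, for contrast, shows the FWER when each test is instead run at the Bonferroni level α/m, still assuming independent tests with every null hypothesis true.

```python
def fwer_independent(alpha_per_test, m):
    """FWER for m independent tests, each run at alpha_per_test, all H0 true."""
    return 1 - (1 - alpha_per_test) ** m


alpha = 0.05
for m in (1, 3, 5, 10, 20):
    uncorrected = fwer_independent(alpha, m)
    bonferroni = fwer_independent(alpha / m, m)
    print(f"m={m:>2}  uncorrected={uncorrected:.1%}  Bonferroni={bonferroni:.2%}")
```

The Bonferroni column stays at or just under 5% for every m, which is exactly the guarantee the correction is designed to provide.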

The Bonferroni Adjustment Formula

By Boole's inequality, FWER ≤ m × α_adj for any test configuration (independent or dependent). Setting m × α_adj = α_family gives:

α_adj = α_family / m

The corrected two-tailed z critical value is then: z_crit = InvNorm(1 − α_adj/2). For one-tailed tests: z_crit = InvNorm(1 − α_adj).

Bonferroni-Adjusted p-Values

Some software (including R's p.adjust()) reports Bonferroni-adjusted p-values rather than adjusted alphas. The two approaches are equivalent:

p_adj = min(p_raw × m, 1.0)
Compare p_adj against α_family (e.g. 0.05)

Dividing α by m (threshold approach) and multiplying p by m (adjusted p-value approach) always produce the same significance decision.
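A minimal pure-Python sketch of the adjusted p-value formula, checking that both decision rules agree. It reuses the raw p-values from the software examples on this page; `bonferroni_adjust` is a hypothetical helper rather than a library function (in practice, R's `p.adjust()` or statsmodels' `multipletests()` do this job).

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply each by m and cap at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


raw_p = [0.004, 0.012, 0.031, 0.045, 0.121, 0.550]
alpha = 0.05
m = len(raw_p)
adj_p = bonferroni_adjust(raw_p)

# Threshold approach (raw p vs alpha/m) and adjusted-p approach (p_adj vs alpha)
by_threshold = [p <= alpha / m for p in raw_p]
by_adjusted = [q <= alpha for q in adj_p]
print(adj_p)
print(by_threshold == by_adjusted)
```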

Multi-Alpha Reference Matrix

Adjusted alpha values and two-tailed z critical values across all three standard family-wise significance levels for the most commonly used comparison counts.

m	α_adj (α=0.10)	z_crit (α=0.10)	α_adj (α=0.05)	z_crit (α=0.05)	α_adj (α=0.01)	z_crit (α=0.01)

Two-tailed z_crit. α_adj shown to six decimal places. z_crit = InvNorm(1 − α_adj/2).
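The matrix entries can be regenerated for any set of comparison counts. This sketch assumes SciPy; the list of m values is an illustrative choice and not necessarily the rows of the original table, and `reference_matrix` is a hypothetical helper.

```python
from scipy.stats import norm


def reference_matrix(m_values, alphas=(0.10, 0.05, 0.01)):
    """Rows of [m, alpha_adj, z_crit, alpha_adj, z_crit, ...], one pair per family alpha."""
    rows = []
    for m in m_values:
        row = [m]
        for alpha in alphas:
            alpha_adj = alpha / m
            row += [alpha_adj, norm.ppf(1 - alpha_adj / 2)]  # two-tailed z_crit
        rows.append(row)
    return rows


# Illustrative m values only
for row in reference_matrix([1, 2, 3, 4, 5, 6, 8, 10, 20, 50, 100]):
    m, *cells = row
    print(f"{m:>4}  " + "  ".join(
        f"{a:.6f}  {z:.3f}" for a, z in zip(cells[::2], cells[1::2])))
```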

Worked Examples Across Seven Research Contexts

Each example below shows how to identify m, compute α_adj, and find the corrected critical value for a specific research scenario.

Example 1 — Multi-Endpoint Clinical Trial (Oncology)

Setup: A trial compares a new drug with control on five endpoints: overall survival, progression-free survival, objective response rate, toxicity grade, and quality-of-life score. α_family = 0.05.

m = 5 → α_adj = 0.05/5 = 0.0100 → z_crit = 2.576

Reporting: "A Bonferroni correction was applied across m = 5 primary endpoints (α_adj = 0.010). Endpoint 1 (OS): z = 2.74, p = 0.006 < 0.010 — significant. Endpoint 3 (ORR): z = 2.31, p = 0.021 > 0.010 — not significant after correction."

Example 2 — Post-Hoc ANOVA (Psychology, 4 Groups)

Setup: A study compares cognitive performance across four sleep deprivation conditions (0 h, 12 h, 24 h, 48 h awake). A significant one-way ANOVA triggers pairwise t-tests.

m = k(k−1)/2 = 4(3)/2 = 6 → α_adj = 0.05/6 ≈ 0.00833 → z_crit = 2.638

Decision rule: With large samples, each pairwise test rejects H₀ only if |z| ≥ 2.638. With small samples, use the t-distribution with the appropriate df instead. With n = 15 per group (df = 15 + 15 − 2 = 28 per comparison), the two-tailed Bonferroni t_crit at α_adj = 0.00833 is approximately 2.84, noticeably larger than the z value.
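A sketch of the small-sample calculation, assuming SciPy's `scipy.stats.t`; `bonferroni_tcrit` is a hypothetical helper. For this example it gives roughly 2.84, compared with the large-sample z value of about 2.638.

```python
from scipy.stats import norm, t


def bonferroni_tcrit(alpha_family, m, df):
    """Two-tailed Bonferroni critical value from the t-distribution."""
    alpha_adj = alpha_family / m
    return t.ppf(1 - alpha_adj / 2, df)


# Example 2: k = 4 groups -> m = 6 pairwise tests; n = 15 per group -> df = 28
t_crit = bonferroni_tcrit(0.05, 6, 28)
z_crit = norm.ppf(1 - (0.05 / 6) / 2)
print(f"t_crit = {t_crit:.3f}, z_crit = {z_crit:.3f}")
```

As df grows, `bonferroni_tcrit` converges to the z value, which is why the z table is only a large-sample approximation.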

Example 3 — Genomic SNP Association (Bioinformatics)

Setup: A candidate-gene study tests 100 SNPs for association with rheumatoid arthritis. α_family = 0.05.

m = 100 → α_adj = 0.05/100 = 0.000500 → z_crit = 3.481

Note: Genome-wide association studies (GWAS) often test millions of SNPs. For m = 1,000,000 the Bonferroni threshold is approximately z_crit = 5.45, equivalent to p < 5 × 10⁻⁸. This is why GWAS discoveries require very large samples and replication cohorts.

Example 4 — Psychometric Assessment (3 Scales × 2 Timepoints)

Setup: A treatment study evaluates three psychometric outcomes (anxiety, depression, QoL) at two timepoints (post-treatment and 6-month follow-up), giving m = 3 × 2 = 6 planned tests.

m = 6 → α_adj = 0.05/6 ≈ 0.00833 → z_crit = 2.638

Example 5 — Industrial Quality Control (8 Machine Outputs)

Setup: A plant monitors physical tolerances across 8 concurrent drill-press outputs against a target specification. α_family = 0.01 (strict manufacturing standard).

m = 8 → α_adj = 0.01/8 = 0.00125 → z_crit = 3.227

Example 6 — A/B/n Testing (4 Checkout Variants)

Setup: An e-commerce platform compares 4 new checkout designs against the current control. α_family = 0.05.

m = 4 → α_adj = 0.05/4 = 0.01250 → z_crit = 2.498

Practical note: In A/B testing, traffic is typically split evenly. With n = 10,000 users per variant and a baseline conversion rate of 5%, the stricter threshold raises the minimum detectable effect: at 80% power, a normal approximation gives roughly a 21% relative lift instead of about 17% without correction.
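The minimum detectable effect scales with (z_crit + z_power), so the cost of the correction can be estimated directly. This sketch assumes SciPy, 80% power, and a simple normal approximation that uses the baseline variance in both arms; `relative_mde` is a hypothetical helper, and a dedicated power calculation will give slightly different numbers.

```python
from math import sqrt

from scipy.stats import norm


def relative_mde(p_base, n_per_arm, alpha_adj, power=0.80):
    """Approximate minimum detectable relative lift for a two-proportion z-test.

    Simplifying assumption: both arms share the baseline variance p(1 - p).
    """
    z_alpha = norm.ppf(1 - alpha_adj / 2)  # two-tailed critical value
    z_power = norm.ppf(power)
    se = sqrt(2 * p_base * (1 - p_base) / n_per_arm)
    return (z_alpha + z_power) * se / p_base


uncorrected = relative_mde(0.05, 10_000, 0.05)
corrected = relative_mde(0.05, 10_000, 0.05 / 4)  # Bonferroni, m = 4
print(f"uncorrected: {uncorrected:.1%}, Bonferroni (m=4): {corrected:.1%}")
```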

Example 7 — Neuroimaging Regions of Interest (20 Structures)

Setup: A structural MRI study tests grey matter volume differences across 20 cortical regions between patients and controls. α_family = 0.05.

m = 20 → α_adj = 0.05/20 = 0.00250 → z_crit = 3.023

Alternative: For whole-brain voxel-wise analyses with thousands of voxels, FDR (Benjamini-Hochberg) is typically preferred over Bonferroni because it has substantially higher power when many true effects are present across the brain.

Bonferroni vs. Alternative Multiple Testing Corrections

Selecting the right correction depends on your error control objective, the number of tests, and how correlated those tests are. The table below maps each method to its primary use case.

Procedure	Error Control	Power	Best For	Key Limitation
Bonferroni	FWER	Low — most conservative	Small m (<10), confirmatory endpoints, independent tests	Over-conservative with correlated tests or large m
Holm (Step-Down)	FWER	Moderate — always ≥ Bonferroni	Same FWER guarantee as Bonferroni, any m	Slightly more complex; requires sorted p-values
Šidák	FWER	Marginally higher than Bonferroni	Independent tests or orthogonal contrasts	Exact under independence; also valid for two-sided tests on jointly normal statistics, but not guaranteed under arbitrary dependence
Tukey HSD	FWER	High for all-pairwise ANOVA	All pairwise comparisons of group means after ANOVA	Designed only for pairwise comparisons of means; unequal group sizes need the Tukey-Kramer variant
Benjamini-Hochberg (FDR)	FDR (expected false discovery proportion)	High — especially for large m	Genomics, neuroimaging, large-scale screening	Allows some false positives; requires replication
Benjamini-Yekutieli (FDR)	FDR — valid under any dependence	Lower than BH	Correlated tests where BH assumptions may fail	More conservative than standard BH; rarely needed
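The small power difference between Bonferroni and Šidák comes from the per-test level: Šidák solves 1 − (1 − α_adj)^m = α exactly, which yields a slightly larger α_adj than α/m. A short sketch comparing the two, assuming SciPy; `per_test_alphas` is a hypothetical helper.

```python
from scipy.stats import norm


def per_test_alphas(alpha_family, m):
    """Per-test levels: Bonferroni alpha/m and Sidak 1 - (1 - alpha)^(1/m)."""
    bonferroni = alpha_family / m
    sidak = 1 - (1 - alpha_family) ** (1 / m)  # FWER exactly alpha if tests are independent
    return bonferroni, sidak


for m in (2, 5, 10, 20):
    b, s = per_test_alphas(0.05, m)
    print(f"m={m:>2}  Bonferroni {b:.6f} (z={norm.ppf(1 - b / 2):.3f})  "
          f"Sidak {s:.6f} (z={norm.ppf(1 - s / 2):.3f})")
```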

When Holm Is Preferable to Bonferroni

Holm's step-down method tests hypotheses in order from the smallest to the largest p-value. The smallest p-value is compared with α/m (the Bonferroni threshold), the next with α/(m − 1), then α/(m − 2), and so on; testing stops at the first p-value that fails its threshold, and all remaining hypotheses are retained. The result is the same FWER guarantee as Bonferroni with uniformly at least as much power: Holm rejects every hypothesis that Bonferroni rejects and can reject additional ones. For this reason, Holm is the standard recommendation in most statistical guidelines when FWER control is needed. See the hypothesis testing guide for a full comparison.
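The step-down logic is short enough to write out. This is a minimal pure-Python sketch with hypothetical helper names and made-up p-values; for real analyses, `p.adjust(method = "holm")` in R or `multipletests(..., method='holm')` in statsmodels implement the same procedure.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: reject flags returned in the original order of p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank)
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # first failure: this and all larger p-values are retained
    return reject


p = [0.020, 0.010, 0.300, 0.015]  # hypothetical p-values
print(bonferroni_reject(p))  # [False, True, False, False]
print(holm_reject(p))        # [True, True, False, True]
```

Bonferroni rejects only 0.010 (threshold 0.05/4 = 0.0125), while Holm continues to 0.015 (≤ 0.05/3) and 0.020 (≤ 0.05/2) before stopping at 0.300.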

Common Mistakes in Bonferroni Correction

⚠️

Pooling Unrelated Tests

Combining tests from separate studies or unrelated research questions into one family inflates m needlessly, destroying power for all tests. Only tests addressing the same scientific question should share a correction family.

⚠️

Multiplying α Instead of Dividing

α_adj = α/m (divide α to get threshold). The Bonferroni adjusted p-value = p × m (multiply p). Confusing the two leads to wrong conclusions. Dividing the p-value or multiplying α are both errors.

⚠️

Post-Hoc Selection of m

Setting m after seeing which tests were significant — to shrink the correction — is a form of p-hacking. The number of comparisons must be determined by the study design and pre-registered, not by results.

⚠️

Applying Bonferroni to Subgroups Separately

If a study tests the same hypothesis in 5 subgroups plus the full sample, all 6 tests belong to the same family (m = 6). Applying Bonferroni within each subgroup separately (m = 1 per subgroup) violates the family-wise logic.

Software Implementation

Both R and Python compute Bonferroni-adjusted p-values and critical values directly. Paste the snippets below into your analysis script.

R — Using `p.adjust()`

# Vector of raw p-values from multiple tests
raw_p <- c(0.004, 0.012, 0.031, 0.045, 0.121, 0.550)

# Bonferroni-adjusted p-values (multiply by m, capped at 1)
adj_p <- p.adjust(raw_p, method = "bonferroni")
print(adj_p)
# [1] 0.024 0.072 0.186 0.270 0.726 1.000

# Alternative methods available: "holm", "hochberg", "BH", "BY"
holm_p <- p.adjust(raw_p, method = "holm")

Python — Using `statsmodels`

from statsmodels.stats.multitest import multipletests

raw_p = [0.004, 0.012, 0.031, 0.045, 0.121, 0.550]

# Bonferroni correction
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method='bonferroni')
print(f"Adjusted p-values: {adj_p}")
print(f"Rejected H0: {reject}")

# Holm–Bonferroni (more power, same FWER guarantee)
reject_holm, adj_holm, _, _ = multipletests(raw_p, alpha=0.05, method='holm')

Computing z_crit for Any m and α in Python

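from scipy.stats import norm

def bonferroni_zcrit(alpha_family, m, tails=2):
    """Bonferroni-adjusted z critical value for m tests at family-wise alpha."""
    alpha_adj = alpha_family / m
    if tails == 2:
        # Two-tailed: alpha_adj is split between both tails
        return norm.ppf(1 - alpha_adj / 2)
    # One-tailed: all of alpha_adj sits in one tail
    return norm.ppf(1 - alpha_adj)

print(round(bonferroni_zcrit(0.05, 10), 3))  # → 2.807
print(round(bonferroni_zcrit(0.05, 20), 3))  # → 3.023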

Frequently Asked Questions

What is a Bonferroni-adjusted critical value?

A Bonferroni-adjusted critical value is the modified z (or t) threshold applied to each of m simultaneous hypothesis tests to hold the family-wise error rate at α. It equals InvNorm(1 − α_adj/2) for two-tailed tests, where α_adj = α/m. Every comparison must clear this higher bar to be counted as significant.

Why is Bonferroni called conservative?

The correction relies on Boole's inequality, which bounds FWER ≤ Σα_adj for any correlation structure among tests. When tests are positively correlated, which is common in practice, the actual FWER falls well below this bound, so the correction demands a stricter threshold than necessary. Holm's method improves power while keeping the same unconditional FWER guarantee; Šidák gains a little power, but its guarantee requires independence or certain forms of positive dependence.

Can Bonferroni be applied to t-tests instead of z-tests?

Yes. For t-tests with finite sample sizes, use the t critical value with upper-tail area α_adj/2 (two-tailed) and the relevant degrees of freedom, rather than the z value. The z_crit values in this table are reasonable approximations when n is large (for example, more than 30 per group), but t and z diverge more in the far tails, so with small samples or large m always use the t-distribution.

How many comparisons require Bonferroni correction?

There is no fixed cutoff — in principle, any m ≥ 2 tests that share a decision-making family should be considered. In practice, many researchers apply Bonferroni from m = 3 onward, since with m = 2 the uncorrected FWER at α = 0.05 is only ~9.75%, which some consider acceptable. For m = 1, no correction is needed (α_adj = α). The question of which tests belong in the same family is a scientific judgment, not a statistical one.

What does it mean if my result is not significant after Bonferroni?

A result that reaches p < 0.05 but not p < α_adj does not mean the effect is absent, only that the evidence is insufficient to declare significance under the stricter family-wise threshold. This is especially common with borderline p-values, and it becomes more likely as m grows because the gap between 0.05 and α/m widens. In such cases, reporting the uncorrected p, the effect size, and a confidence interval gives readers the full picture and allows them to assess practical importance independently of the correction.

How do Bonferroni-adjusted confidence intervals work?

For m simultaneous confidence intervals with a family-wise coverage of (1 − α), each individual interval uses (1 − α_adj) = (1 − α/m) as its nominal confidence level. The critical value in the CI formula becomes z_crit from the Bonferroni table. The resulting set of intervals collectively contains all true parameter values with probability ≥ 1 − α. For example, with α = 0.05 and m = 5, each CI is a 99% interval (α_adj = 0.01), and together they form a 95% simultaneous confidence region.
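A sketch of the interval calculation, assuming SciPy and z-based intervals of the form estimate ± z_crit × SE; `bonferroni_cis` is a hypothetical helper, and the estimates and standard errors are made up. With m = 5 each interval uses z_crit = InvNorm(0.995) ≈ 2.576, the 99% value.

```python
from scipy.stats import norm


def bonferroni_cis(estimates, std_errors, alpha_family=0.05):
    """Bonferroni simultaneous two-sided z intervals: estimate +/- z_crit * SE."""
    m = len(estimates)
    z_crit = norm.ppf(1 - (alpha_family / m) / 2)
    return [(est - z_crit * se, est + z_crit * se)
            for est, se in zip(estimates, std_errors)]


# Hypothetical estimates and standard errors for m = 5 parameters
est = [1.2, 0.4, -0.3, 2.1, 0.9]
se = [0.30, 0.25, 0.40, 0.50, 0.35]
for (lower, upper), e in zip(bonferroni_cis(est, se), est):
    print(f"{e:+.2f}: [{lower:+.3f}, {upper:+.3f}]")
```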

References & Further Reading

Bonferroni, C. E. (1936). Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8, 3–62. The original paper establishing the inequality underlying the correction.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70. The step-down extension that improves power while keeping the same FWER guarantee.

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57(1), 289–300. Introduced the FDR concept and the BH procedure. doi:10.1111/j.2517-6161.1995.tb02031.x

NIST/SEMATECH e-Handbook of Statistical Methods (2013). Section 7.3.6: Multiple comparisons. National Institute of Standards and Technology. itl.nist.gov

Westfall, P. H., & Young, S. S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley. Comprehensive treatment of resampling-based FWER-controlling methods, with worked examples across biomedical research contexts.


How to Use This Table

1. Set your α_family (usually 0.05)
2. Count all planned tests → m
3. Read α_adj and z_crit from the table
4. Per comparison: reject H₀ if p_raw ≤ α_adj
5. Report m, α_adj, and the FWER level

Key Formulas

α_adj = α_family / m
z_crit = InvNorm(1 − α_adj/2)
FWER ≤ m × α_adj = α
p_adj = min(p_raw × m, 1)

Quick Reference (α = 0.05)

m=2 α_adj=0.0250 · z=2.241

m=3 α_adj=0.0167 · z=2.394

m=5 α_adj=0.0100 · z=2.576

m=6 α_adj=0.0083 · z=2.638

m=10 α_adj=0.0050 · z=2.807

m=20 α_adj=0.0025 · z=3.023

m=50 α_adj=0.0010 · z=3.291


Understanding What Bonferroni Correction Does and Does Not Guarantee

What It Guarantees

Bonferroni guarantees that the probability of at least one false positive among all m tests is ≤ α_family. This holds regardless of the correlation structure among the tests, because Boole's inequality does not require independence. It is a strong, unconditional bound on FWER.

What It Does Not Guarantee

Bonferroni does not control the false discovery rate (FDR) — the proportion of significant results that are false positives. With 100 tests and 50 true effects, a significant Bonferroni result is very likely a true positive, but the correction is also likely to miss many of the true 50. FDR methods are designed for exactly this trade-off. See the hypothesis testing guide for a full comparison of error control frameworks.
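To see this trade-off, here is a minimal pure-Python sketch of the Benjamini-Hochberg step-up rule next to Bonferroni on the same set of hypothetical p-values; `benjamini_hochberg_reject` is a hypothetical helper. BH's FDR guarantee assumes independent or positively dependent tests.

```python
def benjamini_hochberg_reject(p_values, q=0.05):
    """BH step-up: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= k * q / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank  # keep the largest qualifying rank (step-up)
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Hypothetical p-values
p = [0.001, 0.008, 0.012, 0.020, 0.030, 0.040, 0.300, 0.600]
bonf = [x <= 0.05 / len(p) for x in p]
bh = benjamini_hochberg_reject(p)
print(sum(bonf), sum(bh))  # 1 5
```

Bonferroni rejects one hypothesis (only 0.001 clears 0.05/8 = 0.00625), while BH rejects five, because 0.030 is the largest p-value that clears its rank-based threshold 5 × 0.05/8 = 0.03125.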

Statistical Significance vs. Effect Size

A result that survives Bonferroni correction is statistically significant — but the correction says nothing about whether the effect is practically meaningful. Always report effect sizes (Cohen's d, η², odds ratio) alongside corrected p-values. A corrected p = 0.003 with Cohen's d = 0.08 is statistically significant but trivially small. See the effect size guide for detail.