What Is a Chi-Square Test?
Chi-square tests work on count data — the number of people, items, or observations that fall into each category. They do not require the data to follow a normal distribution, which makes them one of the most broadly applicable tools in statistics. Researchers use chi-square tests in medical trials, ecology, marketing surveys, genetics, machine learning, and any other field where outcomes are measured in categories rather than continuous numbers.
The test was developed by Karl Pearson in 1900 and remains one of the most cited statistical procedures in scientific literature. According to the NIST/SEMATECH Engineering Statistics Handbook, the chi-square test is the standard approach for categorical data inference when expected cell counts are adequate. At Statistics Fundamentals, chi-square sits within the broader framework of hypothesis testing, which establishes the null and alternative hypothesis structure all these tests share.
- Formula: χ² = Σ[(O − E)² / E], where O = observed frequency, E = expected frequency
- Data requirement: Categorical variables only (counts/frequencies, not means)
- Two main types: Goodness-of-fit (1 variable) and test of independence (2 variables)
- Key assumption: Every expected cell frequency must be ≥ 5
- Most common critical value: df = 1, α = 0.05 → χ² = 3.841
- Effect size: Use Cramér's V — V = √(χ² / [n × min(r−1, c−1)])
Two Types of Chi-Square Tests
Chi-square tests take two distinct forms. Choosing the wrong one is a common mistake, so it is worth being precise about what each one does.
| Feature | Goodness-of-Fit Test | Test of Independence |
|---|---|---|
| Number of variables | 1 categorical variable | 2 categorical variables |
| Data structure | Single frequency table | Contingency table (rows × columns) |
| Research question | Does this variable follow a specified distribution? | Are these two variables associated? |
| Null hypothesis (H₀) | Observed = expected distribution | The two variables are independent |
| Expected frequency | E = n × p (theoretical proportion) | E = (Row Total × Col Total) / N |
| Degrees of freedom | k − 1 | (r − 1)(c − 1) |
| Classic example | Do the six faces of a die appear equally often? | Is gender associated with product preference? |
Chi-Square Goodness-of-Fit Test
The goodness-of-fit test asks: does a single categorical variable match a theoretical distribution? You collect counts across k categories and compare them to the counts you would expect if a specific null hypothesis were true. A genetics researcher testing whether a cross follows Mendel's 9:3:3:1 ratio, or a quality control engineer testing whether defect types are uniformly distributed, both use this form.
Chi-Square Test of Independence
The test of independence asks: are two categorical variables related? Both variables are measured on the same sample. Their joint counts are arranged in a contingency table — rows for one variable, columns for the other — and the test determines whether the pattern of cell frequencies is consistent with the two variables being unrelated. This is the more commonly encountered form in social science, medical, and marketing research.
Chi-Square Test Formula
All chi-square tests use the same core formula for the test statistic. The distinction between test types lies in how expected frequencies are calculated, not in the formula itself.
The Chi-Square Test Statistic: χ² = Σ[(O − E)² / E], where:
- χ² = chi-square test statistic
- O = observed frequency (actual count)
- E = expected frequency (under H₀)
- Σ = sum over all categories or cells
In plain terms: For every cell or category, subtract the expected count from the observed count, square the result (to eliminate negative values), divide by the expected count (to standardize for scale), then add all those values together. A larger χ² means the data deviate more from the null hypothesis.
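In code, this calculation is a single fold over the cells. A minimal Python sketch (the function name is illustrative, not from any statistics library):

```python
def chi_square_stat(observed, expected):
    """chi^2 = sum((O - E)^2 / E) over all categories or cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Die-roll goodness-of-fit: 60 rolls, uniform expectation of 10 per face
faces = [8, 12, 9, 11, 6, 14]
print(round(chi_square_stat(faces, [10] * 6), 2))   # 4.2
```

Larger deviations between observed and expected counts inflate each term, so χ² grows with the overall mismatch.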
Expected Frequency Formulas
- Goodness-of-fit: E = n × p, where n is the total sample size and p is the theoretical proportion for the category
- Test of independence: E = (Row Total × Column Total) / Grand Total, computed separately for each cell
Degrees of Freedom Formulas
- Goodness-of-fit: df = k − 1, where k is the number of categories
- Test of independence: df = (r − 1)(c − 1) for an r × c contingency table

The degrees of freedom determine which chi-square distribution to use when finding the p-value or critical value. Higher df shifts the distribution to the right: the same χ² value corresponds to a larger p-value when df is larger. This is why the critical value at df = 1 (3.841 at α = 0.05) is much smaller than the critical value at df = 9 (16.919 at α = 0.05).
Assumptions of the Chi-Square Test
Chi-square test results are only valid when these five conditions hold. Penn State's STAT 500 course — a widely cited statistics curriculum — lists adequate expected cell frequency as the most commonly violated assumption in practice (Penn State STAT 500, Lesson 8).
1. Categorical data: Both variables must be measured as categories, either nominal (unordered labels like colors or countries) or ordinal (ordered categories like rating scales). Chi-square cannot be applied to continuous measurements directly.
2. Independent observations: Each subject or observation contributes to exactly one cell. Observations must not be paired, matched, or repeated. For paired categorical data, use McNemar's test instead.
3. Adequate expected frequencies: Every expected cell frequency must be at least 5. If more than 20% of cells have E < 5, the chi-square approximation is unreliable. For 2×2 tables with small expected counts, use Fisher's exact test.
4. Random sampling: Data must come from a random sample or a sampling design that is representative of the population being studied. Non-random convenience samples limit the generalizability of the result.
5. Mutually exclusive categories: Each observation must fall into one and only one category. Overlapping categories (where an observation could be counted twice) violate the independence of cells and invalidate the test.
Expected frequencies below 5 account for the majority of chi-square misapplications in published research. Always check E values before reporting results. The SPSS output footnote and R's chisq.test()$expected both flag this automatically.
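The expected-count check described above is easy to automate. A small Python sketch of the E = (row total × column total) / N computation and the minimum-count screen (the helper names are illustrative, not from any particular library):

```python
def expected_counts(table):
    """Expected cell counts under independence: E = (row total * col total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def low_expected_cells(expected, minimum=5):
    """Return the expected counts that fall below the minimum (default 5)."""
    return [e for row in expected for e in row if e < minimum]

table = [[30, 20], [10, 40]]
print(expected_counts(table))                         # [[20.0, 30.0], [20.0, 30.0]]
print(low_expected_cells(expected_counts(table)))     # [] -> assumption satisfied
```

A non-empty result from the second helper is the signal to switch to Fisher's exact test for 2×2 tables.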
Chi-Square Distribution Table (Critical Values)
Use this table to find the critical value for your test. Locate your degrees of freedom in the left column, then find the column matching your significance level (α). If your calculated χ² exceeds the critical value, reject the null hypothesis.
| df | α = 0.10 | α = 0.05 | α = 0.025 | α = 0.01 | α = 0.001 |
|---|---|---|---|---|---|
| 1 | 2.706 | 3.841 | 5.024 | 6.635 | 10.828 |
| 2 | 4.605 | 5.991 | 7.378 | 9.210 | 13.816 |
| 3 | 6.251 | 7.815 | 9.348 | 11.345 | 16.266 |
| 4 | 7.779 | 9.488 | 11.143 | 13.277 | 18.467 |
| 5 | 9.236 | 11.070 | 12.833 | 15.086 | 20.515 |
| 6 | 10.645 | 12.592 | 14.449 | 16.812 | 22.458 |
| 7 | 12.017 | 14.067 | 16.013 | 18.475 | 24.322 |
| 8 | 13.362 | 15.507 | 17.535 | 20.090 | 26.125 |
| 9 | 14.684 | 16.919 | 19.023 | 21.666 | 27.877 |
| 10 | 15.987 | 18.307 | 20.483 | 23.209 | 29.588 |
| 12 | 18.549 | 21.026 | 23.337 | 26.217 | 32.910 |
| 15 | 22.307 | 24.996 | 27.488 | 30.578 | 37.697 |
| 20 | 28.412 | 31.410 | 34.170 | 37.566 | 45.315 |
| 25 | 34.382 | 37.652 | 40.646 | 44.314 | 52.620 |
| 30 | 40.256 | 43.773 | 46.979 | 50.892 | 59.703 |
| 40 | 51.805 | 55.758 | 59.342 | 63.691 | 73.402 |
| 50 | 63.167 | 67.505 | 71.420 | 76.154 | 86.661 |
| 60 | 74.397 | 79.082 | 83.298 | 88.379 | 99.607 |
| 80 | 96.578 | 101.879 | 106.629 | 112.329 | 124.839 |
| 100 | 118.498 | 124.342 | 129.561 | 135.807 | 149.449 |
The most commonly referenced critical value is χ² = 3.841 at df = 1, α = 0.05. Source: tabulated from the chi-square cumulative distribution function; values match those in the NIST/SEMATECH Engineering Statistics Handbook table. For the extended downloadable version, visit the Chi-Square Table reference page.
Quick Lookup — Common Scenarios
| Scenario | df | Critical value α = 0.05 | Critical value α = 0.01 |
|---|---|---|---|
| 2-category goodness-of-fit | 1 | 3.841 | 6.635 |
| 3-category goodness-of-fit | 2 | 5.991 | 9.210 |
| 4-category goodness-of-fit | 3 | 7.815 | 11.345 |
| 2×2 contingency table | 1 | 3.841 | 6.635 |
| 2×3 contingency table | 2 | 5.991 | 9.210 |
| 3×3 contingency table | 4 | 9.488 | 13.277 |
| 3×4 contingency table | 6 | 12.592 | 16.812 |
| 4×4 contingency table | 9 | 16.919 | 21.666 |
| Mendel's 9:3:3:1 ratio (4 categories) | 3 | 7.815 | 11.345 |
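The lookup-and-compare decision rule can be expressed directly in code. A Python sketch using a few α = 0.05 critical values transcribed from the table above (the dictionary covers only these rows; a full implementation would evaluate the chi-square CDF instead):

```python
# Critical values at alpha = 0.05, transcribed from the table above
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 6: 12.592, 9: 16.919}

def reject_h0(chi2, df, critical=CRITICAL_05):
    """Reject H0 when the computed chi-square exceeds the critical value."""
    return chi2 > critical[df]

print(reject_h0(16.667, df=1))   # True: 16.667 > 3.841
print(reject_h0(0.267, df=3))    # False: 0.267 < 7.815
```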
How to Perform a Chi-Square Test: 8-Step Method
Step 1: State H₀ and H₁.
Step 2: Set the significance level α.
Step 3: Build the contingency table with observed counts.
Step 4: Calculate expected frequencies using E = (Row × Col) / N.
Step 5: Compute χ² = Σ[(O − E)² / E].
Step 6: Find the degrees of freedom.
Step 7: Look up the critical value.
Step 8: Compare χ² to the critical value and state the conclusion.
Research question: Is there a statistically significant association between gender (Male/Female) and preference (Brand A / Brand B) in a sample of 100 consumers?
State the hypotheses:
H₀: Gender and brand preference are independent (no association)
H₁: Gender and brand preference are associated
Set the significance level: α = 0.05
Build the observed contingency table:
| | Brand A | Brand B | Row Total |
|---|---|---|---|
| Male | 30 | 20 | 50 |
| Female | 10 | 40 | 50 |
| Col Total | 40 | 60 | 100 |
Calculate expected frequencies using E = (Row Total × Column Total) / Grand Total:
| Cell | Calculation | Expected (E) |
|---|---|---|
| Male / Brand A | (50 × 40) / 100 | 20.0 |
| Male / Brand B | (50 × 60) / 100 | 30.0 |
| Female / Brand A | (50 × 40) / 100 | 20.0 |
| Female / Brand B | (50 × 60) / 100 | 30.0 |
✓ All expected frequencies ≥ 5. Assumption satisfied.
Calculate the chi-square statistic:
| Cell | O | E | (O − E)² | (O − E)² / E |
|---|---|---|---|---|
| Male / Brand A | 30 | 20 | 100 | 5.000 |
| Male / Brand B | 20 | 30 | 100 | 3.333 |
| Female / Brand A | 10 | 20 | 100 | 5.000 |
| Female / Brand B | 40 | 30 | 100 | 3.333 |
| Total | | | | χ² = 16.667 |
Degrees of freedom: df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1
Critical value: At df = 1 and α = 0.05, critical value = 3.841 (from the distribution table above)
Decision: χ² = 16.667 > critical value 3.841 → Reject H₀
✓ Conclusion: There is a statistically significant association between gender and brand preference (χ²(1, N = 100) = 16.67, p < .001). Men and women differ in their brand preferences beyond what chance alone would predict.
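The eight steps above can be reproduced end to end in a few lines. A self-contained Python sketch of the same calculation (pure Python, no statistics library assumed):

```python
import math

observed = [[30, 20], [10, 40]]    # rows: Male, Female; columns: Brand A, Brand B
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Step 4: expected counts under independence, E = (row total * col total) / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Step 5: chi-square statistic
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

# Step 6: degrees of freedom for an r x c table
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Effect size: Cramer's V
v = math.sqrt(chi2 / (n * min(len(observed) - 1, len(observed[0]) - 1)))

print(round(chi2, 3), df, round(v, 3))   # 16.667 1 0.408
```

The output matches the hand calculation: χ² = 16.667 exceeds the critical value 3.841 at df = 1, so H₀ is rejected.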
Chi-Square Calculator
Enter observed counts for a 2×2 contingency table. The calculator computes χ², degrees of freedom, the p-value approximation, and Cramér's V effect size.
Worked Examples Across Two Fields
Example 1 — Medical Research: Smoking and Lung Disease
This example follows the design used in classic epidemiology studies on smoking. The data below are representative of the proportions established in large-sample clinical research, consistent with findings documented by the CDC Tobacco Statistics and Data program.
Research question: Is smoking status (Smoker / Non-Smoker) associated with lung disease diagnosis (Yes / No) in a sample of 300 patients?
Observed contingency table:
| | Lung Disease: Yes | Lung Disease: No | Row Total |
|---|---|---|---|
| Smoker | 90 | 60 | 150 |
| Non-Smoker | 30 | 120 | 150 |
| Col Total | 120 | 180 | 300 |
Expected frequencies:
Smoker/Yes: (150 × 120) / 300 = 60 | Smoker/No: (150 × 180) / 300 = 90
Non-Smoker/Yes: (150 × 120) / 300 = 60 | Non-Smoker/No: (150 × 180) / 300 = 90
Chi-square statistic:
(90−60)²/60 + (60−90)²/90 + (30−60)²/60 + (120−90)²/90
= 900/60 + 900/90 + 900/60 + 900/90
= 15 + 10 + 15 + 10 = χ² = 50.00
df = (2−1)(2−1) = 1. Critical value at α = 0.05, df = 1: 3.841. Since 50.00 ≫ 3.841, reject H₀.
✓ Conclusion: Smoking status and lung disease are significantly associated (χ²(1, N = 300) = 50.00, p < .001). Cramér's V = √(50/300) = 0.408, indicating a medium-to-large effect.
Example 2 — Genetics: Mendel's Goodness-of-Fit Test
Research question: Does a dihybrid pea plant cross produce phenotype ratios consistent with Mendel's predicted 9:3:3:1 ratio in a sample of 160 offspring?
Observed vs. expected counts:
| Phenotype | Observed (O) | Ratio | Expected (E = n × p) | (O−E)²/E |
|---|---|---|---|---|
| Round/Yellow | 90 | 9/16 | 90.0 | 0.000 |
| Round/Green | 28 | 3/16 | 30.0 | 0.133 |
| Wrinkled/Yellow | 32 | 3/16 | 30.0 | 0.133 |
| Wrinkled/Green | 10 | 1/16 | 10.0 | 0.000 |
| Total | 160 | | 160.0 | χ² = 0.267 |
df = k − 1 = 4 − 1 = 3. Critical value at α = 0.05, df = 3: 7.815. Since 0.267 ≪ 7.815, fail to reject H₀.
✓ Conclusion: The observed phenotype ratios are consistent with Mendel's 9:3:3:1 prediction (χ²(3, N = 160) = 0.27, p = .966). The data do not provide evidence against the Mendelian model.
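The goodness-of-fit arithmetic in the table above can be checked with a short pure-Python sketch:

```python
observed = [90, 28, 32, 10]              # Round/Yellow, Round/Green, Wrinkled/Yellow, Wrinkled/Green
proportions = [9/16, 3/16, 3/16, 1/16]   # Mendel's predicted 9:3:3:1 ratio
n = sum(observed)                        # 160

expected = [n * p for p in proportions]  # E = n * p  ->  [90.0, 30.0, 30.0, 10.0]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                   # k - 1 = 3

print(round(chi2, 3), df)   # 0.267 3
```

Since 0.267 falls far below the df = 3 critical value of 7.815, the code reaches the same fail-to-reject conclusion.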
How to Report Chi-Square Results (APA Format)
When writing up chi-square test results, follow the APA 7th edition format. This format is required by most journals and expected in graduate-level coursework. The Publication Manual of the American Psychological Association (7th ed.) specifies this structure for nonparametric test reporting.
Full sentence examples:
- For independence: "A chi-square test of independence found a significant relationship between gender and brand preference, χ²(1, N = 100) = 16.67, p < .001, Cramér's V = 0.41."
- For goodness-of-fit: "A chi-square goodness-of-fit test indicated the observed phenotype distribution was consistent with the 9:3:3:1 Mendelian ratio, χ²(3, N = 160) = 0.27, p = .97."
Statistical significance alone does not tell the reader how large the association is. With large samples, a trivially small association can produce p < .001. Always accompany a significant chi-square result with Cramér's V or Phi (φ) for 2×2 tables.
Effect Size: Cramér's V
A significant chi-square result tells you the association is real; Cramér's V tells you how strong it is. Cohen (1988) established the benchmark thresholds below, though more recent work by Lakens (2013) at Eindhoven University of Technology notes that these benchmarks should be treated as contextual guides rather than rigid rules.
| Cramér's V | Effect Size | Interpretation |
|---|---|---|
| 0.10 | Small | Weak association |
| 0.30 | Medium | Moderate association |
| 0.50 | Large | Strong association |

These benchmarks apply directly when min(r − 1, c − 1) = 1 (e.g., any 2×c table); Cohen's corresponding thresholds shrink for larger tables.
For 2×2 tables specifically, Phi (φ) is equivalent to Cramér's V: φ = √(χ²/n). Both yield the same value when there is one degree of freedom.
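This equivalence is easy to confirm numerically. A short Python sketch (function names are illustrative):

```python
import math

def cramers_v(chi2, n, rows, cols):
    """Cramer's V = sqrt(chi2 / (n * min(r - 1, c - 1)))."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

def phi_coefficient(chi2, n):
    """Phi for 2x2 tables: sqrt(chi2 / n)."""
    return math.sqrt(chi2 / n)

# For any 2x2 table, min(r - 1, c - 1) = 1, so the two measures coincide
print(round(cramers_v(16.667, 100, 2, 2), 3))     # 0.408
print(round(phi_coefficient(16.667, 100), 3))     # 0.408
```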
Chi-Square Test in SPSS
Test of Independence in SPSS
To run a chi-square test of independence in IBM SPSS Statistics:
1. Go to Analyze → Descriptive Statistics → Crosstabs
2. Move your first variable into the Row(s) box and your second variable into the Column(s) box
3. Click Statistics → check Chi-square → also check Phi and Cramer's V for effect size → click Continue
4. Click Cells → under Counts, check Observed and Expected → under Percentages, check Row → click Continue → OK
Reading SPSS output:
- In the Chi-Square Tests table, read the Pearson Chi-Square row
- The Asymptotic Significance (2-sided) column gives your p-value
- Check the footnote for "X cells have expected count less than 5" — if this appears, consider Fisher's exact test
- The Symmetric Measures table gives Cramér's V
Goodness-of-Fit in SPSS
Navigate to Analyze → Nonparametric Tests → Legacy Dialogs → Chi-Square. Move your variable into the Test Variable List. Under Expected Values, choose "All categories equal" for a uniform distribution, or enter custom expected proportions. Click OK.
Chi-Square Test in R
Test of Independence in R
R applies Yates' continuity correction by default for 2×2 tables, which slightly reduces χ². To match hand calculations or SPSS output, disable it with chisq.test(data_matrix, correct = FALSE).
Goodness-of-Fit in R
For a goodness-of-fit test, pass the theoretical proportions to chisq.test() via its p argument, e.g. chisq.test(x = c(90, 28, 32, 10), p = c(9, 3, 3, 1)/16); the proportions must sum to 1.
Chi-Square vs. Other Statistical Tests
Chi-Square vs. Fisher's Exact Test
| Feature | Chi-Square Test | Fisher's Exact Test |
|---|---|---|
| Best for | Large samples (all E ≥ 5) | Small samples (any E < 5) |
| Calculation | Approximate (asymptotic) | Exact probability |
| Table size | Any r × c | Most commonly 2×2 |
| Sample size guidance | n > 40 (with all E ≥ 5) | n < 20, or any E < 5 |
| Software | Default in SPSS, R | Checkbox in SPSS; fisher.test() in R |
Chi-Square vs. t-Test
| Feature | Chi-Square Test | t-Test |
|---|---|---|
| Outcome variable type | Categorical (counts) | Continuous (means) |
| Normality required | No | Yes (or large n) |
| What it tests | Association or distribution | Difference between means |
| Example question | "Is political party related to voting behavior?" | "Is the mean exam score higher in Group A vs B?" |
| Effect size | Cramér's V | Cohen's d |
Chi-Square vs. ANOVA
| Feature | Chi-Square Test | ANOVA |
|---|---|---|
| Outcome variable | Categorical (counts) | Continuous (means) |
| Independent variable | Categorical groups | Categorical groups |
| Tests | Association between categories | Differences in group means |
| Effect size | Cramér's V | η² (eta squared) |
For a structured decision guide covering all major statistical tests, see the Statistical Test Selector tool on this site.
Where Chi-Square Tests Are Used
Medical Research
Testing whether treatment outcomes (recovered/not recovered) differ by treatment group. Used routinely in randomized controlled trials with binary endpoints.
Genetics & Biology
Verifying whether observed genotype or phenotype ratios match Mendelian or Hardy-Weinberg predictions in population genetics.
Survey Analysis
Determining whether survey responses differ by demographic groups such as age, gender, or education level. Standard in social science research.
Market Research
Testing whether brand preferences, product choices, or consumer behaviors differ across customer segments or regions.
Machine Learning
Feature selection: identifying which categorical features are statistically associated with the target variable before model training.
Social Sciences
Analyzing voting patterns, criminal justice outcomes, educational attainment, and social mobility data at the population level.
Ecology
Comparing species distribution across habitat types, or testing whether species are associated with particular environmental conditions.
Quality Control
Testing whether defect rates or product categories are uniformly distributed across production lines, batches, or suppliers.
When Chi-Square Fails: Alternatives
| Problem | Recommended Alternative | Why |
|---|---|---|
| Expected frequencies < 5 in any cell (small sample) | Fisher's exact test | Exact computation; no large-sample approximation needed |
| Paired or matched categorical data | McNemar's test | Accounts for the non-independence of matched pairs |
| Ordered categories (ordinal data) | Cochran-Armitage trend test | Detects monotonic trend rather than general association |
| Very large N (χ² inflated trivially) | Report Cramér's V; interpret practically | With huge n, even negligible associations become significant |
| Three or more repeated measures | Cochran's Q test | Extension of McNemar for k > 2 related groups |
Chi-Square Quick Reference Cheat Sheet
The table below consolidates every key term, formula, and decision rule covered in this guide into a one-page study reference.
| Term / Entity | Formula / Value | When to Use | Plain Interpretation |
|---|---|---|---|
| Chi-square statistic (χ²) | χ² = Σ[(O − E)² / E] | All chi-square tests | Total deviation of observed from expected counts |
| Expected frequency — goodness-of-fit | E = n × p | Single-variable test | Count predicted by theoretical proportion p |
| Expected frequency — independence | E = (Row × Col) / N | Two-variable test | Count predicted if variables were unrelated |
| Degrees of freedom (goodness-of-fit) | df = k − 1 | k = number of categories | Number of free cells after constraints applied |
| Degrees of freedom (independence) | df = (r−1)(c−1) | r × c contingency table | Same logic; both row and column margins constrained |
| Critical value (df=1, α=0.05) | 3.841 | Most 2×2 tables | Exceed this → reject H₀ at 5% level |
| Critical value (df=1, α=0.01) | 6.635 | 2×2 with stricter threshold | Exceed this → reject H₀ at 1% level |
| Cramér's V | V = √(χ²/[n × min(r−1,c−1)]) | After significant χ² | Effect size: 0.10=small, 0.30=medium, 0.50=large |
| p-value | P(χ² ≥ observed value \| H₀ true) | Always report | p < 0.05 → reject H₀ (at conventional α) |
| Null hypothesis (independence) | H₀: Variables are independent | Test of independence | No association between the two categorical variables |
| Null hypothesis (goodness-of-fit) | H₀: Observed = expected distribution | Goodness-of-fit | Data match the theoretical distribution |
| Assumption: expected frequency | E ≥ 5 per cell | Every chi-square test | If violated, use Fisher's exact test instead |
| APA reporting format | χ²(df, N = n) = value, p = .xxx | All publications | Include df, sample size, statistic, p-value, V |
Continue Learning at Statistics Fundamentals
Related Topics in Hypothesis Testing & Statistics
Chi-square tests connect to a network of statistical concepts. The guides below cover the prerequisite ideas and follow-on methods in natural learning sequence.
- Hypothesis Testing — The framework that defines H₀, H₁, α, and the decision rule every chi-square test uses
- Hypothesis Testing Examples — Step-by-step worked examples across multiple test types
- ANOVA — The continuous-data equivalent for comparing three or more groups
- One-Sample t-Test — For continuous data when comparing a sample mean to a known value
- Two-Sample t-Test — Comparing means between two groups (continuous outcome variable)
- Chi-Square Distribution Table (Full Reference) — Extended critical values table with downloadable PDF
- Chi-Square Calculator — Online tool for computing χ², p-values, and Cramér's V
- Confidence Intervals — The interval estimation counterpart to hypothesis testing
- Statistical Test Selector — Interactive tool to choose the right test for your data
- Proportion Hypothesis Testing — For testing single proportions (z-test for proportions)
- NIST/SEMATECH Engineering Statistics Handbook — Chi-Square — Authoritative federal reference covering formula derivation and application guidelines
- Penn State STAT 500 (Lesson 8): Chi-Square Tests — University-level curriculum used in graduate-level applied statistics courses
- UCLA OARC Statistical Methods — Reference for software implementation and test selection decisions
- R Documentation: chisq.test() — Official R function documentation for chi-square test implementation
- CDC Tobacco Data and Statistics — Source context for the smoking/lung disease worked example
- OpenIntro Statistics (free PDF) — Open-source textbook with comprehensive chi-square chapters; widely cited in undergraduate coursework