What is the difference between one-way and two-way ANOVA?

One-way ANOVA tests the effect of a single independent variable (factor) on one dependent variable. Two-way ANOVA tests the effects of two independent variables simultaneously, plus their interaction effect. For example, one-way ANOVA might compare weight loss across three diet types; two-way ANOVA would test both diet type and exercise level together, and whether their combined effect differs from what either factor alone would predict.

What is Tukey HSD and when do I use it?

Tukey's Honestly Significant Difference (HSD) is a post hoc test run after a significant ANOVA to pinpoint which specific pairs of group means differ. It controls the familywise error rate, meaning it limits the chance of false positives across all pairwise comparisons. Use Tukey HSD when group variances are approximately equal. If variances differ, use the Games-Howell test instead.

What does eta-squared mean in ANOVA?

Eta-squared (η²) is an effect size measure that quantifies what proportion of total variance in the dependent variable is explained by the grouping factor. It is calculated as η² = SSB / SST. By Cohen's (1988) benchmarks: η² = 0.01 is a small effect, 0.06 is medium, and 0.14 or above is large. Always report effect size alongside p-values — a result can be statistically significant yet practically trivial with a large sample.

What is the difference between ANOVA and a t-test?

A t-test compares the means of exactly two groups. ANOVA compares three or more groups in a single test. Running separate t-tests on every pair inflates the Type I error rate; with three groups and α = 0.05, the probability of at least one false positive across three pairwise tests reaches 14%. ANOVA controls this by testing all groups simultaneously through the F-distribution.

Can ANOVA be used with only two groups?

Yes, ANOVA works with two groups, and when it does, the F-statistic equals the square of the t-statistic (F = t²), producing the same p-value as an independent samples t-test. For two groups, the t-test is the more conventional choice. Post hoc tests are skipped since only one pairwise comparison exists.

Free ANOVA Calculator: One-Way, Two-Way + Post Hoc Tests (2026)

What are the assumptions of ANOVA?

One-way ANOVA rests on four assumptions: (1) Independence — each observation is independent of all others; (2) Normality — the residuals within each group are approximately normally distributed; (3) Homogeneity of variance — the variance within each group is roughly equal (test with Levene's test); (4) Interval or ratio scale — the dependent variable is continuous. Violating homogeneity of variance can be corrected by switching to Welch's ANOVA.

One-Way ANOVA Calculator

F-statistic: F = MSB ÷ MSW | Effect size: η² = SSB ÷ SST

Key ANOVA Formulas

F-Statistic: F = MSB / MSW

Between-Group SS: SSB = ∑ n_j(x̄_j − x̄)²

Within-Group SS: SSW = SST − SSB

Degrees of Freedom: dfB = k−1 | dfW = N−k

Mean Squares: MSB = SSB/dfB | MSW = SSW/dfW

Eta-Squared (η²): η² = SSB / SST

When to Use ANOVA

You have 3 or more groups to compare (2 groups: use a t-test).
Your dependent variable is continuous (interval or ratio scale).
Groups are independent — different subjects in each.
Data are approximately normal within groups, with similar variances.

What Is ANOVA?

ANOVA, which stands for Analysis of Variance, is a statistical test that determines whether the means of three or more independent groups are significantly different from one another. It does this by comparing two sources of variance: the variation between group means (the signal) and the variation within each group (the noise). The ratio of these two quantities produces the F-statistic.

Developed by Sir Ronald A. Fisher in the 1920s at the Rothamsted Experimental Station in England, ANOVA appeared in his landmark 1925 text Statistical Methods for Research Workers. Fisher was analyzing agricultural field trials with multiple fertilizer treatments, and he needed a single test that could compare all treatment groups at once. The F-distribution and F-test — both named in his honor — remain the mathematical foundation of every ANOVA today. For additional background, the NIST/SEMATECH Engineering Statistics Handbook covers the mathematical basis of ANOVA in full detail.

ANOVA now appears in virtually every empirical discipline. A clinical researcher uses it to compare drug doses. An educator uses it to compare teaching methods. A manufacturer uses it to compare production batches. What they all share is the same underlying question: do the observed differences between group averages exceed what random variation alone would produce? Statistics Fundamentals covers this core question across the full range of hypothesis testing methods.

Fig. 1. ANOVA partitions total variance into two components: between-group (SSB) and within-group (SSW). A large F-ratio (MSB much larger than MSW) signals real group differences. Source: adapted from the NIST Engineering Statistics Handbook.

Between-Group vs. Within-Group Variance: The Core Logic

ANOVA works by splitting total data variability into two parts: variability that comes from differences between group means (between-group variance) and variability that comes from natural scatter within each group (within-group variance). The F-statistic is the ratio of the first to the second, after each is divided by its degrees of freedom.

A classroom analogy makes this concrete. Suppose three classes take the same exam. If Class A scores consistently around 85, Class B around 70, and Class C around 60, the between-group differences are large — that is the signal. But if scores within each class range widely from 40 to 100, the within-group variation is also large — that is the noise. ANOVA asks whether the signal-to-noise ratio (the F-statistic) is large enough that random chance alone cannot explain it.

The Variance Decomposition:
Total Sum of Squares (SST) = Between-Group Sum of Squares (SSB) + Within-Group Sum of Squares (SSW)

SSB = ∑ n_j(x̄_j − x̄)² (how much group means differ from the grand mean)
SSW = ∑∑(x_ij − x̄_j)² (how much individual values differ from their group mean)

F = MSB / MSW = (SSB/df_B) / (SSW/df_W)

When F is large, the between-group differences dominate the within-group noise, and the p-value drops below the significance threshold. When F is close to 1.0, the group means vary about as much as within-group scatter alone would predict, which is the pattern expected if all population means were equal and random sampling produced the observed differences.
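
To make the decomposition concrete, here is a minimal sketch that computes SST, SSB, and SSW for three small groups and checks that SST = SSB + SSW. The scores are made up for illustration (they are not data from this guide), and the helper name `sums_of_squares` is hypothetical.

```python
# Minimal sketch: partition total variability into between- and within-group parts.
# The scores below are illustrative values, not data from this guide.

def mean(values):
    return sum(values) / len(values)

def sums_of_squares(groups):
    """Return (SST, SSB, SSW) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # SST: every observation against the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    # SSB: each group mean against the grand mean, weighted by group size
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SSW: every observation against its own group mean
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return sst, ssb, ssw

classes = [
    [84, 86, 85],   # Class A
    [69, 71, 70],   # Class B
    [59, 61, 60],   # Class C
]
sst, ssb, ssw = sums_of_squares(classes)
print(f"SST = {sst:.2f}, SSB = {ssb:.2f}, SSW = {ssw:.2f}")
```

In this toy dataset almost all of the variability sits between the classes (SSB = 950 against SSW = 6), which is the situation that produces a large F.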

Types of ANOVA

The ANOVA family covers several experimental designs. Choosing the correct type depends entirely on how many independent variables you have and whether the same subjects appear in multiple groups.

Table: ANOVA Types — When to Use Each

Type	Independent Variables	When to Use	Example
One-Way ANOVA	1 factor, 3+ levels	Comparing 3+ independent groups on one factor	Test scores across four teaching methods
Two-Way ANOVA	2 factors	Two factors simultaneously, plus their interaction effect	Weight loss by diet type (low-carb, balanced) × exercise type (cardio, strength)
Repeated Measures ANOVA	1 within-subjects factor	Same subjects measured at 3+ time points or conditions	Patient pain scores at baseline, 4 weeks, and 12 weeks
Welch’s ANOVA	1 factor	Group variances are unequal (heteroscedastic data)	Employee satisfaction across departments of very different sizes
Factorial ANOVA	2+ factors	Multiple factors in a crossed design	Crop yield by fertilizer type × irrigation level × soil pH

ANOVA Formula Reference

The one-way ANOVA calculation decomposes the total sum of squares into two components, computes mean squares by dividing by the appropriate degrees of freedom, then forms the F-ratio.

Between-Group Sum of Squares (SSB)

SSB = ∑ nⱼ(x̄ⱼ − x̄)²

Where:
nⱼ = sample size of group j
x̄ⱼ = mean of group j
x̄  = grand mean of all observations
k  = number of groups
dfB = k − 1

Within-Group Sum of Squares (SSW)

SSW = ∑∑(xᵢⱼ − x̄ⱼ)²

Or equivalently: SSW = SST − SSB

Where:
xᵢⱼ = individual observation i in group j
x̄ⱼ  = mean of group j
N    = total number of observations
dfW  = N − k

F-Statistic

F = MSB / MSW

MSB = SSB / dfB = SSB / (k−1)
MSW = SSW / dfW = SSW / (N−k)

A large F means group means differ more than random noise would predict.

Eta-Squared Effect Size

η² = SSB / SST

Benchmarks (Cohen, 1988):
η² = 0.01  Small effect
η² = 0.06  Medium effect
η² ≥ 0.14  Large effect

Cohen's f = √(η² / (1 − η²))
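
The two effect-size formulas above can be sketched in a few lines. The function names are hypothetical, and the "negligible" label for values below 0.01 is an assumption of this sketch, not one of Cohen's categories.

```python
import math

def cohens_f(eta_sq):
    """Convert eta-squared to Cohen's f: f = sqrt(eta² / (1 - eta²))."""
    if not 0 <= eta_sq < 1:
        raise ValueError("eta-squared must be in [0, 1)")
    return math.sqrt(eta_sq / (1 - eta_sq))

def effect_label(eta_sq):
    """Label eta-squared using the Cohen (1988) benchmarks listed above."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"  # assumption: no benchmark is given below 0.01

for eta_sq in (0.01, 0.06, 0.14):
    print(f"eta² = {eta_sq:.2f} -> f = {cohens_f(eta_sq):.3f} ({effect_label(eta_sq)})")
```

The three benchmark values of η² map to f of roughly 0.10, 0.25, and 0.40, which are the small, medium, and large benchmarks usually quoted for Cohen's f.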

The Standard ANOVA Summary Table

Every ANOVA result appears in a standardized table format. Knowing how to read this table is as important as running the test itself.

Table: Standard One-Way ANOVA Summary Table Format

Source	SS (Sum of Squares)	df	MS (Mean Square)	F	p-value
Between Groups	SSB	k − 1	MSB = SSB / (k−1)	MSB / MSW	From F-distribution
Within Groups (Error)	SSW	N − k	MSW = SSW / (N−k)	—	—
Total	SST = SSB + SSW	N − 1	—	—	—

How to Calculate One-Way ANOVA Step by Step

To calculate a one-way ANOVA: find the grand mean, compute the between-group and within-group sums of squares, divide each by its degrees of freedom to get mean squares, form the F-ratio, then compare to the critical F-value or interpret the p-value directly.

Calculate the group means and grand mean

Compute the mean (x̄_j) for each group separately. Then compute the grand mean (x̄) by summing all observations across all groups and dividing by N (the total number of data points).

Calculate SSB (Between-Group Sum of Squares)

For each group, subtract the grand mean from the group mean, square the result, and multiply by the group size (n_j). Sum this across all k groups: SSB = ∑ n_j(x̄_j − x̄)². The degrees of freedom for this term is df_B = k − 1.

Calculate SSW (Within-Group Sum of Squares)

For each individual observation, subtract the mean of its group, square the result, and sum across all observations in all groups: SSW = ∑∑(x_ij − x̄_j)². Alternatively, SSW = SST − SSB. The degrees of freedom is df_W = N − k.

Calculate Mean Squares (MSB and MSW)

Divide each sum of squares by its degrees of freedom: MSB = SSB / df_B and MSW = SSW / df_W. MSW is also called the error mean square or the pooled within-group variance.

Calculate the F-statistic

F = MSB / MSW. This ratio follows an F-distribution with (df_B, df_W) degrees of freedom under the null hypothesis. A larger F-value means the group differences are larger relative to within-group noise.

Find the p-value and interpret

Compare the computed F to the critical F-value from an F-distribution table at your chosen α (commonly 0.05). If F > F_critical, or if p < α, reject H₀. This means at least one group mean differs from the others. Run a post hoc test to identify which pairs differ.
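
The steps above can be sketched as one small function. This is an illustrative implementation in plain Python, not the calculator's own code; the function name and the example data are hypothetical. It stops at the F-statistic, because the p-value needs the F-distribution's upper-tail probability (for example `scipy.stats.f.sf(F, df_B, df_W)`).

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA table quantities for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group and within-group sums of squares
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    # Degrees of freedom, mean squares, F-ratio, and eta-squared
    df_b, df_w = k - 1, n_total - k
    msb, msw = ssb / df_b, ssw / df_w
    return {"SSB": ssb, "SSW": ssw, "df_B": df_b, "df_W": df_w,
            "MSB": msb, "MSW": msw, "F": msb / msw, "eta_sq": ssb / (ssb + ssw)}

# Illustrative data with unequal group sizes
groups = [[4, 5, 6], [7, 8, 9, 10], [1, 2, 3]]
table = one_way_anova(groups)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

Because SSB weights each group by its own size, the same function handles unequal group sizes without any change.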

Worked Example: Exam Scores Across Three Teaching Methods

Scenario: A researcher compares exam scores for 15 students divided equally across three teaching methods: Lecture (Method A), Flipped Classroom (Method B), and Self-Study (Method C). Each group has n = 5 students. Do the methods produce different average scores?

Dataset: Exam Scores by Teaching Method

Student	Method A (Lecture)	Method B (Flipped)	Method C (Self-Study)
1	78	85	70
2	82	88	74
3	75	91	68
4	80	84	72
5	77	87	71
Group Mean (x̄_j)	78.4	87.0	71.0

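Grand Mean (x̄) = (78.4 + 87.0 + 71.0) / 3 = 78.8. Averaging the group means works here only because every group has the same n; in general, divide the sum of all scores by N (here 1182 ÷ 15 = 78.8).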

Step 1 — SSB (Between-Group Sum of Squares)

SSB = n_A(x̄_A − x̄)² + n_B(x̄_B − x̄)² + n_C(x̄_C − x̄)²
     = 5(78.4 − 78.8)² + 5(87.0 − 78.8)² + 5(71.0 − 78.8)²
     = 5(0.16) + 5(67.24) + 5(60.84)
     = 0.80 + 336.20 + 304.20 = 641.20
df_B = k − 1 = 3 − 1 = 2

Step 2 — SSW (Within-Group Sum of Squares)

Method A: (78−78.4)² + (82−78.4)² + (75−78.4)² + (80−78.4)² + (77−78.4)²
     = 0.16 + 12.96 + 11.56 + 2.56 + 1.96 = 29.20

Method B: (85−87)² + (88−87)² + (91−87)² + (84−87)² + (87−87)²
     = 4 + 1 + 16 + 9 + 0 = 30.00

Method C: (70−71)² + (74−71)² + (68−71)² + (72−71)² + (71−71)²
     = 1 + 9 + 9 + 1 + 0 = 20.00

SSW = 29.20 + 30.00 + 20.00 = 79.20
df_W = N − k = 15 − 3 = 12

Step 3 — Mean Squares and F-Statistic

MSB = SSB / df_B = 641.20 / 2 = 320.60
MSW = SSW / df_W = 79.20 / 12 = 6.60
F = MSB / MSW = 320.60 / 6.60 = 48.58

Step 4 — Decision

Critical F-value

At α = 0.05 with df_B = 2 and df_W = 12, the critical F = 3.885 (from the F-distribution table).

Decision

F = 48.58 > F_critical = 3.885. Reject H₀. At least one teaching method produces a different average score (p < 0.001).
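
With exactly two numerator degrees of freedom (k = 3 groups), the upper-tail probability of the F-distribution has a closed form, P(F > x) = (1 + 2x/df_W)^(−df_W/2), so the table lookup can be checked without a statistics library. The sketch below applies it to this worked example. The function names are hypothetical, and the shortcut does not apply when df_B ≠ 2, where a general routine such as `scipy.stats.f` would be needed.

```python
def f_sf_df1_2(x, df_w):
    """Upper-tail probability P(F > x) for an F(2, df_w) distribution."""
    return (1 + 2 * x / df_w) ** (-df_w / 2)

def f_crit_df1_2(alpha, df_w):
    """Critical value c with P(F > c) = alpha for an F(2, df_w) distribution."""
    return (df_w / 2) * (alpha ** (-2 / df_w) - 1)

df_w = 12
f_observed = 320.60 / 6.60          # MSB / MSW from Step 3
f_critical = f_crit_df1_2(0.05, df_w)
p_value = f_sf_df1_2(f_observed, df_w)

print(f"critical F = {f_critical:.3f}")
print(f"p-value    = {p_value:.2e}")
print("reject H0" if f_observed > f_critical else "fail to reject H0")
```

The computed critical value agrees with the tabled 3.885, and the p-value falls far below 0.001.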

Effect Size

η² = SSB / SST = 641.20 / (641.20 + 79.20) = 641.20 / 720.40 = 0.890. This is a large effect — teaching method accounts for 89% of the total variance in exam scores.

Next step: Post Hoc Test

A significant ANOVA only tells you that not all means are equal. Tukey HSD reveals that Method B (Flipped, mean = 87) scores significantly higher than both Method A (mean = 78.4, p < 0.001) and Method C (mean = 71, p < 0.001). Methods A and C also differ significantly from each other (p < 0.01).

Summary: F(2, 12) = 48.58, p < .001, η² = .89. The flipped classroom method produced significantly higher exam scores than both traditional lectures and self-study. The effect is large. Enter this dataset into the calculator above (Group 1: 78, 82, 75, 80, 77 | Group 2: 85, 88, 91, 84, 87 | Group 3: 70, 74, 68, 72, 71) to verify every number.

Fig. 2. Group means with standard deviation error bars for the three teaching methods: Method A (78.4), Method B (87.0), and Method C (71.0). Method B (Flipped Classroom) scores significantly higher than both alternatives.

Assumptions of ANOVA — and What to Do When They Fail

One-way ANOVA requires four assumptions. Violating some can be corrected by switching to an alternative test; others require a different study design.

Table: ANOVA Assumptions, How to Check Them, and Alternatives When Violated

Assumption	How to Check	If Violated → Use
Independence of observations	Study design review (are groups truly separate?)	Repeated Measures ANOVA (within-subjects design)
Normality of residuals	Shapiro-Wilk test on residuals; Q-Q plot	Kruskal-Wallis test (non-parametric alternative)
Homogeneity of variance	Levene’s test or Bartlett’s test	Welch’s ANOVA (does not assume equal variances)
Continuous dependent variable	Measurement level review	Kruskal-Wallis (ordinal); Chi-square (categorical)

The normality assumption matters least with larger samples (generally n > 30 per group) because of the Central Limit Theorem: sample means tend toward normality regardless of the underlying distribution. For the homogeneity of variance assumption, ANOVA is reasonably robust to mild violations when group sizes are equal, but Welch’s ANOVA is the safer choice whenever Levene’s test returns p < 0.05. The Penn State STAT 502 course (Analysis of Variance and Design of Experiments) covers these diagnostics in depth.
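
Levene's test can itself be viewed as a one-way ANOVA run on absolute deviations from each group's center. The sketch below uses the median-centered variant (often called Brown-Forsythe) on the worked-example scores. The helper names are hypothetical, and the code returns only the test statistic W, which is then referred to an F(k − 1, N − k) distribution.

```python
from statistics import median

def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n_total - k))

def levene_w(groups):
    """Median-centered Levene statistic: ANOVA F on |x - group median|."""
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    return anova_f(deviations)

scores = [
    [78, 82, 75, 80, 77],   # Method A
    [85, 88, 91, 84, 87],   # Method B
    [70, 74, 68, 72, 71],   # Method C
]
w = levene_w(scores)
print(f"Levene W = {w:.4f}")
```

For these scores W is far below the critical F(2, 12) = 3.885 used in the worked example, so there is no evidence against equal variances and the standard one-way ANOVA is reasonable.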

Post Hoc Tests After ANOVA

A significant ANOVA F-test only tells you that not all group means are equal — it does not tell you which pairs differ. Post hoc tests answer that question while controlling the familywise error rate.

The need for post hoc testing is not merely procedural. With k = 4 groups, there are six possible pairwise comparisons. Running six separate t-tests at α = 0.05 raises the probability of at least one false positive to 1 − (0.95)⁶ ≈ 0.26. Post hoc tests help control this inflated Type I error rate.
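
The inflation described above is easy to tabulate. This sketch assumes the pairwise tests are independent, which is a simplification (pairwise comparisons share data), so the figures are approximations. The function name is hypothetical.

```python
from math import comb

def familywise_error(k, alpha=0.05):
    """Return (number of pairwise tests, P(at least one false positive))
    for k groups, assuming independent tests at level alpha."""
    m = comb(k, 2)                     # number of pairwise comparisons
    return m, 1 - (1 - alpha) ** m

for k in (3, 4, 5):
    m, fwer = familywise_error(k)
    print(f"k = {k}: {m} comparisons, familywise error ≈ {fwer:.3f}")
```

The number of comparisons grows as k(k − 1)/2, so the uncorrected error rate climbs quickly as groups are added.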

Table: Post Hoc Test Comparison

Test	Best For	Variance Assumption	Conservativeness
Tukey HSD	All pairwise comparisons, equal group sizes	Equal variances required	Moderate (recommended default)
Games-Howell	Unequal group sizes or variances	Does not assume equal variances	Moderate
Bonferroni	Any design; strict control needed	None required	Very conservative (increases Type II error)
Scheffé	Complex contrasts, not just pairs	Equal variances assumed	Most conservative

How Tukey HSD Works

Tukey’s Honestly Significant Difference uses the studentized range distribution (q) to compute a minimum detectable difference. Two group means are significantly different if their absolute difference exceeds HSD = q × √(MSW / n). For unequal group sizes, MSW / n is replaced by MSW × (1/n_i + 1/n_j) / 2. The calculator above computes Tukey HSD automatically for every pairwise comparison. Kirk (2013), Experimental Design: Procedures for the Behavioral Sciences (4th ed., SAGE), provides a thorough treatment of the Tukey method and its assumptions.
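
As a rough illustration of the equal-n formula, the sketch below applies it to the worked example (MSW = 6.60, n = 5 per group). The critical value q = 3.77 is an assumption taken from a standard studentized range table for α = 0.05, k = 3, df_W = 12; it is hard-coded here rather than computed.

```python
from itertools import combinations
from math import sqrt

Q_CRIT = 3.77      # assumed table value of q for alpha = 0.05, k = 3, df_W = 12
MSW = 6.60         # mean square within, from the worked example
N_PER_GROUP = 5

means = {"A (Lecture)": 78.4, "B (Flipped)": 87.0, "C (Self-Study)": 71.0}

# Minimum difference between two means that counts as significant
hsd = Q_CRIT * sqrt(MSW / N_PER_GROUP)
print(f"HSD = {hsd:.2f}")

results = {}
for (name_i, mean_i), (name_j, mean_j) in combinations(means.items(), 2):
    diff = abs(mean_i - mean_j)
    results[(name_i, name_j)] = diff > hsd
    verdict = "significant" if diff > hsd else "not significant"
    print(f"{name_i} vs {name_j}: |diff| = {diff:.1f} -> {verdict}")
```

All three pairwise differences exceed the HSD threshold, which matches the post hoc conclusion of the worked example at α = 0.05.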

How to Interpret ANOVA Results

Reading ANOVA output correctly requires understanding both the omnibus test (significant or not) and the effect size — two questions that a p-value alone cannot answer.

The P-value

If p < α (typically 0.05), you reject the null hypothesis that all group means are equal. The p-value is the probability of observing an F-statistic at least this large if H₀ were true. It does not measure the size of the difference. A p-value of 0.001 with a tiny effect size (η² = 0.01) can be practically meaningless in a large sample.

Effect Size (η²)

Eta-squared tells you what fraction of total data variability the grouping factor explains. Always report it alongside the p-value. Cohen’s 1988 benchmarks remain the most widely used reference points for interpreting η²:

η² ≥ 0.01: Small effect (factor explains ~1% of variance)
η² ≥ 0.06: Medium effect (factor explains ~6% of variance)
η² ≥ 0.14: Large effect (factor explains 14%+ of variance)

Reporting ANOVA in APA Format

The American Psychological Association’s Publication Manual specifies the following format for reporting ANOVA results in research papers:

APA Format: F(df_between, df_within) = F-value, p = p-value, η² = effect-size

Example from the worked example above:
A one-way ANOVA revealed a significant effect of teaching method on exam scores, F(2, 12) = 48.58, p < .001, η² = .89.
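
A small helper can assemble this string consistently. The sketch below is one possible formatter, not an official APA tool; the function names are hypothetical. It drops the leading zero for p and η² and reports very small p-values as p < .001, following the example above.

```python
def apa_decimal(value, places=2):
    """Format a number that cannot exceed 1 without its leading zero (e.g. .89)."""
    text = f"{value:.{places}f}"
    return text[1:] if text.startswith("0.") else text

def apa_anova(f_value, df_between, df_within, p_value, eta_sq):
    """Build an APA-style one-way ANOVA result string."""
    p_text = "p < .001" if p_value < 0.001 else f"p = {apa_decimal(p_value, 3)}"
    return (f"F({df_between}, {df_within}) = {f_value:.2f}, "
            f"{p_text}, η² = {apa_decimal(eta_sq)}")

# Worked example; 0.0001 stands in for any p-value below .001
print(apa_anova(48.58, 2, 12, 0.0001, 0.890))
```

In a manuscript the statistic symbols F and p are also italicized, which plain text cannot show.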

ANOVA vs. Other Statistical Tests

Choosing between ANOVA and related tests comes down to the number of groups, the type of dependent variable, and whether subjects appear in multiple groups.

Table: Test Selection Guide

Scenario	Correct Test	Why
2 independent groups, continuous DV	Independent t-test	With 2 groups, the ANOVA F equals t²; the t-test is the conventional choice
3+ independent groups, 1 factor, continuous DV	One-Way ANOVA (this calculator)	Controls Type I error inflation across multiple comparisons
3+ groups, 2 factors, continuous DV	Two-Way ANOVA	Tests main effects of each factor plus their interaction
Same subjects across 3+ conditions	Repeated Measures ANOVA	Removes individual differences from the error term
3+ groups, non-normal data or ordinal DV	Kruskal-Wallis test	Non-parametric alternative; no normality assumption
2 groups, same subjects, continuous DV	Paired t-test	Controls individual differences in a within-subjects design
Categorical DV (counts/proportions)	Chi-square test	ANOVA requires a continuous dependent variable

Real-World Examples of ANOVA

Clinical Research: Comparing Drug Doses

A clinical trial compares pain relief scores (0–100 scale) across four groups: placebo, 5 mg, 10 mg, and 20 mg of an analgesic drug. Each group contains 20 patients. A one-way ANOVA tests whether the four groups produce different average pain scores, using α = 0.05. A significant result (say F(3, 76) = 12.4, p < .001, η² = .33) indicates that at least one dose group differs from the others. Tukey HSD post hoc testing then reveals whether the 10 mg and 20 mg doses both outperform the placebo, or whether the 20 mg dose adds little beyond the 10 mg benefit. The National Institutes of Health’s guidelines on clinical trial analysis recommend reporting both F-statistics and effect sizes in this context.

Marketing Analytics: Comparing Email Subject Lines

An e-commerce company tests three email subject line variants (curiosity-based, discount-based, and urgency-based) sent to random customer segments of 200 each. The dependent variable is click-through rate (CTR). A one-way ANOVA tests whether CTR differs across variants, provided CTR is recorded as a continuous measure with several observations per variant (for example, CTR per send batch); a single click/no-click outcome per recipient is categorical and calls for a chi-square test instead. If the ANOVA is significant, Tukey HSD identifies which subject line strategy outperforms the others. Unlike a series of A/B tests run pairwise (which would inflate the Type I error rate), a single ANOVA controls the experiment-wide false positive rate. This design mirrors the approach described in Harvard Business Review’s analysis of online experimentation.

Agricultural Research: Fisher’s Original Application

The test that became ANOVA was first used on crop yield data. Fisher compared the yield of wheat plots treated with different fertilizers at Rothamsted. The question — do the fertilizer groups produce meaningfully different average yields? — maps directly to every ANOVA since. The Rothamsted Research statistical services team continues to develop ANOVA methodology today, and their long-term datasets remain one of the most cited agricultural research resources in the world.

ANOVA in R, Python, and Excel

One-Way ANOVA in R

# Assumes df is a data frame with a numeric column `score` and a factor column `group`
# Base R: three lines for a complete ANOVA with post hoc test
model <- aov(score ~ group, data = df)
summary(model)          # Prints the ANOVA summary table
TukeyHSD(model)         # Tukey HSD post hoc pairwise comparisons

# Effect size (eta-squared) using the effectsize package
library(effectsize)
eta_squared(model)      # Returns η² with a confidence interval (level depends on package version)

# If Levene's test (e.g., car::leveneTest) shows unequal variances, use Welch's ANOVA:
oneway.test(score ~ group, data = df, var.equal = FALSE)
          

One-Way ANOVA in Python (SciPy + Pingouin)

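import pandas as pd
import pingouin as pg
from scipy import stats

# Example data: the three teaching-method groups from the worked example
group_a = [78, 82, 75, 80, 77]
group_b = [85, 88, 91, 84, 87]
group_c = [70, 74, 68, 72, 71]

# Long-format DataFrame (one row per observation), as Pingouin expects
df = pd.DataFrame({
    'score': group_a + group_b + group_c,
    'group': ['A'] * 5 + ['B'] * 5 + ['C'] * 5,
})

# Basic one-way ANOVA (SciPy)
f_stat, p_val = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.4f}, p = {p_val:.3g}")

# Full ANOVA table with effect size (Pingouin)
result = pg.anova(data=df, dv='score', between='group', detailed=True)
print(result)           # Includes F, p, and np2 (partial eta-squared; equals η² in a one-way design)

# Tukey HSD post hoc
posthoc = pg.pairwise_tukey(data=df, dv='score', between='group')
print(posthoc)          # Mean difference, Tukey-adjusted p-value, Hedges' g for each pair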
          

One-Way ANOVA in Excel

Go to Data tab → Data Analysis → Anova: Single Factor (the Data Analysis button appears only when the Analysis ToolPak add-in is enabled)
Input Range: select all your data columns (each group in a column)
Check "Labels in First Row" if you have column headers
Set Alpha (default 0.05)
Click OK — Excel outputs the full ANOVA table in a new sheet

Note: Excel's ANOVA tool does not include post hoc tests.
For Tukey HSD, use this calculator or R/Python after confirming significance.
          

ANOVA: Complete Formula and Term Glossary

The table below covers every key term in ANOVA and is structured for quick reference.

Table: ANOVA Formula Glossary

Term	Symbol / Formula	Plain Explanation	Role in ANOVA
Analysis of Variance	ANOVA	Statistical method for comparing means of three or more groups. Developed by Fisher (1925).	The test itself
Null Hypothesis	H₀: μ₁ = μ₂ = ... = μ_k	All group population means are equal; any observed differences are due to random sampling variation.	What ANOVA tests against
Total Sum of Squares	SST = ∑(x_ij − x̄)²	Total variability of all observations around the grand mean.	SST = SSB + SSW
Between-Group SS	SSB = ∑ n_j(x̄_j − x̄)²	Variability attributable to differences between group means and the grand mean.	Numerator component of F
Within-Group SS	SSW = SST − SSB	Variability attributable to individual differences within each group (error/residual).	Denominator component of F
Between df	df_B = k − 1	Number of groups minus one.	MSB = SSB / df_B
Within df	df_W = N − k	Total observations minus number of groups.	MSW = SSW / df_W
Mean Square Between	MSB = SSB / df_B	Average between-group variance; estimate of population variance if H₀ is true.	Numerator of F
Mean Square Within	MSW = SSW / df_W	Average within-group variance; the error term; pooled within-group variance estimate.	Denominator of F
F-Statistic	F = MSB / MSW	Ratio of between-group to within-group variance. Under H₀, F ≈ 1.0. A large F-value signals meaningful group differences.	Test statistic
P-Value	P(F ≥ F_obs | H₀)	Probability of observing this F-value or larger if all group means were truly equal.	Basis for reject/fail-to-reject decision
Eta-Squared	η² = SSB / SST	Proportion of total variance explained by the grouping factor. Cohen (1988) benchmarks: 0.01 small, 0.06 medium, 0.14 large.	Effect size
Cohen’s f	f = √(η² / (1 − η²))	Alternative effect size; used in power analysis. f = 0.10 small, 0.25 medium, 0.40 large.	Power analysis input
Tukey HSD	HSD = q × √(MSW / n)	Post hoc test minimum detectable difference. Two means differ significantly if |x̄_i − x̄_j| > HSD.	Post hoc pairwise comparison
Type I Error Rate	α (typically 0.05)	Probability of rejecting H₀ when it is actually true (false positive). Set before running the test.	Decision threshold

Sources and Further Reading

Sources cited in this guide:

Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver & Boyd. (Original development of ANOVA and the F-distribution.)
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates. (Source for η² and Cohen’s f benchmarks.)
Kirk, R. E. (2013). Experimental Design: Procedures for the Behavioral Sciences (4th ed.). SAGE Publications. (Comprehensive treatment of ANOVA, post hoc tests, and experimental design.)
National Institute of Standards and Technology (NIST). NIST/SEMATECH e-Handbook of Statistical Methods — One-Way ANOVA. itl.nist.gov
Penn State STAT 502. Analysis of Variance and Design of Experiments: ANOVA Diagnostics. online.stat.psu.edu
National Institutes of Health. Principles of Clinical Pharmacology — Statistical Considerations in Drug Development. ncbi.nlm.nih.gov
Harvard Business Review. The Surprising Power of Online Experiments. (2017). hbr.org
Rothamsted Research. Statistical Services and Long-Term Field Experiments. rothamsted.ac.uk
Montgomery, D. C. (2017). Design and Analysis of Experiments (9th ed.). Wiley. (Standard graduate-level reference for ANOVA and experimental design.)

Frequently Asked Questions

What is ANOVA and when should I use it?

ANOVA (Analysis of Variance) is a statistical test that compares the means of three or more independent groups to determine whether at least one group mean differs significantly from the others. Use it when you have one continuous dependent variable and one categorical independent variable with three or more levels. For two groups, an independent t-test is the standard choice. For a categorical dependent variable, use the chi-square test.

What does the p-value mean in ANOVA?

In an ANOVA result, a p-value below your significance threshold (commonly α = 0.05) means you reject the null hypothesis that all group means are equal — at least one group mean differs from the others. The p-value does not identify which groups differ; that requires a post hoc test such as Tukey HSD. A p-value above 0.05 means the observed differences between group means are consistent with what random sampling variation alone would produce if all population means were equal.

What is the difference between one-way and two-way ANOVA?

One-way ANOVA tests the effect of one independent variable (factor) on a continuous dependent variable. Two-way ANOVA tests the effects of two independent variables simultaneously, plus their interaction. For example, one-way ANOVA compares weight loss across three diet plans. Two-way ANOVA tests both diet plan and exercise intensity together — including whether the best diet depends on the exercise level (the interaction effect). Two-way ANOVA cannot be simplified by running two separate one-way ANOVAs, because it captures the interaction term that only exists when both factors are modeled together.

What is Tukey HSD and when do I use it?

Tukey’s Honestly Significant Difference (HSD) is a post hoc test run after a significant ANOVA to identify which specific pairs of group means differ. It controls the familywise error rate — the probability of any false positive across all pairwise comparisons — making it more appropriate than running multiple t-tests. Use Tukey HSD when group variances are approximately equal and group sizes are equal or close to equal. If Levene’s test indicates unequal variances, use Games-Howell instead.

What are the assumptions of ANOVA?

One-way ANOVA rests on four assumptions: (1) Independence — each observation must be independent of all others; measurements from one subject must not affect another. (2) Normality — the residuals within each group should follow an approximately normal distribution (check with a Shapiro-Wilk test or Q-Q plot). (3) Homogeneity of variance — the variance within each group should be roughly equal (check with Levene’s test; if violated, switch to Welch’s ANOVA). (4) Continuous dependent variable — the outcome must be measured on an interval or ratio scale.

What does eta-squared mean in ANOVA?

Eta-squared (η²) is an effect size measure that tells you what proportion of the total variance in the dependent variable is explained by the grouping factor. It is calculated as η² = SSB / SST. A value of 0.25 means the grouping variable accounts for 25% of the variance in the outcome. Cohen’s 1988 benchmarks classify η² = 0.01 as a small effect, 0.06 as medium, and 0.14 or above as large. Always report effect size alongside the p-value — a result can be statistically significant but practically negligible with a large sample.

What is the difference between ANOVA and a t-test?

A t-test compares the means of exactly two groups. ANOVA compares three or more groups in a single test. Running separate t-tests on every pair of groups inflates the probability of false positives. With three groups and three pairwise comparisons at α = 0.05, the probability of at least one false positive reaches 1 − (0.95)³ ≈ 14.3%. ANOVA controls this by evaluating all group differences simultaneously through the F-distribution. When applied to exactly two groups, ANOVA produces an F-statistic that equals the square of the two-sample t-statistic.

Can ANOVA be used with only two groups?

Yes. ANOVA works with two groups and produces a result identical to an independent samples t-test (F = t², same p-value). For two groups, post hoc tests are skipped since only one pairwise comparison exists. In practice, the t-test is the standard choice for two-group comparisons because it is simpler and more familiar to most readers. Use ANOVA from three groups onward.
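
The F = t² identity can be checked numerically. The sketch below compares a pooled-variance t-statistic with the two-group ANOVA F, using the Lecture and Flipped scores from the worked example. The function names are hypothetical.

```python
from math import sqrt

def pooled_t(a, b):
    """Independent-samples t-statistic with pooled (equal) variances."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
    return (mean_a - mean_b) / sqrt(pooled_var * (1 / n_a + 1 / n_b))

def anova_f_two_groups(a, b):
    """One-way ANOVA F-statistic for exactly two groups."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    grand_mean = (sum(a) + sum(b)) / (n_a + n_b)
    ssb = n_a * (mean_a - grand_mean) ** 2 + n_b * (mean_b - grand_mean) ** 2
    ssw = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    return (ssb / 1) / (ssw / (n_a + n_b - 2))   # df_B = k - 1 = 1

lecture = [78, 82, 75, 80, 77]
flipped = [85, 88, 91, 84, 87]
t = pooled_t(lecture, flipped)
f = anova_f_two_groups(lecture, flipped)
print(f"t = {t:.4f}, t² = {t**2:.4f}, F = {f:.4f}")
```

The identity holds for the pooled-variance t-test; Welch's unequal-variance t-test uses a different denominator and does not satisfy it in general.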