What Is ANOVA? (Plain-English Definition)
At its heart, ANOVA answers one question: are these groups genuinely different, or does the variation we see fit what random chance alone would produce? The name "Analysis of Variance" can mislead newcomers who expect a test about variances. ANOVA actually tests means — but it does so by looking at how variance is distributed across and within groups.
The Statistics Fundamentals team at statisticsfundamentals.com covers ANOVA as part of the broader hypothesis testing curriculum — because ANOVA sits squarely inside the inferential statistics toolkit alongside the one-sample t-test, the two-sample t-test, and regression analysis.
- Full name: Analysis of Variance — a test that compares 3+ group means using variance ratios
- Test statistic: F = MSB ÷ MSW (between-group mean square divided by within-group mean square)
- Decision rule: If p < α (typically 0.05), reject H₀ and conclude that groups differ
- What it tells you: Whether any group differs — not which specific groups (you need post-hoc tests for that)
- When to use ANOVA vs t-test: Three or more groups → ANOVA; exactly two groups → t-test
- Non-parametric equivalent: Kruskal-Wallis test (when normality assumption fails)
Why ANOVA Is Used in Statistics
Here is the problem ANOVA solves. Suppose you want to compare exam scores from four different tutoring programs. The obvious approach is to run pairwise t-tests: program A vs. B, A vs. C, A vs. D, B vs. C, B vs. D, C vs. D. That is six separate tests.
Each t-test carries a 5% chance of a false positive at α = 0.05. Treating the six tests as independent (a rough but standard approximation — pairwise tests that share groups are actually correlated), the probability of at least one false positive climbs to 1 − (0.95)⁶ ≈ 26%. You have inflated your experiment-wise Type I error rate from 5% to roughly 26% without testing anything new. ANOVA runs all the group comparisons in one procedure, holding the experiment-wise error rate at α = 0.05.
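The inflation estimate above is a two-line calculation — a quick sketch of the arithmetic, assuming independence of the tests:

```python
from math import comb

alpha = 0.05
k = 4                                # four tutoring programs
m = comb(k, 2)                       # number of pairwise t-tests: 6
familywise = 1 - (1 - alpha) ** m    # rough estimate; treats the tests as independent
print(m, round(familywise, 3))
```

With ten groups the same formula gives 45 pairwise tests and a familywise error rate above 90% — which is why the single omnibus test matters.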
ANOVA asks: is the variation between group averages large relative to the variation inside groups? If groups genuinely differ, their means should spread out far more than individual scores vary within any one group. That ratio — between-group signal divided by within-group noise — is the F-statistic.
This framework was developed by Ronald A. Fisher in the 1920s, formalized in his 1925 textbook Statistical Methods for Research Workers. Fisher's insight — that variance can be partitioned into explainable and unexplainable components — remains one of the foundational ideas in experimental statistics. The NIST/SEMATECH e-Handbook of Statistical Methods gives the historical derivation if you want the full theoretical context.
Types of ANOVA
ANOVA is not a single test. Three main versions cover most research scenarios, and the choice depends on how many independent variables you have and whether the same subjects appear in multiple conditions. (The comparison table below also lists MANOVA — the multivariate extension for multiple dependent variables — for reference.)
One-Way ANOVA
One-way ANOVA tests the effect of a single independent variable (the "factor") that has three or more levels (the "groups"). The dependent variable must be continuous. Example: you grow wheat under three different fertilizer conditions and measure yield per plot. One independent variable (fertilizer type), one dependent variable (yield), three groups.
This is the most common form of ANOVA, and the one we work through completely in the step-by-step example below.
Two-Way ANOVA
Two-way ANOVA introduces a second independent variable. Now you can test three things at once: the main effect of factor A, the main effect of factor B, and their interaction — whether the effect of A depends on the level of B.
A concrete example: you measure student test scores grouped by both teaching method (lecture, flipped classroom, online) and class size (small, medium, large). Two-way ANOVA tells you whether each factor independently affects scores, and whether the best teaching method varies by class size. Without the interaction term, two separate one-way ANOVAs would miss that dependency entirely.
Repeated Measures ANOVA
Repeated measures ANOVA is for situations where the same subjects appear in every condition. Think of measuring blood pressure in the same patients at baseline, after four weeks of treatment, and after eight weeks. Because each person's scores are correlated, regular ANOVA would violate the independence assumption and lump stable subject-to-subject differences into the error term. Repeated measures ANOVA removes that within-subject variability from the error term, giving a more powerful test.
| Type | Independent Variables | Subjects |
|---|---|---|
| One-Way ANOVA | 1 factor, 3+ levels | Different groups |
| Two-Way ANOVA | 2 factors, tests interaction | Different groups |
| Repeated Measures ANOVA | 1+ factors, same subjects | Same subjects in all conditions |
| MANOVA | 1+ factors | Different groups, multiple DVs |
The ANOVA Formula Explained
The ANOVA formula produces one number — the F-statistic — from four quantities: two sums of squares and the two mean squares derived from them. Each quantity has an intuitive meaning.
F = MSB ÷ MSW
MSB = SSB ÷ (k − 1)
MSW = SSW ÷ (N − k)
SSB = Sum of Squares Between groups
SSW = Sum of Squares Within groups
k = number of groups
N = total number of observations
Breaking Down Each Component
The table below maps each ANOVA term to what it measures. Read this before looking at any formula — the labels matter more than the arithmetic at first.
| Term | Symbol | What It Measures | Formula |
|---|---|---|---|
| Sum of Squares Between | SSB | How far group means spread from the grand mean | Σ nᵢ × (x̄ᵢ − x̄)² |
| Sum of Squares Within | SSW | How far individual scores spread from their own group mean | Σ Σ (xᵢⱼ − x̄ᵢ)² |
| Sum of Squares Total | SST | Total variation across all observations | SSB + SSW |
| Degrees of Freedom Between | df_B | Number of groups minus one | k − 1 |
| Degrees of Freedom Within | df_W | Total observations minus number of groups | N − k |
| Mean Square Between | MSB | Average between-group variance per degree of freedom | SSB ÷ df_B |
| Mean Square Within | MSW | Average within-group variance per degree of freedom | SSW ÷ df_W |
| F-statistic | F | Ratio of between-group to within-group variance | MSB ÷ MSW |
Understanding the F-Statistic
The F-statistic is a signal-to-noise ratio. The signal is between-group variance — how much the group averages differ from each other. The noise is within-group variance — how much individual scores vary inside each group regardless of the group assignment.
When F = 1, the signal equals the noise: group differences are no larger than would be expected from random variation. When F is large, the group differences are substantial relative to the background scatter, and we start suspecting something real is happening. The F-table gives the critical value that F must exceed for a given α and degrees of freedom.
F-Distribution Shape — Right-Skewed, Always Positive
The F-distribution is always positive and right-skewed. Values in the red shaded region (beyond F critical) lead to rejecting H₀.
Reading an F-Table vs. Using a P-Value
Two approaches exist for making the ANOVA decision. The classical approach compares your computed F to the critical F-value from a table, indexed by df_between, df_within, and α. The modern approach — used in every statistics package — computes the exact p-value, which is the probability of observing this F-ratio (or larger) if H₀ were true. Both give the same conclusion when used correctly.
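Both routes can be reproduced with scipy's F-distribution. A sketch, assuming scipy is available; the numbers are illustrative (F = 56.0 with df of 2 and 12):

```python
from scipy import stats

F_obs, df_b, df_w = 56.0, 2, 12

# Classical route: the critical value at alpha = 0.05 (what an F-table row gives you)
f_crit = stats.f.ppf(0.95, df_b, df_w)

# Modern route: the exact p-value, P(F >= F_obs) under H0
p = stats.f.sf(F_obs, df_b, df_w)

print(round(f_crit, 2), p < 0.001)
```

The critical value comes out near 3.89, and since 56.0 exceeds it, both routes reach the same conclusion: reject H₀.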
Null and Alternative Hypotheses in ANOVA
ANOVA is a formal hypothesis test, so it begins with two competing claims about the population. The null hypothesis H₀ states that all k group means are equal: μ₁ = μ₂ = ... = μₖ. The alternative hypothesis H₁ states that at least one group mean differs.
One common misreading of H₁ is thinking it means "all groups differ." It does not. H₁ says at least one group mean differs from the others. That one difference is enough to reject H₀. This is why a significant ANOVA result needs post-hoc testing — the overall test only tells you that something is different, not where the difference lies.
Between-Group vs Within-Group Variance
Understanding these two components is the key to understanding why ANOVA works.
Between-Group Variance (The Signal)
Between-group variance measures how much the group means vary around the grand mean (the average of all observations combined). If the treatment or factor has a real effect, the groups will have meaningfully different averages and between-group variance will be large.
Within-Group Variance (The Noise)
Within-group variance measures how much individual observations vary within each group, regardless of group membership. This is random variation — individual differences, measurement error, anything that is not caused by the factor we are studying. It represents the noise floor.
If group means spread far apart relative to within-group scatter → F is large → p is small → reject H₀. If group means cluster together relative to within-group scatter → F is near 1 → p is large → fail to reject H₀.
Step-by-Step ANOVA Example (Full Calculation)
Numbers are easier to follow than formulas in isolation. Here is a complete one-way ANOVA worked from raw data to a decision, showing every calculation.
The Research Question
A soil scientist tests three fertilizer formulations (A, B, C) to see whether any produces significantly higher wheat yield. Each formulation gets applied to five separate plots, and yield (in bushels per acre) is recorded after harvest.
The Dataset
| Plot | Fertilizer A | Fertilizer B | Fertilizer C |
|---|---|---|---|
| 1 | 20 | 28 | 18 |
| 2 | 22 | 30 | 20 |
| 3 | 19 | 27 | 17 |
| 4 | 21 | 29 | 19 |
| 5 | 23 | 31 | 21 |
| Group Mean | 21.0 | 29.0 | 19.0 |
Three fertilizer groups (k = 3), five observations per group (n = 5), total N = 15. α = 0.05.
State the hypotheses. H₀: μ_A = μ_B = μ_C (all fertilizers produce equal yield). H₁: at least one fertilizer mean differs.
Calculate the grand mean. Grand mean x̄ = (sum of all 15 observations) ÷ 15 = (105 + 145 + 95) ÷ 15 = 345 ÷ 15 = 23.0
Calculate SSB (Sum of Squares Between).
SSB = n_A×(x̄_A − x̄)² + n_B×(x̄_B − x̄)² + n_C×(x̄_C − x̄)²
SSB = 5×(21 − 23)² + 5×(29 − 23)² + 5×(19 − 23)²
SSB = 5×4 + 5×36 + 5×16 = 20 + 180 + 80 = 280
Calculate SSW (Sum of Squares Within).
For group A: (20−21)² + (22−21)² + (19−21)² + (21−21)² + (23−21)² = 1+1+4+0+4 = 10
For group B: (28−29)² + (30−29)² + (27−29)² + (29−29)² + (31−29)² = 1+1+4+0+4 = 10
For group C: (18−19)² + (20−19)² + (17−19)² + (19−19)² + (21−19)² = 1+1+4+0+4 = 10
SSW = 10 + 10 + 10 = 30
Compute degrees of freedom.
df_between = k − 1 = 3 − 1 = 2
df_within = N − k = 15 − 3 = 12
Compute MSB and MSW, then F.
MSB = 280 ÷ 2 = 140
MSW = 30 ÷ 12 = 2.5
F = 140 ÷ 2.5 = 56.0
Compare to the critical value. For F(2, 12) at α = 0.05, the critical value from the F-table is approximately 3.89. Our F = 56.0 exceeds this by a wide margin.
✓ F(2, 12) = 56.0, p < 0.001. We reject H₀. At least one fertilizer produces a significantly different yield. Post-hoc testing (Tukey HSD) is needed to identify which specific pairs differ — in practice, B vs. A and B vs. C are the likely drivers.
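The whole calculation can be replayed in a few lines of Python, with scipy's f_oneway as a cross-check — a sketch, assuming scipy is available:

```python
from scipy import stats

groups = {"A": [20, 22, 19, 21, 23],
          "B": [28, 30, 27, 29, 31],
          "C": [18, 20, 17, 19, 21]}

def mean(g):
    return sum(g) / len(g)

values = [x for g in groups.values() for x in g]
grand = mean(values)                                                 # grand mean: 23.0

ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())  # SSB = 280
ssw = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)    # SSW = 30

k, N = len(groups), len(values)
F = (ssb / (k - 1)) / (ssw / (N - k))                                # 140 / 2.5 = 56.0

# scipy's one-liner should reproduce the hand calculation
F_scipy, p = stats.f_oneway(*groups.values())
print(F, F_scipy, p < 0.001)
```

Every intermediate value — grand mean 23.0, SSB 280, SSW 30, F 56.0 — matches the hand calculation above.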
The ANOVA Table Explained
Statistical software presents ANOVA results as a table. Knowing what each column represents means you can read any ANOVA output — from SPSS, R, Python, or a published paper — without confusion.
| Source | SS | df | MS | F | p-value |
|---|---|---|---|---|---|
| Between Groups | 280 | 2 | 140.00 | 56.00 | < 0.001 |
| Within Groups | 30 | 12 | 2.50 | — | — |
| Total | 310 | 14 | — | — | — |
Reading the table: the Source column names where the variation comes from. SS is the sum of squared deviations. df is degrees of freedom. MS (Mean Square) is SS ÷ df. The F column only has a value for Between Groups because F is defined as the ratio MSB ÷ MSW. The p-value directly gives the probability of F ≥ 56.0 under H₀.
How to Interpret ANOVA Results
A significant F-statistic is the beginning, not the end. Three things need attention after seeing p < 0.05.
Step 1: Check the P-Value Against Alpha
If p < α (usually 0.05), reject H₀. If p ≥ α, fail to reject H₀. Note that "fail to reject" is not the same as "the groups are definitely equal" — you simply lack sufficient evidence to declare a difference.
Step 2: Run Post-Hoc Tests
ANOVA only flags that something is different. Post-hoc tests make pairwise comparisons while controlling the experiment-wise error rate. Four common options are shown below.
| Post-Hoc Test | Best For | Conservatism |
|---|---|---|
| Tukey HSD | All pairwise comparisons, equal group sizes | Moderate |
| Bonferroni correction | Planned comparisons, any group sizes | More conservative |
| Scheffé test | Complex contrasts, unequal group sizes | Most conservative |
| Games-Howell | Unequal variances or group sizes | Moderate |
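Tukey HSD relies on the studentized range distribution and is best taken from a statistics package. As a minimal sketch of the Bonferroni approach instead, here are pairwise t-tests on the fertilizer data from the worked example, using scipy's independent-samples t-test:

```python
from itertools import combinations
from scipy import stats

groups = {"A": [20, 22, 19, 21, 23],
          "B": [28, 30, 27, 29, 31],
          "C": [18, 20, 17, 19, 21]}

pairs = list(combinations(groups, 2))      # A-B, A-C, B-C
alpha_adj = 0.05 / len(pairs)              # Bonferroni: divide alpha by number of comparisons

for g1, g2 in pairs:
    t, p = stats.ttest_ind(groups[g1], groups[g2])
    verdict = "differ" if p < alpha_adj else "no evidence of difference"
    print(f"{g1} vs {g2}: p = {p:.4f} -> {verdict}")
```

On this data, A vs. B and B vs. C come out significant at the adjusted threshold while A vs. C does not — consistent with fertilizer B being the standout group.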
Step 3: Compute Effect Size (η²)
A statistically significant result says the groups differ. Effect size says how much they differ. Eta squared (η²) answers this question in practical terms.
0.01 = small effect
0.06 = medium effect
0.14 = large effect
For the fertilizer example: η² = 280 ÷ 310 ≈ 0.90. Fertilizer type accounts for roughly 90% of the total variation in yield — a very large practical effect. These benchmarks come from Cohen's (1988) framework and are referenced widely in the effect size literature.
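The η² arithmetic is small enough to verify directly:

```python
ssb, ssw = 280, 30        # sums of squares from the fertilizer example
sst = ssb + ssw           # SST = 310
eta_sq = ssb / sst        # proportion of total variance explained by fertilizer type
print(round(eta_sq, 3))   # 0.903
```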
Understanding P-Values in ANOVA
The p-value in ANOVA answers: if all group means were truly equal (H₀ true), what is the probability of observing an F-statistic this large or larger purely by chance? A small p-value (below α) means that chance alone is an unlikely explanation, so we reject H₀.
The p-value is not the probability that H₀ is true. It is the probability of the observed data (or more extreme data) given that H₀ is true. This subtle but critical distinction is documented by the American Statistical Association's 2016 statement on p-values. Always pair p-values with effect sizes for a complete picture.
The p-value and effect size tell different stories. A p-value below 0.05 means the result is statistically significant — but with a very large sample, even a tiny, practically meaningless difference can produce p < 0.001. Effect size (η²) tells you whether the difference matters in the real world. Report both. This guidance is consistent with recommendations from the APA Publication Manual and most major scientific journals.
ANOVA Assumptions
ANOVA produces valid results only when four conditions hold. Researchers at UCLA's Statistical Consulting Group list these assumptions as standard practice for any ANOVA analysis.
Independence of Observations
Each data point must come from a different, unrelated subject or unit. If the same subject appears in multiple conditions, use repeated measures ANOVA instead. Violating independence seriously inflates the Type I error rate.
Normality Within Groups
The dependent variable should be approximately normally distributed within each group. With n ≥ 30 per group, the central limit theorem makes ANOVA fairly robust to non-normality. For smaller samples, check with a Shapiro-Wilk test. Severe skewness with small samples warrants the Kruskal-Wallis alternative.
Homogeneity of Variance (Homoscedasticity)
All groups should have roughly equal variances. Test this with Levene's test before running ANOVA. If this assumption fails, use Welch's ANOVA, which does not assume equal variances and is available in R (oneway.test()), Python (scipy), and SPSS.
Random Sampling
Data should come from a random sample drawn from the population. This assumption is about study design. If subjects were not randomly sampled or randomly assigned to groups, the generalizability of results is limited regardless of statistical significance.
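The two assumptions that can be checked directly from the data — normality within groups and equal variances — take a few lines with scipy. A sketch on the fertilizer data (Shapiro-Wilk is most informative at small n):

```python
from scipy import stats

a = [20, 22, 19, 21, 23]      # fertilizer data from the worked example
b = [28, 30, 27, 29, 31]
c = [18, 20, 17, 19, 21]

# Normality within each group (Shapiro-Wilk)
for name, g in zip("ABC", (a, b, c)):
    w, p = stats.shapiro(g)
    print(f"group {name}: Shapiro-Wilk p = {p:.3f}")

# Homogeneity of variance (Levene's test across all groups at once)
stat, p_lev = stats.levene(a, b, c)
print(f"Levene p = {p_lev:.3f}")    # the three groups have identical spread here
```

A small p-value in either check is the signal to switch tests: Kruskal-Wallis for non-normality, Welch's ANOVA for unequal variances.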
What to Do When an Assumption Fails
| Violated Assumption | Alternative Test | When to Use It |
|---|---|---|
| Normality | Kruskal-Wallis test | Non-normal distributions, ordinal data, small samples |
| Homogeneity of variance | Welch's ANOVA | Unequal group variances, significant Levene's test (small p-value) |
| Independence | Repeated Measures ANOVA | Same subjects measured in multiple conditions |
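When normality fails, the Kruskal-Wallis replacement is a one-liner in scipy — shown here on the fertilizer data purely as a sketch:

```python
from scipy import stats

a = [20, 22, 19, 21, 23]
b = [28, 30, 27, 29, 31]
c = [18, 20, 17, 19, 21]

# Rank-based analogue of one-way ANOVA: compares rank distributions, not means
H, p = stats.kruskal(a, b, c)
print(round(H, 2), p < 0.05)
```

On this data the rank-based test agrees with ANOVA: the groups differ.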
ANOVA vs t-Test: When to Use Each
The t-test and ANOVA both compare group means. The choice between them is mostly a question of how many groups you have — and what happens to your error rate if you pick the wrong one.
| Feature | t-Test | ANOVA |
|---|---|---|
| Number of groups | Exactly 2 | 3 or more |
| Test statistic | t-statistic | F-statistic |
| Type I error control | Fine for 2 groups, inflates with 3+ | Controls experiment-wise error at α |
| Post-hoc tests needed? | No | Yes, if significant |
| Independent variables | 1 | 1 (one-way) or 2+ (two-way) |
| Special case relationship | t² = F when k = 2 | ANOVA with k=2 gives identical result to t-test |
The last row is worth dwelling on. When ANOVA is run with only two groups, F = t². They are mathematically equivalent. The distinction matters only when you have three or more groups — then ANOVA is the correct choice. You can read more about the t-test variants in the one-sample t-test, two-sample t-test, and paired samples t-test guides.
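The t² = F identity is easy to confirm numerically — a sketch assuming scipy:

```python
from scipy import stats

g1 = [20, 22, 19, 21, 23]
g2 = [28, 30, 27, 29, 31]

t, p_t = stats.ttest_ind(g1, g2)   # independent two-sample t-test
F, p_f = stats.f_oneway(g1, g2)    # one-way ANOVA on the same two groups

# t squared equals F, and the p-values match to numerical precision
print(t ** 2, F, abs(p_t - p_f) < 1e-8)
```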
Advantages and Limitations of ANOVA
What ANOVA Does Well
- Tests multiple groups in a single procedure, controlling Type I error at exactly α
- Flexible across designs — one-way, two-way, repeated measures, MANOVA
- Detects interaction effects when two factors are studied together (two-way ANOVA)
- Robust to mild violations of normality when group sizes are large and equal
- Well-supported in every major statistics package (SPSS, R, Python, SAS, Stata)
Where ANOVA Falls Short
- Only flags that at least one group differs — post-hoc tests are always required for specifics
- Sensitive to outliers, particularly in small samples
- Requires the homogeneity of variance assumption (though Welch's ANOVA relaxes this)
- Not designed for non-normal, ordinal, or count data without transformation
- With very large samples, statistically significant results may have negligible practical size
Real-World Applications of ANOVA
ANOVA is not a classroom exercise. These four domains use it routinely.
Clinical Trials
Three treatment arms — placebo, low-dose, high-dose — are compared on a continuous outcome like blood pressure reduction. ANOVA tests whether any dose produces a different result before the trial moves to regulatory submission.
Education Research
Researchers compare exam scores across three teaching formats: in-person lecture, flipped classroom, and fully online. ANOVA determines whether format affects outcomes across diverse student populations.
Marketing A/B/C Testing
Three ad creative variations run across separate market segments. ANOVA tests whether click-through rate differs across creatives, guiding budget allocation before a full campaign launch.
Manufacturing Quality Control
Four production lines manufacture the same component. ANOVA tests whether defect rates differ across lines, pointing quality teams toward which lines need process adjustment.
Case Study: Clinical Trials
Real-World Application
Drug Effectiveness: Placebo vs. Drug A vs. Drug B
Sixty patients are randomly assigned to three groups (20 per group): placebo, Drug A, and Drug B. After eight weeks, systolic blood pressure reduction (mmHg) is measured. One-way ANOVA tests H₀: μ_placebo = μ_A = μ_B. A significant result (say, F(2, 57) = 8.4, p = 0.001) indicates that drug assignment affected outcomes. Tukey HSD post-hoc tests would then reveal whether both drugs outperformed placebo, or whether one drug outperformed the other.
Note: The FDA and EMA require ANOVA or equivalent analyses in most Phase II and Phase III clinical trial submissions. The International Council for Harmonisation (ICH) guideline ICH E9(R1) covers the statistical analysis framework for clinical studies.
Running ANOVA in Python, R, and SPSS
Python — scipy.stats and pingouin
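A minimal scipy workflow on the fertilizer data — Levene's check first, then the omnibus test. The third-party pingouin package wraps the same computation with effect sizes and post-hoc tables included; treat this as a sketch, not the only route:

```python
from scipy import stats

a = [20, 22, 19, 21, 23]     # fertilizer A yields (bushels/acre)
b = [28, 30, 27, 29, 31]     # fertilizer B
c = [18, 20, 17, 19, 21]     # fertilizer C

# Check equal variances before trusting the F-test
lev_stat, lev_p = stats.levene(a, b, c)

# One-way ANOVA omnibus test
F, p = stats.f_oneway(a, b, c)

print(f"Levene p = {lev_p:.3f}")
print(f"F = {F:.1f}, p = {p:.2e}")
```

If Levene's test had rejected, the next step would be Welch's ANOVA rather than the standard F-test.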
R — aov() and TukeyHSD()
In R, fit the model with aov() — for example, model <- aov(yield ~ fertilizer, data = df) — read the ANOVA table with summary(model), then run Tukey post-hoc comparisons with TukeyHSD(model).
For SPSS: Analyze → Compare Means → One-Way ANOVA → move the dependent variable and factor, then click Post Hoc and select Tukey. The ANOVA table, effect size, and post-hoc results all appear in the output viewer.
Common ANOVA Mistakes
| # | Mistake | Consequence | Fix |
|---|---|---|---|
| 1 | Skipping assumption checks | Invalid F-statistic, inflated error rate | Always run Levene's test and check normality before ANOVA |
| 2 | Stopping at significant F | No idea which groups actually differ | Run Tukey HSD or Bonferroni post-hoc tests |
| 3 | Using ANOVA with only 2 groups | Technically valid, but overly complex | Use a t-test for two groups — same result, simpler interpretation |
| 4 | Ignoring effect size | Large-sample studies report trivial differences as significant | Always report η² alongside p-value |
| 5 | Treating non-independent data as independent | Artificially inflated sample size, wrong F | Use repeated measures ANOVA for matched or longitudinal data |
| 6 | Not checking for outliers | Outliers distort both SS calculations | Use boxplots and z-scores to flag outliers before analysis |
ANOVA Entity and Formula Glossary
This reference table covers every major term in ANOVA analysis. It is formatted for direct lookup — use it alongside any ANOVA output you are trying to read.
| Term | Symbol | Definition | Formula / Benchmark |
|---|---|---|---|
| ANOVA | — | Statistical method for comparing means of 3+ groups by partitioning total variance | Analysis of Variance |
| F-statistic | F | Ratio of between-group variance to within-group variance; the ANOVA test statistic | F = MSB ÷ MSW |
| Sum of Squares Between | SSB | Total squared deviations of group means from the grand mean, weighted by group size | Σ nᵢ(x̄ᵢ − x̄)² |
| Sum of Squares Within | SSW | Total squared deviations of individual observations from their group mean | Σ Σ (xᵢⱼ − x̄ᵢ)² |
| Sum of Squares Total | SST | Total variation in the data; equals SSB + SSW | SSB + SSW |
| Mean Square Between | MSB | Between-group variance per degree of freedom | SSB ÷ (k − 1) |
| Mean Square Within | MSW | Within-group variance per degree of freedom; also called Mean Square Error | SSW ÷ (N − k) |
| Degrees of Freedom (Between) | df_B | Number of groups minus one | k − 1 |
| Degrees of Freedom (Within) | df_W | Total observations minus number of groups | N − k |
| P-value | p | Probability of observing F this large or larger under H₀; not the probability H₀ is true | p < α → reject H₀ |
| Null Hypothesis | H₀ | Assumption that all k group population means are equal | μ₁ = μ₂ = ... = μₖ |
| Alternative Hypothesis | H₁ | At least one group mean differs from the others | At least one μᵢ ≠ μⱼ |
| Effect Size | η² | Proportion of total variance explained by the group factor (eta squared) | η² = SSB ÷ SST |
| Eta Squared Benchmarks | η² | Magnitude guidelines from Cohen (1988): small, medium, large | 0.01 / 0.06 / 0.14 |
| Homogeneity of Variance | — | Assumption that group variances are approximately equal; tested with Levene's test | Levene's p > 0.05 = satisfied |
Frequently Asked Questions About ANOVA
What is ANOVA and how does it work?
ANOVA (Analysis of Variance) is a statistical test that compares the means of three or more groups simultaneously. It divides total variation into between-group variation (caused by the factor being studied) and within-group variation (random scatter). The ratio of these two quantities — the F-statistic — tells you whether group differences are larger than random chance would produce.
Can ANOVA be used with only two groups?
Yes, technically. When ANOVA runs with two groups, F = t², so it gives the same result as an independent samples t-test. But ANOVA's value is specifically in handling three or more groups without inflating the Type I error rate. For two groups, use the two-sample t-test — it is simpler and produces identical conclusions.
What does a significant ANOVA result mean?
A significant ANOVA (p < α) tells you that at least one group mean is statistically different from the others. It does not tell you which groups differ — that requires post-hoc tests (Tukey HSD, Bonferroni, Scheffé). Think of ANOVA as a smoke alarm: it tells you there is a fire somewhere, but you still need to find the room.
What are the assumptions of ANOVA?
ANOVA requires: (1) independence of observations — each data point is from a separate subject or unit; (2) approximate normality within groups — check with Shapiro-Wilk for small samples; (3) homogeneity of variance — all groups have similar variances, tested with Levene's test; (4) random sampling from the population. Violations of homogeneity can be addressed with Welch's ANOVA, and non-normality with the Kruskal-Wallis test.
What counts as a good F-value?
There is no universally "good" F-value in isolation. You interpret F by converting it to a p-value using the degrees of freedom for your specific dataset. An F of 3.0 might be significant or not depending on your sample size and number of groups. Focus on p and effect size (η²), not on whether F sounds large or small.
When should I use Kruskal-Wallis instead of ANOVA?
The Kruskal-Wallis test is the non-parametric equivalent of one-way ANOVA. Use it when the normality assumption is clearly violated, especially with small samples (n < 15 per group) or ordinal data. It ranks all observations combined and tests whether the rank distributions differ across groups. For repeated measures, the Friedman test serves a similar role.
Why report effect size alongside the p-value?
Eta squared (η² = SSB ÷ SST) measures practical significance: the proportion of total variance that the group factor explains. A study with 500 subjects might detect a statistically significant ANOVA result where η² = 0.01, meaning the factor explains only 1% of variance — probably not practically meaningful. Always report effect size alongside p-values. Benchmarks (small = 0.01, medium = 0.06, large = 0.14) come from Cohen's 1988 framework.
Related Statistical Concepts
ANOVA connects to a cluster of techniques. Understanding them together gives you a complete picture of inferential statistics for group comparisons.
- Hypothesis Testing — The parent framework that gives ANOVA its structure: null hypothesis, Type I error, significance level
- One-Sample t-Test — Comparing a single group mean to a known value; the simplest hypothesis test
- Two-Sample t-Test — Comparing exactly two independent group means; use ANOVA when you have three or more
- Paired Samples t-Test — For two related measurements on the same subjects; repeated measures ANOVA generalizes this to 3+ time points
- F-Table (Critical Values) — The reference table for finding the critical F-value given df and α
- Confidence Intervals — Complements p-values by estimating the range of plausible parameter values
- Simple Linear Regression — ANOVA is actually a special case of the general linear model; regression can reproduce ANOVA results with dummy-coded predictors
- Normal Distribution — The theoretical basis for the normality assumption in ANOVA
- Z-Score — Standardization concept that connects to how ANOVA treats within-group variation
- Sampling Distributions — Why the F-statistic follows an F-distribution under H₀
For deeper theoretical treatment, see: the NIST/SEMATECH e-Handbook — ANOVA chapter; UCLA Statistical Consulting ANOVA seminar; and Khan Academy's ANOVA library. For clinical trial applications, refer to ICH E9(R1).