What Is Effect Size? (Definition)
When two groups are compared — say, a treatment group and a control group — a p-value tells you whether the difference between them is statistically distinguishable from zero. Effect size tells you how large that difference is in standardized units. A study with n = 10,000 can produce p = 0.001 for a difference so small it has no practical meaning. Effect size catches that.
The American Psychological Association (APA) and many major journals now expect effect sizes to be reported alongside p-values, and the American Statistical Association (ASA) has cautioned against drawing conclusions from p-values alone. Jacob Cohen, who formalized many of the measures used today, treated effect size as a central quantity of empirical research in his landmark 1988 textbook Statistical Power Analysis for the Behavioral Sciences. His three-level classification (small, medium, large) remains the dominant interpretive framework across psychology, education, and medicine.
- Effect size meaning: Quantifies how large or practically important a result is, beyond statistical significance
- Not affected by sample size: Unlike the p-value, effect size is a property of the population, not of n
- Required by APA (2010): The APA Publication Manual mandates reporting effect sizes in all empirical research
- Cohen's benchmarks: Small = 0.20, Medium = 0.50, Large = 0.80 (for Cohen's d)
- Standardized: Effect sizes are unit-free, so they can be compared across studies and disciplines
- Meta-analysis: Effect sizes are the raw material of meta-analysis — they allow combining evidence across studies
Effect Size vs P-Value: Why Magnitude Matters
Statistical significance and practical significance are different things. A p-value answers one question: if there were no true effect, how likely is a result at least this extreme in a sample of this size? Effect size answers a separate question: how large is the effect?
With n = 100,000 per group, a difference of just 0.15 IQ points (SD = 15, so d = 0.01) is enough to produce p < 0.05. That difference is statistically detectable but practically meaningless. Effect size prevents this misinterpretation by measuring magnitude independently of sample size.
| Concept | P-value | Effect Size |
|---|---|---|
| Question answered | Is the effect distinguishable from zero? | How large is the effect? |
| Affected by sample size | Yes — larger n → smaller p | No — independent of n |
| Tells you practical importance | No | Yes |
| Required for meta-analysis | No | Yes |
| APA-required reporting | Yes | Yes |
| Measures significance | Statistical significance | Practical significance |
The two measures are not interchangeable — they work together. A result can be statistically significant with a tiny effect size (large sample, negligible difference), or statistically non-significant with a large effect size (small sample, real-but-undetected effect). Good research reports and interprets both.
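The two scenarios above can be checked numerically. The sketch below uses a normal (z) approximation for two equal-sized groups; the function name `two_sided_p` and the example values of d and n are assumptions chosen for illustration, not figures from a real study.

```python
import math


def two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for a standardized mean difference d
    between two equal-sized groups, using a normal (z) approximation."""
    # Standard error of d is roughly sqrt(2 / n), so z = d * sqrt(n / 2).
    z = abs(d) * math.sqrt(n_per_group / 2)
    return math.erfc(z / math.sqrt(2))


# Hypothetical scenarios, chosen only for illustration.
p_tiny_effect = two_sided_p(d=0.02, n_per_group=100_000)  # negligible effect, huge sample
p_large_effect = two_sided_p(d=0.80, n_per_group=8)       # large effect, tiny sample

print(f"d = 0.02, n = 100,000 per group: p = {p_tiny_effect:.2g}")
print(f"d = 0.80, n = 8 per group:       p = {p_large_effect:.2g}")
```

Under these assumptions the negligible effect comes out statistically significant while the large effect does not, which is the pattern described above.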
The Complete Effect Size Formula Library
Different study designs require different effect size measures. The table below maps each design to its recommended measure. Detailed formulas follow for each.
| Study Design | Recommended Measure | Symbol |
|---|---|---|
| Two independent groups (t-test) | Cohen's d or Hedges' g | d, g |
| Two groups, small samples (n < 20) | Hedges' g (bias-corrected) | g |
| Control group SD differs from treatment | Glass's delta | Δ |
| ANOVA (variance explained) | Eta squared or Omega squared | η², ω² |
| ANOVA (population estimate) | Omega squared (preferred) | ω² |
| Correlation / regression | Pearson's r or r² | r, r² |
| Chi-square (2×2 table) | Phi coefficient | φ |
| Chi-square (larger tables) | Cramer's V | V |
Cohen's d — Standardized Mean Difference
Cohen's d is the most widely used effect size measure. It expresses the difference between two group means in units of the pooled standard deviation. The result is unit-free, allowing comparisons across studies measuring different things.
d = (M₁ − M₂) / SDpooled
M₁ = mean of Group 1
M₂ = mean of Group 2
SDpooled = √[(SD₁² + SD₂²) / 2]
The pooled standard deviation assumes the two groups have roughly equal variance. If standard deviations differ substantially, consider Glass's delta instead. The sign of d tells you the direction of the effect (which group scored higher); interpretation tables use the absolute value.
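As a rough illustration, Cohen's d can be computed from raw scores with Python's standard library. The helper name `cohens_d` and the two score lists are made up for this sketch, and the simple pooled SD is used, which assumes equal group sizes.

```python
import math
import statistics


def cohens_d(group1, group2):
    """Cohen's d with the simple pooled SD, sqrt((SD1^2 + SD2^2) / 2)."""
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    sd_pooled = math.sqrt((var1 + var2) / 2)
    return (m1 - m2) / sd_pooled


# Made-up scores for illustration only.
treatment = [12, 14, 15, 17, 18]
control = [10, 11, 13, 14, 16]

d = cohens_d(treatment, control)
print(f"Cohen's d = {d:.2f}")
```

Swapping the argument order flips the sign of d but not its magnitude.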
Hedges' g — Bias-Corrected Estimate
Hedges' g applies a correction factor to Cohen's d for small sample sizes. When n₁ + n₂ is below about 20, Cohen's d overestimates the true population effect; Hedges' g corrects for this bias.
g = d × (1 − 3 / (4(n₁ + n₂) − 9))
d = Cohen's d
n₁, n₂ = group sample sizes
Hedges' g is interpreted using the same benchmarks as Cohen's d. The two measures converge as samples grow: the correction shrinks d by about 4% at a total n of 20 and by less than 2% once total n reaches 50.
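A short sketch shows how quickly the correction fades as samples grow. The function name `hedges_g` is an assumption for this example; the correction factor is the one given in this guide, 1 − 3/(4(n₁ + n₂) − 9).

```python
def hedges_g(d, n1, n2):
    """Apply the small-sample correction factor to Cohen's d."""
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


# The same d = 0.50 at several hypothetical per-group sample sizes.
for n in (5, 10, 25, 100):
    print(f"n per group = {n:>3}: g = {hedges_g(0.50, n, n):.3f}")

g_example = hedges_g(0.724, 25, 25)
```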
Glass's Delta — Control Group Reference
Glass's delta uses only the control group's standard deviation in the denominator. It is the preferred measure when the experimental treatment is expected to change within-group variability — for example, in clinical trials where the intervention affects not just the mean but also consistency of response.
Δ = (M₁ − M₂) / SDcontrol
M₁, M₂ = means of the treatment and control groups
SDcontrol = standard deviation of control group only
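A minimal sketch of the calculation, assuming summary statistics are already available. The function name `glass_delta` and the input values are hypothetical.

```python
def glass_delta(mean_treatment, mean_control, sd_control):
    """Glass's delta: mean difference scaled by the control group's SD only."""
    return (mean_treatment - mean_control) / sd_control


# Hypothetical summary statistics: the treatment group's SD is deliberately
# not used, because the treatment may have changed the spread.
delta = glass_delta(mean_treatment=78, mean_control=70, sd_control=12)
print(f"Glass's delta = {delta:.2f}")
```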
Eta Squared (η²) — ANOVA Variance Explained
Eta squared quantifies the proportion of total variance in the dependent variable that is explained by the independent variable in an ANOVA. It ranges from 0 to 1 and can be interpreted like an R² from regression.
η² = SSeffect / SStotal
SSeffect = sum of squares for the effect
SStotal = total sum of squares
Eta squared tends to overestimate the population effect in small samples because it is computed from sample sums of squares with no bias correction. For that reason, omega squared is preferred when generalizing beyond the sample.
Omega Squared (ω²) — Less Biased ANOVA Estimate
Omega squared corrects for the upward bias in eta squared, producing a more accurate estimate of the proportion of variance explained in the population. The formula adjusts for degrees of freedom and mean square error.
ω² = (SSeffect − dfeffect × MSerror) / (SStotal + MSerror)
dfeffect = degrees of freedom for the factor
MSerror = mean square error
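Both ANOVA measures can be computed from raw data for a one-way design. The sketch below is one way to do it with plain Python; the function name `anova_effect_sizes` and the three small groups are invented for illustration.

```python
def anova_effect_sizes(groups):
    """Return (eta squared, omega squared) for a one-way design.

    `groups` is a list of lists, one list of scores per group.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)

    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_effect = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    df_effect = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_error = (ss_total - ss_effect) / df_error

    eta_sq = ss_effect / ss_total
    omega_sq = (ss_effect - df_effect * ms_error) / (ss_total + ms_error)
    return eta_sq, omega_sq


# Made-up scores for three groups.
groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
eta_sq, omega_sq = anova_effect_sizes(groups)
print(f"eta squared = {eta_sq:.3f}, omega squared = {omega_sq:.3f}")
```

With such tiny groups the gap between the two estimates is wide, which is the upward bias in eta squared that omega squared is meant to reduce.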
Pearson's r — Correlation Effect Size
When your research involves a correlation or regression rather than a group comparison, Pearson's r is the effect size. It ranges from −1 to +1, with larger absolute values indicating stronger effects. Squaring r gives r², the proportion of variance shared between the two variables.
r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² × Σ(yᵢ − ȳ)²]
x̄, ȳ = means of variables X and Y
r² = variance explained
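The standard product-moment formula is short enough to write out directly. In the sketch below, the function name `pearson_r` and the paired data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from paired observations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Made-up paired data: hours of sleep and task errors.
sleep_hours = [5, 6, 7, 8, 9]
task_errors = [9, 7, 6, 4, 4]

r = pearson_r(sleep_hours, task_errors)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")
```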
Effect Size Interpretation Tables
Cohen's 1988 benchmarks remain the standard reference across disciplines. They were calibrated on research in psychology and behavioral science. In fields like medicine and educational research, smaller effects are often clinically meaningful — so always interpret effect sizes in context, not just against these thresholds.
Cohen's d Interpretation
| Cohen's d | Interpretation | Overlap (%) | Example Context |
|---|---|---|---|
| < 0.20 | Negligible | ~92% | Barely detectable difference |
| 0.20 | Small Effect | ~85% | Height difference between 15- and 16-year-old girls |
| 0.50 | Medium Effect | ~67% | Difference between IQ scores of groups in different jobs |
| 0.80 | Large Effect | ~53% | Difference between IQ of college vs. non-college students |
| ≥ 1.20 | Very Large Effect | < 45% | Effect of a highly effective educational intervention |
Pearson's r Interpretation
| Pearson's r (absolute value) | Interpretation | Variance Explained (r²) |
|---|---|---|
| 0.10 | Small Effect | 1% |
| 0.30 | Medium Effect | 9% |
| 0.50 | Large Effect | 25% |
| ≥ 0.70 | Very Large Effect | ≥ 49% |
Eta Squared and Omega Squared (ANOVA)
| η² or ω² | Interpretation | Equivalent Cohen's f |
|---|---|---|
| 0.01 | Small Effect | f = 0.10 |
| 0.06 | Medium Effect | f = 0.25 |
| 0.14 | Large Effect | f = 0.40 |
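The Cohen's f column follows from the usual conversion f = √(η² / (1 − η²)). A quick check, with `eta_sq_to_f` as an illustrative name:

```python
import math


def eta_sq_to_f(eta_sq):
    """Convert a proportion of variance explained to Cohen's f."""
    return math.sqrt(eta_sq / (1 - eta_sq))


for eta_sq in (0.01, 0.06, 0.14):
    print(f"eta squared = {eta_sq:.2f} -> f = {eta_sq_to_f(eta_sq):.2f}")
```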
John Hattie's landmark educational meta-analysis (Visible Learning, 2009) found that the average effect across the influences on student achievement he examined is d = 0.40, which falls between Cohen's "small" (0.20) and "medium" (0.50). In that context, an intervention with d = 0.40 is merely average, not impressive. Always compare effect sizes to those of similar interventions in your field.
How to Calculate Effect Size (Step-by-Step)
Identify Your Study Design
Are you comparing two group means (use Cohen's d), analyzing variance across multiple groups (use η² or ω²), or examining a correlation (use Pearson's r)? The design determines the formula.
Gather the Required Statistics
For Cohen's d: group means (M₁, M₂), standard deviations (SD₁, SD₂), and sample sizes (n₁, n₂). For ANOVA: the ANOVA summary table with SS and MS values. For Pearson's r: the raw data or covariance and standard deviations.
Compute the Pooled Standard Deviation (for d)
SDpooled = √[(SD₁² + SD₂²) / 2] when group sizes are equal. When n₁ ≠ n₂, use the weighted formula: SDpooled = √[((n₁ − 1)SD₁² + (n₂ − 1)SD₂²) / (n₁ + n₂ − 2)].
Apply the Formula
Divide the mean difference by the pooled SD for Cohen's d, or compute SSeffect/SStotal for eta squared. Use the calculator below to verify your arithmetic.
Apply Hedges' Correction if Needed
If your combined sample size is below 50, multiply Cohen's d by the correction factor: (1 − 3/(4(n₁ + n₂) − 9)) to obtain Hedges' g. For larger samples, the correction is negligible.
Interpret in Context and Report
Compare to Cohen's benchmarks and to typical effect sizes in your field. Report as: "Cohen's d = 0.54, indicating a medium effect" or "η² = 0.09, indicating that the independent variable explained 9% of variance in the outcome."
Interactive Effect Size Calculator (Cohen's d & Hedges' g)
Enter the summary statistics for two groups. The calculator computes Cohen's d, Hedges' g (bias-corrected), the pooled standard deviation, and automatically classifies the magnitude based on Cohen's benchmarks.
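The same calculation can be sketched in a few lines of Python. This is a minimal stand-in under stated assumptions, not the page's own calculator: the function name `effect_size_report`, the returned keys, and the rule that each label applies from its benchmark upward are all choices made for this example. It uses the weighted pooled SD, which reduces to the simple formula when the groups are the same size.

```python
import math


def effect_size_report(m1, sd1, n1, m2, sd2, n2):
    """Return pooled SD, Cohen's d, Hedges' g, and a benchmark label."""
    # Weighted pooled SD; equals sqrt((sd1^2 + sd2^2) / 2) when n1 == n2.
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    d = (m1 - m2) / pooled_sd
    g = d * (1 - 3 / (4 * (n1 + n2) - 9))  # small-sample correction

    # Assumed cutoffs: each label applies from its benchmark upward.
    size = abs(d)
    if size < 0.20:
        label = "negligible"
    elif size < 0.50:
        label = "small"
    elif size < 0.80:
        label = "medium"
    else:
        label = "large"
    return {"pooled_sd": pooled_sd, "d": d, "g": g, "label": label}


# Hypothetical inputs: training group vs. control group summary statistics.
report = effect_size_report(m1=78, sd1=10, n1=25, m2=70, sd2=12, n2=25)
for key, value in report.items():
    print(key, round(value, 3) if isinstance(value, float) else value)
```

A hard cutoff labels d = 0.72 as "medium"; a value this close to 0.80 is often described as medium-to-large, so treat the label as a starting point rather than a verdict.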
Worked Examples Across Research Designs
Example 1 — Two-Group Comparison (Cohen's d)
Problem: Researchers test whether a memory training program improves recall scores. The training group (n₁ = 25) scores M₁ = 78 with SD₁ = 10. The control group (n₂ = 25) scores M₂ = 70 with SD₂ = 12. Calculate Cohen's d and Hedges' g.
Compute the pooled SD: SDpooled = √[(10² + 12²) / 2] = √[(100 + 144) / 2] = √122 = 11.05
Calculate Cohen's d: d = (78 − 70) / 11.05 = 8 / 11.05 = 0.724
Apply Hedges' correction: Correction = 1 − 3/(4(25+25) − 9) = 1 − 3/191 = 0.9843
g = 0.724 × 0.9843 = 0.713
Interpret: d = 0.724 falls between 0.50 (medium) and 0.80 (large). By convention, this is a medium-to-large effect.
✅ Result: Cohen's d = 0.72, Hedges' g = 0.71. The memory training produced a medium-to-large effect on recall scores. The training group scored about 0.72 pooled standard deviations higher than the control group.
Example 2 — One-Way ANOVA (Eta Squared)
Problem: A study compares exam performance across three teaching methods (lecture, flipped classroom, online). The ANOVA table shows SSbetween = 450 and SStotal = 1,800. Calculate η² and ω² (with MSerror = 75, dfbetween = 2).
Calculate Eta squared: η² = SSeffect / SStotal = 450 / 1,800 = 0.25
Calculate Omega squared:
ω² = (450 − 2 × 75) / (1,800 + 75) = (450 − 150) / 1,875 = 300 / 1,875 = 0.16
Interpret: η² = 0.25 far exceeds the large threshold of 0.14. ω² = 0.16, the less biased estimate, still indicates a large effect.
✅ Result: η² = 0.25, ω² = 0.16. Teaching method explains approximately 16–25% of the variance in exam scores — a large effect. The ω² = 0.16 is the preferred report value as it corrects for sample bias.
Example 3 — Pearson's r (Correlation Effect Size)
Problem: A study finds r = −0.42 between hours of sleep and number of errors on a cognitive task. What is the effect size and how much variance is explained?
Effect size: |r| = 0.42 falls between the medium threshold (0.30) and large threshold (0.50).
Variance explained: r² = 0.42² = 0.176. Sleep explains about 17.6% of the variance in cognitive errors.
✅ Result: r = −0.42 indicates a medium-to-large negative correlation. More sleep is associated with fewer errors. The relationship accounts for approximately 18% of variance in errors — practically meaningful in a cognitive health context.
Example 4 — Clinical Trial (Cohen's d in Medicine)
Problem: A blood pressure drug trial finds the treatment group has a mean reduction of 12 mmHg (SD = 15), while the placebo group shows 5 mmHg (SD = 14). n = 200 per group. The p-value is below 0.001. How large is the effect?
Pooled SD: √[(15² + 14²)/2] = √[(225 + 196)/2] = √210.5 = 14.51
Cohen's d: d = (12 − 5) / 14.51 = 7 / 14.51 = 0.48
Context: In clinical cardiology, a mean difference of 7 mmHg in systolic BP is considered clinically meaningful, even though d = 0.48 is technically a "medium" effect by Cohen's benchmarks. This illustrates why domain context matters.
✅ Result: Cohen's d = 0.48 (medium effect). The drug is both statistically significant (p < 0.001) and clinically meaningful (7 mmHg reduction). Reporting effect size alongside p-value provides the complete picture for clinical decision-making.
Visualizing Effect Size Magnitude
One of the most intuitive ways to grasp what a Cohen's d value means is to think about the overlap between two distributions. A d = 0 means 100% overlap — the groups are identical. As d grows, the distributions separate and overlap decreases.
[Figure: Distribution Overlap by Effect Size. Panels: Small, Medium, Large, Very Large.]
Bars represent approximate distribution spread. At d = 0.80, the average person in Group 1 scores above 79% of people in Group 2.
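These overlap figures can be reproduced from d alone if both groups are assumed to be normal with equal variance. The sketch below computes three quantities: Cohen's U₃ (the share of the lower group falling below the higher group's mean), one minus Cohen's U₁ (which appears to be the overlap definition behind the percentages quoted in this guide), and the overlapping coefficient OVL, a different definition that gives larger numbers. The function name and dictionary keys are assumptions for this example.

```python
from statistics import NormalDist


def overlap_measures(d):
    """Overlap statistics for two equal-variance normal distributions."""
    phi = NormalDist().cdf
    d = abs(d)
    u3 = phi(d)                 # share of lower group below the higher group's mean
    u2 = phi(d / 2)
    u1 = (2 * u2 - 1) / u2      # Cohen's U1: proportion of non-overlap
    ovl = 2 * phi(-d / 2)       # overlapping coefficient (shared area)
    return {"U3": u3, "1-U1": 1 - u1, "OVL": ovl}


for d in (0.20, 0.50, 0.80):
    m = overlap_measures(d)
    print(f"d = {d:.2f}: U3 = {m['U3']:.0%}, 1-U1 = {m['1-U1']:.0%}, OVL = {m['OVL']:.0%}")
```

When quoting an overlap percentage, it helps to say which definition is meant, since the two differ noticeably at larger values of d.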
Real-World Applications of Effect Size
Clinical Research
Drug trials report effect sizes to distinguish statistical significance (driven by large n) from clinical significance. A d = 0.20 may be trivially small for pain reduction but clinically important for mortality risk.
Psychology
The replication crisis sharpened psychology's focus on effect size reporting. Many classic effects (ego depletion, social priming) shrank dramatically when larger replication studies produced more precise effect size estimates.
Education
John Hattie's Visible Learning project synthesized 1,400+ meta-analyses using effect sizes. Findings like d = 0.73 for feedback and d = 0.52 for cooperative learning guide evidence-based teaching practice.
A/B Testing
Product and marketing teams report effect sizes (often Cohen's d or relative risk) to prioritize which experiments to ship. An A/B test with p = 0.04 but d = 0.02 rarely justifies a full rollout.
Sports Science
Performance researchers use magnitude-based inference anchored to effect size, not just p-values. A d = 0.20 improvement in sprint time can meaningfully separate athletes at elite levels.
Meta-Analysis
Meta-analysts combine effect sizes from dozens of studies to estimate the overall effect of an intervention. Without a standardized effect size, studies measuring outcomes in different units cannot be meaningfully pooled.
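One common way to pool standardized mean differences is fixed-effect, inverse-variance weighting. The sketch below assumes that model together with a widely used large-sample approximation for the variance of d; real meta-analyses often use random-effects models instead. The function names and the three studies are hypothetical.

```python
import math


def d_variance(d, n1, n2):
    """Large-sample approximation to the sampling variance of Cohen's d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def fixed_effect_pool(studies):
    """Inverse-variance weighted mean of d values.

    `studies` is a list of (d, n1, n2) tuples.
    """
    weights = [1 / d_variance(d, n1, n2) for d, n1, n2 in studies]
    pooled = sum(w * s[0] for w, s in zip(weights, studies)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, se


# Hypothetical studies: (d, n per group 1, n per group 2).
studies = [(0.30, 20, 20), (0.50, 50, 50), (0.45, 200, 200)]
pooled, se = fixed_effect_pool(studies)
print(f"pooled d = {pooled:.3f} (SE = {se:.3f})")
```

Larger studies carry more weight, so the pooled estimate sits closest to the result of the largest study.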
John Hattie's Effect Size in Education
John Hattie's Visible Learning project, now spanning over 1,800 meta-analyses and 300 million students, is the largest synthesis of educational research ever conducted. Hattie uses Cohen's d as the universal currency for comparing educational interventions.
Hattie Effect Size Chart — Key Findings
What works best in education?
Hattie's "hinge point" is d = 0.40, the average effect across all the influences in his synthesis. Interventions above this threshold are considered worth adopting; those below are likely no better than standard teaching. The findings challenge many conventional assumptions.
| Educational Intervention | Hattie's Effect Size (d) | Rank (approx.) |
|---|---|---|
| Collective teacher efficacy | 1.57 | Top 5 |
| Self-reported grades (student expectations) | 1.33 | Top 5 |
| Formative evaluation / feedback | 0.73 | High |
| Direct instruction | 0.60 | Above average |
| Cooperative learning | 0.52 | Above average |
| Problem-based learning | 0.35 | Below hinge point |
| Homework (secondary) | 0.29 | Small-medium |
| Class size reduction | 0.21 | Small effect |
Hattie's work illustrates both the power and the limitations of effect size benchmarks. His classification uses d = 0.40 as "the hinge point" — meaning interventions with d < 0.40 may not justify their cost — which differs from Cohen's original small/medium/large framework. The right benchmark depends on the question you're asking.
Effect Size Symbols and Notation
Each effect size measure uses a specific symbol. Knowing the correct notation matters for reading journal articles and writing up results correctly.
| Symbol | Name | Used For | Range |
|---|---|---|---|
| d | Cohen's d | Two-group mean difference | −∞ to +∞ (absolute value for magnitude) |
| g | Hedges' g | Bias-corrected mean difference | Same as d |
| Δ | Glass's delta | Mean diff using control SD | Same as d |
| r | Pearson's r | Correlation / regression | −1 to +1 |
| r² | Coefficient of determination | Variance explained (regression) | 0 to 1 |
| η² | Eta squared | ANOVA variance explained | 0 to 1 |
| ω² | Omega squared | ANOVA, less biased than η² | Up to 1 (estimates can be slightly negative when F < 1) |
| φ | Phi coefficient | Chi-square 2×2 table | 0 to 1 |
| V | Cramer's V | Chi-square larger tables | 0 to 1 |
| f | Cohen's f | ANOVA, related to η² | 0 to +∞ |
Frequently Asked Questions About Effect Size
What is effect size?
Effect size is a standardized numerical measure of the magnitude or practical importance of a statistical result. It answers "how large is this effect?" rather than the yes/no question of statistical significance. Common measures include Cohen's d for mean differences and eta squared for ANOVA results. Because it does not grow or shrink systematically with sample size, it is a more stable indicator of practical importance than the p-value.
What is a good effect size?
Cohen's (1988) benchmarks define small = 0.20, medium = 0.50, and large = 0.80 for Cohen's d. However, "good" is context-dependent. In education, Hattie's work shows the average intervention produces d = 0.40, so that threshold is more meaningful for comparing teaching methods. In clinical medicine, a d = 0.20 may be highly clinically significant if the outcome is mortality. Always compare to published effect sizes in your specific field.
What does a small effect size mean?
A small effect size (Cohen's d ≈ 0.20) means the two groups' distributions overlap substantially (about 85% overlap). The difference exists but is subtle. In everyday terms, it is roughly the difference in height between 15- and 16-year-old girls in the same population. Small effects can still be practically important: a small reduction in mortality risk, applied to millions of people, has enormous population-level consequences.
What does a large effect size mean?
A large effect size (Cohen's d ≥ 0.80) means the groups are substantially separated, with only about 53% distribution overlap at d = 0.80. At that value, the average person in the higher-scoring group outperforms approximately 79% of people in the lower-scoring group. Cohen's own illustration of a large effect was the IQ difference between college graduates and people with only a 50-50 chance of passing an academic high school curriculum, a difference large enough to be noticed without statistical testing.
What is the difference between effect size and statistical significance?
Statistical significance (p-value) measures whether an effect is detectable given your sample size. Effect size measures how large the effect is, independent of sample size. A result can be statistically significant with a tiny effect size (when n is very large), or statistically non-significant with a large effect size (when n is very small). The p-value and effect size answer different questions, so responsible research reports both.
Is effect size affected by sample size?
A correctly computed effect size is not directly affected by sample size, and that is its primary advantage over the p-value. Whether you study 20 or 2,000 people, if the true population means and standard deviations are the same, Cohen's d should on average produce about the same estimate. However, small samples produce less precise estimates, so confidence intervals around effect sizes are wider when n is small. Hedges' g corrects for a small upward bias that Cohen's d shows in small samples.
Related Statistical Concepts
Effect size connects to several other core statistical ideas. Understanding the relationships between these concepts deepens your ability to design studies, interpret results, and evaluate published research.
Hypothesis Testing
Effect size is reported alongside p-values in hypothesis tests. The test determines significance; effect size determines magnitude. Both are needed for a complete result.
P-values
P-values and effect sizes answer different questions. A p-value is affected by sample size; effect size is not. Large n can make trivially small effects statistically significant.
Confidence Intervals
Confidence intervals around effect sizes provide more information than point estimates alone. A CI for Cohen's d shows the range of plausible true effects given sampling variability.
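A rough confidence interval for Cohen's d can be built from a large-sample standard error. The sketch below assumes the common approximation SE ≈ √[(n₁ + n₂)/(n₁n₂) + d²/(2(n₁ + n₂))] and a normal critical value; exact intervals based on the noncentral t distribution will differ somewhat in small samples. The function name `cohens_d_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def cohens_d_ci(d, n1, n2, confidence=0.95):
    """Approximate confidence interval for Cohen's d (normal approximation)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return d - z * se, d + z * se


# Hypothetical inputs: d = 0.724 with 25 participants per group.
low, high = cohens_d_ci(d=0.724, n1=25, n2=25)
print(f"95% CI for d: [{low:.2f}, {high:.2f}]")
```

With only 25 participants per group, the interval runs from a small effect to a very large one, which is why the point estimate alone can mislead.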
Statistical Power
Statistical power — the probability of detecting a true effect — is directly tied to effect size. Larger effect sizes are easier to detect; power analysis uses an expected effect size to determine the required sample size.
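The link between effect size and required sample size can be sketched with the usual normal-approximation formula for a two-sided, two-group comparison, n per group ≈ 2(z₁₋α/₂ + z_power)² / d². This is an approximation: dedicated power software based on the t distribution typically returns a participant or two more per group. The function name `n_per_group` is an assumption for this example.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect effect size d."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2
    return math.ceil(n)


for d in (0.20, 0.50, 0.80):
    print(f"d = {d:.2f}: about {n_per_group(d)} participants per group")
```

Halving the expected effect size roughly quadruples the required sample, because d enters the formula squared.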
Pearson Correlation
Pearson's r is both a correlation coefficient and an effect size. Its square (r²) tells you what proportion of variance in one variable is explained by another.
ANOVA
ANOVA tests whether group means differ significantly. Eta squared and omega squared are the matching effect size measures, quantifying what proportion of total variance the grouping variable explains.