Effect Size Practical Significance Cohen's d 25 min read June 12, 2026
BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

Effect Size: The Complete Guide to Measurement & Interpretation

A drug trial finds p = 0.001, a statistically significant result. But did the drug actually help patients in any meaningful way? That question requires a different tool: effect size. Effect size measures how large a result is, not just whether it exists. It separates statistical detectability from practical importance, and the APA Publication Manual expects it to be reported alongside significance tests.

This guide covers every major effect size measure — Cohen's d, Hedges' g, Glass's delta, Pearson's r, Eta squared, Omega squared, and Cramer's V — with formulas, interpretation benchmarks, fully worked examples, and an interactive Cohen's d calculator you can use immediately.

What You'll Learn
  • ✓ What effect size means and why it matters more than the p-value alone
  • ✓ Every major formula: Cohen's d, Hedges' g, Eta squared, Omega squared, Pearson r, Cramer's V
  • ✓ Step-by-step calculation with four fully worked examples
  • ✓ Interpretation tables for small, medium, and large effects
  • ✓ An interactive effect size calculator (Cohen's d & Hedges' g)
  • ✓ Effect size vs p-value: when each one matters
  • ✓ Real-world applications across medicine, psychology, and education

What Is Effect Size? (Definition)

Definition — Effect Size
Effect size is a standardized, quantitative measure of the magnitude of a statistical result — how large, strong, or practically important an observed relationship or difference is. It answers the question "How much?" rather than the yes/no question answered by a p-value.
Effect Size = magnitude of an effect, independent of sample size

When two groups are compared — say, a treatment group and a control group — a p-value tells you whether the difference between them is statistically distinguishable from zero. Effect size tells you how large that difference is in standardized units. A study with n = 10,000 can produce p = 0.001 for a difference so small it has no practical meaning. Effect size catches that.

The American Psychological Association (APA) and many major journals now expect effect sizes to be reported alongside p-values, and the American Statistical Association (ASA) has cautioned that a p-value does not measure the size of an effect or the importance of a result. Jacob Cohen, who formalized many of the measures used today, argued in his landmark 1988 textbook Statistical Power Analysis for the Behavioral Sciences that effect size is the most fundamental quantity in empirical research. His three-level classification (small, medium, large) remains the dominant interpretive framework across psychology, education, and medicine.

  • 1988: Cohen formalizes effect size benchmarks
  • d = 0.50: Medium effect (Cohen's d benchmark)
  • η² = 0.06: Medium effect (ANOVA benchmark)
  • r = 0.30: Medium correlation effect size
⚡ Quick Reference — Effect Size Key Facts
  • Effect size meaning: Quantifies how large or practically important a result is, beyond statistical significance
  • Not affected by sample size: Unlike the p-value, the effect size being estimated does not grow or shrink with n; a larger sample only makes the estimate more precise
  • Required by APA (2010): The APA Publication Manual mandates reporting effect sizes in all empirical research
  • Cohen's benchmarks: Small = 0.20, Medium = 0.50, Large = 0.80 (for Cohen's d)
  • Standardized: Effect sizes are unit-free, so they can be compared across studies and disciplines
  • Meta-analysis: Effect sizes are the raw material of meta-analysis — they allow combining evidence across studies

Effect Size vs P-Value: Why Magnitude Matters

Statistical significance and practical significance are different things. A p-value answers one question: if there were no true effect, how likely is a result at least as extreme as the one observed? Because that probability shrinks as the sample grows, it says little about magnitude. Effect size answers a completely separate question: how large is the effect?

⚠️
The sample size problem with p-values

With 100,000 people per group, a difference of only about 0.15 IQ points (d ≈ 0.01 on a scale with SD = 15) is enough to produce p < 0.05. That difference is statistically detectable but practically meaningless. Effect size prevents this misinterpretation by measuring magnitude independently of sample size.
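The point can be checked numerically. The sketch below is an illustration only: it assumes an IQ-style scale with SD = 15, a known common SD (so a two-sample z-test applies), and a fixed raw difference of 0.15 points. The function name `two_sample_z_p` is hypothetical.

```python
import math


def two_sample_z_p(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means, assuming a known common SD."""
    se = sd * math.sqrt(2.0 / n_per_group)      # standard error of the difference
    z = mean_diff / se
    return math.erfc(abs(z) / math.sqrt(2.0))   # two-sided tail area of the normal


sd = 15.0        # assumed scale SD
diff = 0.15      # assumed raw difference in points
d = diff / sd    # standardized effect size, the same for every n

p_values = {n: two_sample_z_p(diff, sd, n) for n in (100, 10_000, 100_000)}
for n, p in p_values.items():
    print(f"n per group = {n:>7}: d = {d:.3f}, p = {p:.4f}")
```

The standardized difference stays at d = 0.01 in every row, while the p-value falls below 0.05 only because n grows.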

Concept | P-value | Effect Size
Question answered | Is the effect real? | How large is the effect?
Affected by sample size | Yes: larger n → smaller p | No: independent of n
Tells you practical importance | No | Yes
Required for meta-analysis | No | Yes
APA-required reporting | Yes | Yes
Measures significance | Statistical significance | Practical significance

The two measures are not interchangeable — they work together. A result can be statistically significant with a tiny effect size (large sample, negligible difference), or statistically non-significant with a large effect size (small sample, real-but-undetected effect). Good research reports and interprets both.

Reference: Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates. | Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science. Frontiers in Psychology, 4, 863.

The Complete Effect Size Formula Library

Different study designs require different effect size measures. The table below maps each design to its recommended measure. Detailed formulas follow for each.

Study Design | Recommended Measure | Symbol
Two independent groups (t-test) | Cohen's d or Hedges' g | d, g
Two groups, small samples (n < 20) | Hedges' g (bias-corrected) | g
Control group SD differs from treatment | Glass's delta | Δ
ANOVA (variance explained) | Eta squared or Omega squared | η², ω²
ANOVA (population estimate) | Omega squared (preferred) | ω²
Correlation / regression | Pearson's r or r² | r, r²
Chi-square (2×2 table) | Phi coefficient | φ
Chi-square (larger tables) | Cramer's V | V

Cohen's d — Standardized Mean Difference

Cohen's d is the most widely used effect size measure. It expresses the difference between two group means in units of the pooled standard deviation. The result is unit-free, allowing comparisons across studies measuring different things.

Cohen's d Formula
d = (M₁ − M₂) / SDpooled
M₁ = mean of Group 1; M₂ = mean of Group 2; SDpooled = √[(SD₁² + SD₂²) / 2] (this simple form applies when the two groups are the same size)

The pooled standard deviation assumes the two groups have roughly equal variance. If standard deviations differ substantially, consider Glass's delta instead. The sign of d tells you the direction of the effect (which group scored higher); interpretation tables use the absolute value.
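As a minimal sketch of the formula above, the function below computes Cohen's d with the simple equal-n pooled SD. The function name `cohens_d` is hypothetical, and the inputs are the memory-training figures from this guide's first worked example.

```python
import math


def cohens_d(m1: float, m2: float, sd1: float, sd2: float) -> float:
    """Cohen's d using the equal-group-size pooled SD: sqrt((sd1^2 + sd2^2) / 2)."""
    sd_pooled = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2.0)
    return (m1 - m2) / sd_pooled


d = cohens_d(m1=78, m2=70, sd1=10, sd2=12)
print(f"Cohen's d = {d:.3f}")
```

Swapping the two groups flips the sign of d but not its magnitude, which is why interpretation tables work with the absolute value.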

Hedges' g — Bias-Corrected Estimate

Hedges' g applies a correction factor to Cohen's d for small sample sizes. Cohen's d slightly overestimates the true population effect, and the bias is most noticeable when n₁ + n₂ is below about 20; Hedges' g corrects for it.

Hedges' g Formula
g = d × (1 − 3 / (4(n₁ + n₂) − 9))
d = Cohen's d; n₁, n₂ = group sample sizes

Hedges' g is interpreted using the same benchmarks as Cohen's d. For large samples the two measures converge; the difference only matters when total n is below 50.
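To make the convergence concrete, the sketch below evaluates the correction factor from the formula above at a few group sizes. The function names and the chosen group sizes are illustrative assumptions.

```python
def hedges_correction(n1: int, n2: int) -> float:
    """Small-sample correction factor 1 - 3 / (4(n1 + n2) - 9)."""
    return 1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0)


def hedges_g(d: float, n1: int, n2: int) -> float:
    """Hedges' g: Cohen's d multiplied by the correction factor."""
    return d * hedges_correction(n1, n2)


# Correction factor for equal groups of increasing size
factors = {n: round(hedges_correction(n, n), 4) for n in (5, 10, 25, 100)}
for n, factor in factors.items():
    print(f"n per group = {n:>3}: correction factor = {factor}")
```

With 5 per group the factor shrinks d by roughly 10%; with 100 per group the shrinkage is under half a percent.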

Glass's Delta — Control Group Reference

Glass's delta uses only the control group's standard deviation in the denominator. It is the preferred measure when the experimental treatment is expected to change within-group variability — for example, in clinical trials where the intervention affects not just the mean but also consistency of response.

Glass's Delta Formula
Δ = (Mtreatment − Mcontrol) / SDcontrol
SDcontrol = standard deviation of control group only
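A short sketch of the same calculation, assuming the blood pressure figures from this guide's clinical worked example (treatment mean 12, control mean 5, control SD 14). The function name `glass_delta` is hypothetical.

```python
def glass_delta(m_treatment: float, m_control: float, sd_control: float) -> float:
    """Glass's delta: mean difference scaled by the control group's SD only."""
    if sd_control <= 0:
        raise ValueError("sd_control must be positive")
    return (m_treatment - m_control) / sd_control


delta = glass_delta(m_treatment=12, m_control=5, sd_control=14)
print(f"Glass's delta = {delta:.2f}")
```

Because the treatment group's SD never enters the denominator, the result does not change if the intervention makes responses more or less variable.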

Eta Squared (η²) — ANOVA Variance Explained

Eta squared quantifies the proportion of total variance in the dependent variable that is explained by the independent variable in an ANOVA. It ranges from 0 to 1 and can be interpreted like an R² from regression.

Eta Squared Formula (ANOVA)
η² = SSeffect / SStotal
SSeffect = sum of squares for the effect; SStotal = total sum of squares

Eta squared tends to overestimate the population effect in small samples because it is computed from sample sums of squares with no bias correction. For that reason, omega squared is preferred when generalizing beyond the sample.
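When only raw scores are available, the two sums of squares can be computed directly. The sketch below assumes a one-way design with each group given as a list of scores; the function name `eta_squared` and the tiny data set are made up for illustration.

```python
def eta_squared(groups: list[list[float]]) -> float:
    """Eta squared for a one-way design: SS_effect / SS_total."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    # Total variation of every score around the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Variation of group means around the grand mean, weighted by group size
    ss_effect = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_effect / ss_total


# Hypothetical scores for three groups
scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
eta_sq = eta_squared(scores)
print(f"eta squared = {eta_sq:.2f}")
```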

Omega Squared (ω²) — Less Biased ANOVA Estimate

Omega squared corrects for the upward bias in eta squared, producing a more accurate estimate of the proportion of variance explained in the population. The formula adjusts for degrees of freedom and mean square error.

Omega Squared Formula
ω² = (SSeffect − dfeffect × MSerror) / (SStotal + MSerror)
dfeffect = degrees of freedom for the factor; MSerror = mean square error
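The formula maps directly onto an ANOVA summary table. The sketch below uses the figures from this guide's teaching-methods example (SSeffect = 450, SStotal = 1,800, dfeffect = 2, MSerror = 75); the function name `omega_squared` is hypothetical.

```python
def omega_squared(ss_effect: float, ss_total: float,
                  df_effect: int, ms_error: float) -> float:
    """Omega squared: (SS_effect - df_effect * MS_error) / (SS_total + MS_error)."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


omega_sq = omega_squared(ss_effect=450, ss_total=1800, df_effect=2, ms_error=75)
eta_sq = 450 / 1800  # uncorrected value, for comparison
print(f"eta squared = {eta_sq:.2f}, omega squared = {omega_sq:.2f}")
```

Note that the numerator can go negative when the effect is very weak; in that case the estimate is commonly reported as zero, which this sketch does not do automatically.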

Pearson's r — Correlation Effect Size

When your research involves a correlation or regression rather than a group comparison, Pearson's r is the effect size. It ranges from −1 to +1, with larger absolute values indicating stronger effects. Squaring r gives r², the proportion of variance shared between the two variables.

Pearson's r Formula
r = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]
x̄, ȳ = means of variables X and Y; r² = variance explained
Source: Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, 2nd ed. | Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141(1), 2–18.
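The Pearson formula can be sketched in a few lines of plain Python. The function name `pearson_r` and the paired observations are invented for illustration and do not come from any study.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson's r from paired observations, following the sum-of-products formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical data: hours of sleep and task errors for five people
hours = [5, 6, 7, 8, 9]
errors = [9, 8, 6, 5, 2]
r = pearson_r(hours, errors)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")
```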

Effect Size Interpretation Tables

Cohen's 1988 benchmarks remain the standard reference across disciplines. They were calibrated on research in psychology and behavioral science. In fields like medicine and educational research, smaller effects are often clinically meaningful — so always interpret effect sizes in context, not just against these thresholds.

Cohen's d Interpretation

Cohen's d | Interpretation | Overlap (%) | Example Context
< 0.20 | Negligible | ~92% | Barely detectable difference
0.20 | Small Effect | ~85% | Height difference between 15- and 16-year-old girls
0.50 | Medium Effect | ~67% | Difference between IQ scores of groups in different jobs
0.80 | Large Effect | ~53% | Difference between IQ of college vs. non-college students
≥ 1.20 | Very Large Effect | < 45% | Effect of a highly effective educational intervention

Pearson's r Interpretation

Pearson's r (absolute value) | Interpretation | Variance Explained (r²)
0.10 | Small Effect | 1%
0.30 | Medium Effect | 9%
0.50 | Large Effect | 25%
≥ 0.70 | Very Large Effect | ≥ 49%

Eta Squared and Omega Squared (ANOVA)

η² or ω² | Interpretation | Equivalent Cohen's f
0.01 | Small Effect | f = 0.10
0.06 | Medium Effect | f = 0.25
0.14 | Large Effect | f = 0.40
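The "Equivalent Cohen's f" column follows from the standard conversion f = √[η² / (1 − η²)]. The sketch below applies it to the three benchmark values; the function name `eta_squared_to_f` is hypothetical.

```python
import math


def eta_squared_to_f(eta_sq: float) -> float:
    """Convert eta squared to Cohen's f via f = sqrt(eta^2 / (1 - eta^2))."""
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta squared must be in [0, 1)")
    return math.sqrt(eta_sq / (1.0 - eta_sq))


f_values = {eta_sq: round(eta_squared_to_f(eta_sq), 2) for eta_sq in (0.01, 0.06, 0.14)}
for eta_sq, f in f_values.items():
    print(f"eta squared = {eta_sq:.2f} -> f = {f:.2f}")
```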
💡
Context changes what counts as "large"

John Hattie's landmark educational meta-analysis (Visible Learning, 2009) found that the average effect on student achievement across the influences he studied is d = 0.40, somewhat below Cohen's "medium" benchmark of 0.50. In that context, an intervention with d = 0.40 is merely average, not impressive. Always compare effect sizes to those of similar interventions in your field.

How to Calculate Effect Size (Step-by-Step)

1. Identify Your Study Design

Are you comparing two group means (use Cohen's d), analyzing variance across multiple groups (use η² or ω²), or examining a correlation (use Pearson's r)? The design determines the formula.

2. Gather the Required Statistics

For Cohen's d: group means (M₁, M₂), standard deviations (SD₁, SD₂), and sample sizes (n₁, n₂). For ANOVA: the ANOVA summary table with SS and MS values. For Pearson's r: the raw data or covariance and standard deviations.

3. Compute the Pooled Standard Deviation (for d)

SDpooled = √[(SD₁² + SD₂²) / 2] when group sizes are equal. When n₁ ≠ n₂, use the weighted formula: SDpooled = √[((n₁ − 1)SD₁² + (n₂ − 1)SD₂²) / (n₁ + n₂ − 2)].

4. Apply the Formula

Divide the mean difference by the pooled SD for Cohen's d, or compute SSeffect/SStotal for eta squared. Use the calculator below to verify your arithmetic.

5. Apply Hedges' Correction if Needed

If your combined sample size is below 50, multiply Cohen's d by the correction factor (1 − 3/(4(n₁ + n₂) − 9)) to obtain Hedges' g. For larger samples, the correction is negligible.

6. Interpret in Context and Report

Compare to Cohen's benchmarks and to typical effect sizes in your field. Report as: "Cohen's d = 0.54, indicating a medium effect" or "η² = 0.09, indicating that the independent variable explained 9% of variance in the outcome."
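For the two-group case, the steps above can be sketched end to end as follows. This is one possible arrangement, not a definitive implementation: the function names are hypothetical, the unequal group sizes in the usage example are made up, and treating Cohen's benchmarks as hard cut-offs in `classify` is a simplifying assumption.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Weighted pooled SD; equals sqrt((sd1^2 + sd2^2) / 2) when n1 == n2."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def classify(abs_d: float) -> str:
    """Label a magnitude using Cohen's benchmarks as cut-offs (an assumption)."""
    if abs_d < 0.20:
        return "negligible"
    if abs_d < 0.50:
        return "small"
    if abs_d < 0.80:
        return "medium"
    return "large"


def effect_size_report(m1, sd1, n1, m2, sd2, n2) -> dict:
    """Steps 3 to 6: pooled SD, Cohen's d, Hedges' g, and a benchmark label."""
    sd_p = pooled_sd(sd1, n1, sd2, n2)
    d = (m1 - m2) / sd_p
    g = d * (1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0))
    return {"sd_pooled": sd_p, "d": d, "g": g, "label": classify(abs(d))}


# Hypothetical summary statistics with unequal group sizes
report = effect_size_report(m1=78, sd1=10, n1=30, m2=70, sd2=12, n2=20)
print(
    f"Cohen's d = {report['d']:.2f}, Hedges' g = {report['g']:.2f}, "
    f"indicating a {report['label']} effect"
)
```

The benchmark label is only a starting point; the comparison with typical effects in the field still has to be made by the analyst.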

Interactive Effect Size Calculator (Cohen's d & Hedges' g)

Enter the summary statistics for two groups. The calculator computes Cohen's d, Hedges' g (bias-corrected), the pooled standard deviation, and automatically classifies the magnitude based on Cohen's benchmarks.

[Interactive calculator: enter the mean, standard deviation, and sample size for Group 1 (experimental / treatment) and Group 2 (control / comparison). Outputs: Cohen's d and Hedges' g (bias-corrected).]

Worked Examples Across Research Designs

Example 1 — Two-Group Comparison (Cohen's d)

Worked Example 1 — Cohen's d

Problem: Researchers test whether a memory training program improves recall scores. The training group (n₁ = 25) scores M₁ = 78 with SD₁ = 10. The control group (n₂ = 25) scores M₂ = 70 with SD₂ = 12. Calculate Cohen's d and Hedges' g.

1. Compute the pooled SD: SDpooled = √[(10² + 12²) / 2] = √[(100 + 144) / 2] = √122 = 11.05

2. Calculate Cohen's d: d = (78 − 70) / 11.05 = 8 / 11.05 = 0.724

3. Apply Hedges' correction: Correction = 1 − 3/(4(25+25) − 9) = 1 − 3/191 = 0.9843, so g = 0.724 × 0.9843 = 0.713

4. Interpret: d = 0.724 falls between 0.50 (medium) and 0.80 (large). By convention, this is a medium-to-large effect.

✅ Result: Cohen's d = 0.72, Hedges' g = 0.71. The memory training produced a medium-to-large effect on recall scores. The training group scored about 0.72 pooled standard deviations higher than the control group.

Example 2 — One-Way ANOVA (Eta Squared)

Worked Example 2 — Eta Squared

Problem: A study compares exam performance across three teaching methods (lecture, flipped classroom, online). The ANOVA table shows SSbetween = 450 and SStotal = 1,800. Calculate η² and ω² (with MSerror = 75, dfbetween = 2).

1. Calculate Eta squared: η² = SSeffect / SStotal = 450 / 1,800 = 0.25

2. Calculate Omega squared: ω² = (450 − 2 × 75) / (1,800 + 75) = (450 − 150) / 1,875 = 300 / 1,875 = 0.16

3. Interpret: η² = 0.25 far exceeds the large threshold of 0.14. ω² = 0.16, the less biased estimate, still indicates a large effect.

✅ Result: η² = 0.25, ω² = 0.16. Teaching method explains approximately 16–25% of the variance in exam scores, a large effect. ω² = 0.16 is the preferred value to report because it corrects for the upward bias of η² in small samples.

Example 3 — Pearson's r (Correlation Effect Size)

Worked Example 3 — Pearson's r

Problem: A study finds r = −0.42 between hours of sleep and number of errors on a cognitive task. What is the effect size and how much variance is explained?

1. Effect size: |r| = 0.42 falls between the medium threshold (0.30) and large threshold (0.50).

2. Variance explained: r² = 0.42² = 0.176. Sleep explains about 17.6% of the variance in cognitive errors.

✅ Result: r = −0.42 indicates a medium-to-large negative correlation. More sleep is associated with fewer errors. The relationship accounts for approximately 18% of variance in errors — practically meaningful in a cognitive health context.

Example 4 — Clinical Trial (Cohen's d in Medicine)

Worked Example 4 — Clinical Effect Size

Problem: A blood pressure drug trial finds the treatment group has a mean reduction of 12 mmHg (SD = 15), while the placebo group shows 5 mmHg (SD = 14). n = 200 per group. The p-value is below 0.001. How large is the effect?

1. Pooled SD: √[(15² + 14²)/2] = √[(225 + 196)/2] = √210.5 = 14.51

2. Cohen's d: d = (12 − 5) / 14.51 = 7 / 14.51 = 0.48

3. Context: In clinical cardiology, a mean difference of 7 mmHg in systolic BP is considered clinically meaningful, even though d = 0.48 is only about a "medium" effect by Cohen's benchmarks. This illustrates why domain context matters.

✅ Result: Cohen's d = 0.48 (medium effect). The drug is both statistically significant (p < 0.001) and clinically meaningful (7 mmHg reduction). Reporting effect size alongside p-value provides the complete picture for clinical decision-making.

Visualizing Effect Size Magnitude

One of the most intuitive ways to grasp what a Cohen's d value means is to think about the overlap between two distributions. A d = 0 means 100% overlap — the groups are identical. As d grows, the distributions separate and overlap decreases.

Distribution Overlap by Effect Size

Effect Size | Magnitude | Overlap (Group 1 vs Group 2)
d = 0.20 | Small | ~85%
d = 0.50 | Medium | ~67%
d = 0.80 | Large | ~53%
d = 1.20 | Very Large | ~40%

At d = 0.80, the average person in Group 1 scores above 79% of people in Group 2.
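These percentages can be approximated from the normal distribution. The sketch below assumes both groups are normally distributed with equal SDs. The 79% figure is Cohen's U₃, and the overlap percentages quoted in this guide appear to correspond to one minus Cohen's U₁ (an inference from the numbers, not something the guide states). Function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def cohens_u3(d: float) -> float:
    """Share of the lower group that falls below the mean of the higher group."""
    return STANDARD_NORMAL.cdf(d)


def overlap_from_u1(d: float) -> float:
    """Overlap defined as 1 - Cohen's U1, where U1 = (2p - 1) / p and p = Phi(|d| / 2)."""
    p = STANDARD_NORMAL.cdf(abs(d) / 2.0)
    u1 = (2.0 * p - 1.0) / p
    return 1.0 - u1


for d in (0.20, 0.50, 0.80, 1.20):
    print(f"d = {d:.2f}: overlap = {overlap_from_u1(d):.0%}, U3 = {cohens_u3(d):.0%}")
```

Other overlap definitions exist and give different percentages for the same d, so it is worth stating which one is being used.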

Real-World Applications of Effect Size

🏥

Clinical Research

Drug trials report effect sizes to distinguish statistical significance (driven by large n) from clinical significance. A d = 0.20 may be trivially small for pain reduction but clinically important for mortality risk.

🧠

Psychology

The replication crisis pushed psychology to put far more weight on effect size reporting. Many classic effects (ego depletion, social priming) shrank dramatically when replication studies computed more accurate effect sizes.

📚

Education

John Hattie's Visible Learning meta-analysis synthesized 1,400+ meta-analyses using effect sizes. Findings like d = 0.73 for feedback and d = 0.52 for cooperative learning guide evidence-based teaching practice.

📊

A/B Testing

Product and marketing teams report effect sizes (often Cohen's d or relative risk) to prioritize which experiments to ship. An A/B test with p = 0.04 but d = 0.02 rarely justifies a full rollout.

Sports Science

Performance researchers use magnitude-based inference anchored to effect size, not just p-values. A d = 0.20 improvement in sprint time can meaningfully separate athletes at elite levels.

🔬

Meta-Analysis

Meta-analysts combine effect sizes from dozens of studies to estimate the overall effect of an intervention. Without a standardized effect size, studies measuring outcomes in different units cannot be meaningfully pooled.

John Hattie's Effect Size in Education

John Hattie's Visible Learning project, now spanning over 1,800 meta-analyses and 300 million students, is the largest synthesis of educational research ever conducted. Hattie uses Cohen's d as the universal currency for comparing educational interventions.

Hattie Effect Size Chart — Key Findings

What works best in education?

Hattie's "hinge point" is d = 0.40 — the average effect of schooling itself. Interventions above this threshold are considered worth adopting; those below are likely no better than standard teaching. The findings challenge many conventional assumptions.

Educational Intervention | Hattie's Effect Size (d) | Rank (approx.)
Collective teacher efficacy | 1.57 | Top 5
Self-reported grades (student expectations) | 1.33 | Top 5
Formative evaluation / feedback | 0.73 | High
Direct instruction | 0.60 | Above average
Cooperative learning | 0.52 | Above average
Problem-based learning | 0.35 | Below hinge point
Class size reduction | 0.21 | Small effect
Homework (secondary) | 0.29 | Small-medium

Hattie's work illustrates both the power and the limitations of effect size benchmarks. His classification uses d = 0.40 as "the hinge point" — meaning interventions with d < 0.40 may not justify their cost — which differs from Cohen's original small/medium/large framework. The right benchmark depends on the question you're asking.

Source: Hattie, J. (2009). Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. Routledge. | Updated data available at Visible Learning MetaX.

Effect Size Symbols and Notation

Each effect size measure uses a specific symbol. Knowing the correct notation matters for reading journal articles and writing up results correctly.

Symbol | Name | Used For | Range
d | Cohen's d | Two-group mean difference | −∞ to +∞ (absolute value for magnitude)
g | Hedges' g | Bias-corrected mean difference | Same as d
Δ | Glass's delta | Mean diff using control SD | Same as d
r | Pearson's r | Correlation / regression | −1 to +1
r² | Coefficient of determination | Variance explained (regression) | 0 to 1
η² | Eta squared | ANOVA variance explained | 0 to 1
ω² | Omega squared | ANOVA, less biased than η² | 0 to 1
φ | Phi coefficient | Chi-square 2×2 table | 0 to 1
V | Cramer's V | Chi-square larger tables | 0 to 1
f | Cohen's f | ANOVA, related to η² | 0 to +∞

Frequently Asked Questions About Effect Size

What is effect size in statistics?

Effect size is a standardized numerical measure of the magnitude or practical importance of a statistical result. It answers "how large is this effect?" rather than the yes/no question of statistical significance. Common measures include Cohen's d for mean differences and Eta squared for ANOVA results. Effect size is independent of sample size, making it a more stable indicator of practical importance than the p-value.

What is a good effect size?

Cohen's (1988) benchmarks define small = 0.20, medium = 0.50, and large = 0.80 for Cohen's d. However, "good" is context-dependent. In education, Hattie's work shows the average intervention produces d = 0.40, so that threshold is more meaningful for comparing teaching methods. In clinical medicine, a d = 0.20 may be highly clinically significant if the outcome is mortality. Always compare to published effect sizes in your specific field.

What does a small effect size mean?

A small effect size (Cohen's d ≈ 0.20) means the two groups' distributions overlap substantially — about 85% overlap. The difference exists but is subtle. In everyday terms, it is roughly the difference in height between 15- and 16-year-old girls in the same population. Small effects can still be practically important: a small reduction in mortality risk, applied to millions of people, has enormous population-level consequences.

What does a large effect size mean?

A large effect size (Cohen's d ≥ 0.80) means the groups are substantially separated — only about 53% distribution overlap. The average person in the higher-scoring group outperforms approximately 79% of people in the lower-scoring group. An example: the difference in IQ between college graduates and non-graduates in the general population is approximately d = 1.0 — a very large effect that is easily observed without statistical testing.

How does effect size differ from statistical significance?

Statistical significance (p-value) measures whether an effect is detectable given your sample size. Effect size measures how large the effect is, independent of sample size. A result can be statistically significant with a tiny effect size (when n is very large), or statistically non-significant with a large effect size (when n is very small). The p-value and effect size answer different questions — responsible research reports both.

Is effect size affected by sample size?

A correctly computed effect size is not directly affected by sample size — that is its primary advantage over the p-value. Whether you study 20 or 2,000 people, if the true population means and standard deviations are the same, Cohen's d should produce the same estimate. However, small samples produce less precise estimates, so confidence intervals around effect sizes are wider when n is small. Hedges' g corrects for a small upward bias that Cohen's d shows in small samples.
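The precision point can be illustrated with an approximate confidence interval for Cohen's d. The sketch below relies on a common large-sample approximation for the standard error, SE ≈ √[(n₁ + n₂)/(n₁n₂) + d²/(2(n₁ + n₂))], which is an assumption introduced here rather than a formula from this guide. The function name `cohens_d_ci` and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_d_ci(d: float, n1: int, n2: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate normal-theory confidence interval for Cohen's d."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2.0 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return d - z * se, d + z * se


# Same point estimate, very different sample sizes
small_study = cohens_d_ci(0.5, 10, 10)
large_study = cohens_d_ci(0.5, 1000, 1000)
print(f"n = 10 per group:   {small_study[0]:.2f} to {small_study[1]:.2f}")
print(f"n = 1000 per group: {large_study[0]:.2f} to {large_study[1]:.2f}")
```

Both studies report d = 0.5, but only the larger one pins the effect down tightly; the small study's interval includes zero.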

Effect size connects to several other core statistical ideas. Understanding the relationships between these concepts deepens your ability to design studies, interpret results, and evaluate published research.

🔍

Hypothesis Testing

Effect size is reported alongside p-values in hypothesis tests. The test determines significance; effect size determines magnitude. Both are needed for a complete result.

📉

P-values

P-values and effect sizes answer different questions. A p-value is affected by sample size; effect size is not. Large n can make trivially small effects statistically significant.

🎯

Confidence Intervals

Confidence intervals around effect sizes provide more information than point estimates alone. A CI for Cohen's d shows the range of plausible true effects given sampling variability.

Statistical Power

Statistical power — the probability of detecting a true effect — is directly tied to effect size. Larger effect sizes are easier to detect; power analysis uses an expected effect size to determine the required sample size.

📊

Pearson Correlation

Pearson's r is both a correlation coefficient and an effect size. Its square (r²) tells you what proportion of variance in one variable is explained by another.

🧮

ANOVA

ANOVA tests whether group means differ significantly. Eta squared and omega squared are the matching effect size measures, quantifying what proportion of total variance the grouping variable explains.
