How do you calculate effect size?

For two-group comparisons, use Cohen's d = (M1 − M2) / SD_pooled. For ANOVA, use Eta squared = SS_effect / SS_total. For correlations, Pearson's r itself is the effect size. Select the measure that matches your study design.

What is the difference between effect size and p-value?

A p-value answers 'Could this result plausibly be chance?': it is the probability of observing data at least as extreme as yours if there were no true effect. Effect size answers 'How big is this effect?': it measures magnitude independent of sample size. Large samples can produce tiny p-values for trivially small effects; reporting effect size guards against that misinterpretation.

Effect Size: The Complete Statistical Guide & Formula Reference (2026)

Q: What is effect size in statistics?

Effect size is a numerical measure of the magnitude or practical importance of a statistical result. Unlike a p-value, which only indicates whether the data are compatible with there being no effect, effect size tells you how large that effect is. Common measures include Cohen's d for mean differences and Eta squared for ANOVA.

Q: What is a good effect size?

According to Cohen's (1988) benchmarks: a small effect size is Cohen's d = 0.20, a medium effect size is d = 0.50, and a large effect size is d = 0.80. However, what counts as 'good' depends on the research context — educational interventions routinely produce d = 0.40, while some medical treatments are clinically meaningful at d = 0.20.

Q: Is effect size affected by sample size?

No — a correctly computed effect size is not directly affected by sample size. That is its main advantage over the p-value. However, small samples produce less precise estimates of effect size, so confidence intervals around effect sizes are wider with smaller n.

What Is Effect Size? (Definition)

Definition — Effect Size

Effect size is a standardized, quantitative measure of the magnitude of a statistical result — how large, strong, or practically important an observed relationship or difference is. It answers the question "How much?" rather than the yes/no question answered by a p-value.

Effect Size = magnitude of an effect, independent of sample size

When two groups are compared — say, a treatment group and a control group — a p-value tells you whether the difference between them is statistically distinguishable from zero. Effect size tells you how large that difference is in standardized units. A study with n = 10,000 can produce p = 0.001 for a difference so small it has no practical meaning. Effect size catches that.

The American Psychological Association (APA), the American Statistical Association (ASA), and most major journals now call for reporting effect sizes alongside p-values. Jacob Cohen, who formalized many of the measures used today, argued in his landmark textbook Statistical Power Analysis for the Behavioral Sciences (2nd ed., 1988) that effect size is the most fundamental quantity in empirical research. His three-level classification (small, medium, large) remains the dominant interpretive framework across psychology, education, and medicine.

1988: Cohen formalizes effect size benchmarks
d = 0.50: Medium effect (Cohen's d benchmark)
η² = 0.06: Medium effect (ANOVA benchmark)
r = 0.30: Medium correlation effect size

⚡ Quick Reference — Effect Size Key Facts

Effect size meaning: Quantifies how large or practically important a result is, beyond statistical significance
Not affected by sample size: Unlike the p-value, effect size is a property of the population, not of n
Required by APA (2010): The APA Publication Manual mandates reporting effect sizes in all empirical research
Cohen's benchmarks: Small = 0.20, Medium = 0.50, Large = 0.80 (for Cohen's d)
Standardized: Effect sizes are unit-free, so they can be compared across studies and disciplines
Meta-analysis: Effect sizes are the raw material of meta-analysis — they allow combining evidence across studies

Effect Size vs P-Value: Why Magnitude Matters

Statistical significance and practical significance are different things. A p-value answers one question: given the sample size, could this result have occurred by chance if there were no true effect? Effect size answers a completely separate question: how large is the effect?

⚠️

The sample size problem with p-values

With n = 100,000, even a difference of about 0.2 IQ points (d ≈ 0.013 on a scale with SD = 15) can produce p < 0.05. That difference is real but meaningless in practice. Effect size prevents this misinterpretation by measuring magnitude independently of sample size.
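To make the sample-size problem concrete, here is a minimal sketch that holds a tiny effect fixed and only changes n. It assumes equal group sizes and a simple two-sided z-test with a known common SD; the function name `two_sided_p_from_d` is illustrative, not from any library.

```python
import math

def two_sided_p_from_d(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a two-group z-test.

    Assumes equal group sizes and a known, common SD, so the test
    statistic is z = |d| * sqrt(n_per_group / 2).
    """
    z = abs(d) * math.sqrt(n_per_group / 2)
    # erfc(z / sqrt(2)) is the two-sided tail area of a standard normal
    return math.erfc(z / math.sqrt(2))

# Same negligible effect (d = 0.02), increasingly large samples
for n in (100, 10_000, 100_000):
    print(f"n per group = {n:>7}: p = {two_sided_p_from_d(0.02, n):.6f}")
```

The effect size never changes across the three rows; only the p-value does, which is why the two numbers need to be reported together.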

Concept	P-value	Effect Size
Question answered	Is the effect real?	How large is the effect?
Affected by sample size	Yes — larger n → smaller p	No — independent of n
Tells you practical importance	No	Yes
Required for meta-analysis	No	Yes
APA-required reporting	Yes	Yes
Measures significance	Statistical significance	Practical significance

The two measures are not interchangeable — they work together. A result can be statistically significant with a tiny effect size (large sample, negligible difference), or statistically non-significant with a large effect size (small sample, real-but-undetected effect). Good research reports and interprets both.

Reference: Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates. | Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science. Frontiers in Psychology, 4, 863.

The Complete Effect Size Formula Library

Different study designs require different effect size measures. The table below maps each design to its recommended measure. Detailed formulas follow for each.

Study Design	Recommended Measure	Symbol
Two independent groups (t-test)	Cohen's d or Hedges' g	d, g
Two groups, small samples (n < 20)	Hedges' g (bias-corrected)	g
Control group SD differs from treatment	Glass's delta	Δ
ANOVA (variance explained)	Eta squared or Omega squared	η², ω²
ANOVA (population estimate)	Omega squared (preferred)	ω²
Correlation / regression	Pearson's r or r²	r, r²
Chi-square (2×2 table)	Phi coefficient	φ
Chi-square (larger tables)	Cramer's V	V

Cohen's d — Standardized Mean Difference

Cohen's d is the most widely used effect size measure. It expresses the difference between two group means in units of the pooled standard deviation. The result is unit-free, allowing comparisons across studies measuring different things.

Cohen's d Formula

d = (M₁ − M₂) / SD_pooled

M₁ = mean of Group 1
M₂ = mean of Group 2
SD_pooled = √[(SD₁² + SD₂²) / 2] (for equal group sizes)

The pooled standard deviation assumes the two groups have roughly equal variance. If standard deviations differ substantially, consider Glass's delta instead. The sign of d tells you the direction of the effect (which group scored higher); interpretation tables use the absolute value.
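As a rough illustration of the formula above, the following sketch computes Cohen's d from summary statistics. It uses the simple pooled SD shown here, so it assumes equal group sizes; the function name `cohens_d` is a placeholder chosen for this example.

```python
import math

def cohens_d(m1: float, m2: float, sd1: float, sd2: float) -> float:
    """Cohen's d using the equal-n pooled SD: sqrt((SD1^2 + SD2^2) / 2)."""
    sd_pooled = math.sqrt((sd1**2 + sd2**2) / 2)
    return (m1 - m2) / sd_pooled

# Group 1 scores higher, so d is positive; swapping the groups flips the sign
print(round(cohens_d(78, 70, 10, 12), 3))   # 0.724
print(round(cohens_d(70, 78, 12, 10), 3))   # -0.724
```

The sign only encodes which group was entered first, which is why interpretation tables work with the absolute value.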

Hedges' g — Bias-Corrected Estimate

Hedges' g applies a correction factor to Cohen's d for small sample sizes. Cohen's d overestimates the true population effect in small samples, most noticeably when n₁ + n₂ is below about 20; Hedges' g corrects for this bias.

Hedges' g Formula

g = d × (1 − 3 / (4(n₁ + n₂) − 9))

d = Cohen's d
n₁, n₂ = group sample sizes

Hedges' g is interpreted using the same benchmarks as Cohen's d. For large samples the two measures converge; the difference only matters when total n is below 50.
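A short sketch of the correction, using the approximate factor given above. The function name `hedges_g` is illustrative; the loop simply shows how quickly the correction factor approaches 1 as the total sample grows.

```python
def hedges_g(d: float, n1: int, n2: int) -> float:
    """Apply the approximate small-sample correction 1 - 3 / (4(n1 + n2) - 9)."""
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction

# The same d shrinks noticeably at small n and barely at all at large n
for n in (5, 25, 500):
    print(f"n per group = {n:>3}: g = {hedges_g(0.724, n, n):.3f}")
```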

Glass's Delta — Control Group Reference

Glass's delta uses only the control group's standard deviation in the denominator. It is the preferred measure when the experimental treatment is expected to change within-group variability — for example, in clinical trials where the intervention affects not just the mean but also consistency of response.

Glass's Delta Formula

Δ = (M_treatment − M_control) / SD_control

SD_control = standard deviation of control group only
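The formula translates directly into code. This is a minimal sketch with a hypothetical function name, `glass_delta`; the input check is an added assumption, since a zero or negative SD would make the ratio meaningless.

```python
def glass_delta(m_treatment: float, m_control: float, sd_control: float) -> float:
    """Glass's delta: mean difference scaled by the control group's SD only."""
    if sd_control <= 0:
        raise ValueError("sd_control must be positive")
    return (m_treatment - m_control) / sd_control

# Illustrative numbers: treatment mean 12, control mean 5, control SD 14
print(glass_delta(12, 5, 14))   # 0.5
```

Because the treatment group's SD never enters the calculation, a treatment that widens or narrows the spread of responses does not distort the denominator.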

Eta Squared (η²) — ANOVA Variance Explained

Eta squared quantifies the proportion of total variance in the dependent variable that is explained by the independent variable in an ANOVA. It ranges from 0 to 1 and can be interpreted like an R² from regression.

Eta Squared Formula (ANOVA)

η² = SS_effect / SS_total

SS_effect = sum of squares for the effect
SS_total = total sum of squares

Eta squared tends to overestimate the population effect in small samples because it is computed from sample sums of squares with no bias correction. For that reason, omega squared is preferred when generalizing beyond the sample.
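For readers starting from raw scores rather than an ANOVA table, here is a sketch of how the two sums of squares in the formula can be computed for a one-way design. The function name `eta_squared` and the toy data are assumptions made for illustration.

```python
def eta_squared(groups: list[list[float]]) -> float:
    """One-way eta squared = SS_effect / SS_total, computed from raw scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Total variability of every score around the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Variability of group means around the grand mean, weighted by group size
    ss_effect = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_effect / ss_total

# Toy data: three well-separated groups
print(round(eta_squared([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 2))   # 0.9
```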

Omega Squared (ω²) — Less Biased ANOVA Estimate

Omega squared corrects for the upward bias in eta squared, producing a more accurate estimate of the proportion of variance explained in the population. The formula adjusts for degrees of freedom and mean square error.

Omega Squared Formula

ω² = (SS_effect − df_effect × MS_error) / (SS_total + MS_error)

df_effect = degrees of freedom for the factor
MS_error = mean square error
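The omega squared formula can be sketched as a one-line function that takes values straight from an ANOVA summary table. The name `omega_squared` is a placeholder; the inputs in the usage line are the ANOVA figures from this guide's teaching-methods example.

```python
def omega_squared(ss_effect: float, ss_total: float,
                  df_effect: int, ms_error: float) -> float:
    """Omega squared = (SS_effect - df_effect * MS_error) / (SS_total + MS_error)."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

eta_sq = 450 / 1800
omega_sq = omega_squared(450, 1800, df_effect=2, ms_error=75)
print(round(eta_sq, 2), round(omega_sq, 2))   # 0.25 0.16
```

The gap between the two values is the bias correction at work; it shrinks as the error degrees of freedom grow.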

Pearson's r — Correlation Effect Size

When your research involves a correlation or regression rather than a group comparison, Pearson's r is the effect size. It ranges from −1 to +1, with larger absolute values indicating stronger effects. Squaring r gives r², the proportion of variance shared between the two variables.

Pearson's r Formula

r = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]

x̄, ȳ = means of variables X and Y
r² = variance explained
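The formula can be followed term by term in plain Python. This is a minimal sketch with a hypothetical function name, `pearson_r`, and made-up data; in practice a statistics library would normally be used instead.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson's r from paired raw scores, following the formula term by term."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be the same length, with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of cross-products of deviations
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares
    denominator = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return numerator / denominator

r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])   # made-up data
print(f"r = {r:.2f}, r squared = {r**2:.2f}")
```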

Source: Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, 2nd ed. | Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141(1), 2–18.

Effect Size Interpretation Tables

Cohen's 1988 benchmarks remain the standard reference across disciplines. They were calibrated on research in psychology and behavioral science. In fields like medicine and educational research, smaller effects are often clinically meaningful — so always interpret effect sizes in context, not just against these thresholds.

Cohen's d Interpretation

Cohen's d	Interpretation	Overlap (%)	Example Context
< 0.20	Negligible	~92%	Barely detectable difference
0.20	Small Effect	~85%	Height difference between 15- and 16-year-old girls
0.50	Medium Effect	~67%	Difference between IQ scores of groups in different jobs
0.80	Large Effect	~53%	Difference between IQ of college vs. non-college students
≥ 1.20	Very Large Effect	< 45%	Effect of a highly effective educational intervention

Pearson's r Interpretation

Pearson's r (absolute value)	Interpretation	Variance Explained (r²)
0.10	Small Effect	1%
0.30	Medium Effect	9%
0.50	Large Effect	25%
≥ 0.70	Very Large Effect	≥ 49%

Eta Squared and Omega Squared (ANOVA)

η² or ω²	Interpretation	Equivalent Cohen's f
0.01	Small Effect	f = 0.10
0.06	Medium Effect	f = 0.25
0.14	Large Effect	f = 0.40
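The "Equivalent Cohen's f" column follows from the standard relation f = √(η² / (1 − η²)). The sketch below, with an illustrative function name, reproduces the three rows of the table from that relation.

```python
import math

def cohens_f(eta_sq: float) -> float:
    """Convert eta squared to Cohen's f via f = sqrt(eta^2 / (1 - eta^2))."""
    if not 0 <= eta_sq < 1:
        raise ValueError("eta_sq must be in [0, 1)")
    return math.sqrt(eta_sq / (1 - eta_sq))

for eta_sq in (0.01, 0.06, 0.14):
    print(f"eta squared = {eta_sq:.2f} -> f = {cohens_f(eta_sq):.2f}")
```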

💡

Context changes what counts as "large"

John Hattie's landmark educational meta-analysis (Visible Learning, 2009) found that the average effect of schooling on student achievement is d = 0.40, which sits between Cohen's "small" (0.20) and "medium" (0.50) benchmarks. In that context, an intervention with d = 0.40 is merely average, not impressive. Always compare effect sizes to those of similar interventions in your field.

How to Calculate Effect Size (Step-by-Step)

Identify Your Study Design

Are you comparing two group means (use Cohen's d), analyzing variance across multiple groups (use η² or ω²), or examining a correlation (use Pearson's r)? The design determines the formula.

Gather the Required Statistics

For Cohen's d: group means (M₁, M₂), standard deviations (SD₁, SD₂), and sample sizes (n₁, n₂). For ANOVA: the ANOVA summary table with SS and MS values. For Pearson's r: the raw data or covariance and standard deviations.

Compute the Pooled Standard Deviation (for d)

SD_pooled = √[(SD₁² + SD₂²) / 2] when group sizes are equal. When n₁ ≠ n₂, use the weighted formula: SD_pooled = √[((n₁ − 1)SD₁² + (n₂ − 1)SD₂²) / (n₁ + n₂ − 2)].

Apply the Formula

Divide the mean difference by the pooled SD for Cohen's d, or compute SS_effect/SS_total for eta squared. Use the calculator below to verify your arithmetic.

Apply Hedges' Correction if Needed

If your combined sample size is below 50, multiply Cohen's d by the correction factor: (1 − 3/(4(n₁ + n₂) − 9)) to obtain Hedges' g. For larger samples, the correction is negligible.

Interpret in Context and Report

Compare to Cohen's benchmarks and to typical effect sizes in your field. Report as: "Cohen's d = 0.54, indicating a medium effect" or "η² = 0.09, indicating that the independent variable explained 9% of variance in the outcome."
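The steps above can be sketched end to end for the two-group case. This is one possible arrangement rather than a definitive implementation: the function names are hypothetical, the weighted pooled SD is used throughout (it reduces to the simple formula when n₁ = n₂), and binning |d| at the lower edge of each benchmark is an assumption made for labeling. The inputs are the memory-training figures from this guide's first worked example.

```python
import math

def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Weighted pooled SD; equals sqrt((sd1^2 + sd2^2) / 2) when n1 == n2."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def label_magnitude(d: float) -> str:
    """Bin |d| by Cohen's benchmarks (lower-bound binning is an assumption)."""
    size = abs(d)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"

def two_group_effect_size(m1, sd1, n1, m2, sd2, n2) -> dict:
    """Pooled SD, Cohen's d, Hedges' g, and a benchmark label in one pass."""
    sd_p = pooled_sd(sd1, n1, sd2, n2)
    d = (m1 - m2) / sd_p
    g = d * (1 - 3 / (4 * (n1 + n2) - 9))   # small-sample correction
    return {"sd_pooled": sd_p, "d": d, "g": g, "label": label_magnitude(d)}

result = two_group_effect_size(78, 10, 25, 70, 12, 25)
print(f"Cohen's d = {result['d']:.2f}, Hedges' g = {result['g']:.2f} "
      f"({result['label']} by Cohen's benchmarks)")
# Cohen's d = 0.72, Hedges' g = 0.71 (medium by Cohen's benchmarks)
```

A label from fixed bins is only a starting point; the final interpretation still depends on typical effect sizes in the field.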

Interactive Effect Size Calculator (Cohen's d & Hedges' g)

Enter the summary statistics for two groups. The calculator computes Cohen's d, Hedges' g (bias-corrected), the pooled standard deviation, and automatically classifies the magnitude based on Cohen's benchmarks.

Calculator inputs, Group 1 (Experimental / Treatment): Mean (M₁), SD (SD₁), Sample Size (n₁)
Calculator inputs, Group 2 (Control / Comparison): Mean (M₂), SD (SD₂), Sample Size (n₂)
Calculator outputs: Cohen's d, Hedges' g (bias-corrected)

Worked Examples Across Research Designs

Example 1 — Two-Group Comparison (Cohen's d)

Worked Example 1 — Cohen's d

Problem: Researchers test whether a memory training program improves recall scores. The training group (n₁ = 25) scores M₁ = 78 with SD₁ = 10. The control group (n₂ = 25) scores M₂ = 70 with SD₂ = 12. Calculate Cohen's d and Hedges' g.

Compute the pooled SD: SD_pooled = √[(10² + 12²) / 2] = √[(100 + 144) / 2] = √122 = 11.05

Calculate Cohen's d: d = (78 − 70) / 11.05 = 8 / 11.05 = 0.724

Apply Hedges' correction: Correction = 1 − 3/(4(25+25) − 9) = 1 − 3/191 = 0.9843
g = 0.724 × 0.9843 = 0.713

Interpret: d = 0.724 falls between 0.50 (medium) and 0.80 (large). By convention, this is a medium-to-large effect.

✅ Result: Cohen's d = 0.72, Hedges' g = 0.71. The memory training produced a medium-to-large effect on recall scores. The training group scored about 0.72 pooled standard deviations higher than the control group.

Example 2 — One-Way ANOVA (Eta Squared)

Worked Example 2 — Eta Squared

Problem: A study compares exam performance across three teaching methods (lecture, flipped classroom, online). The ANOVA table shows SS_between = 450 and SS_total = 1,800. Calculate η² and ω² (with MS_error = 75, df_between = 2).

Calculate Eta squared: η² = SS_effect / SS_total = 450 / 1,800 = 0.25

Calculate Omega squared:
ω² = (450 − 2 × 75) / (1,800 + 75) = (450 − 150) / 1,875 = 300 / 1,875 = 0.16

Interpret: η² = 0.25 far exceeds the large threshold of 0.14. ω² = 0.16, the less biased estimate, still indicates a large effect.

✅ Result: η² = 0.25, ω² = 0.16. Teaching method explains roughly 16% (ω²) to 25% (η²) of the variance in exam scores, a large effect either way. Report ω² = 0.16 when generalizing beyond the sample, because it corrects the upward bias of η².

Example 3 — Pearson's r (Correlation Effect Size)

Worked Example 3 — Pearson's r

Problem: A study finds r = −0.42 between hours of sleep and number of errors on a cognitive task. What is the effect size and how much variance is explained?

Effect size: |r| = 0.42 falls between the medium threshold (0.30) and large threshold (0.50).

Variance explained: r² = 0.42² = 0.176. Sleep explains about 17.6% of the variance in cognitive errors.

✅ Result: r = −0.42 indicates a medium-to-large negative correlation. More sleep is associated with fewer errors. The relationship accounts for approximately 18% of variance in errors — practically meaningful in a cognitive health context.

Example 4 — Clinical Trial (Cohen's d in Medicine)

Worked Example 4 — Clinical Effect Size

Problem: A blood pressure drug trial finds the treatment group has a mean reduction of 12 mmHg (SD = 15), while the placebo group shows 5 mmHg (SD = 14). n = 200 per group. The p-value is 0.0003. How large is the effect?

Pooled SD: √[(15² + 14²)/2] = √[(225 + 196)/2] = √210.5 = 14.51

Cohen's d: d = (12 − 5) / 14.51 = 7 / 14.51 = 0.48

Context: In clinical cardiology, a mean difference of 7 mmHg in systolic BP is considered clinically meaningful, even though d = 0.48 is technically a "medium" effect by Cohen's benchmarks. This illustrates why domain context matters.

✅ Result: Cohen's d = 0.48 (medium effect). The drug is both statistically significant (p = 0.0003) and clinically meaningful (7 mmHg reduction). Reporting effect size alongside p-value provides the complete picture for clinical decision-making.

Visualizing Effect Size Magnitude

One of the most intuitive ways to grasp what a Cohen's d value means is to think about the overlap between two distributions. A d = 0 means 100% overlap — the groups are identical. As d grows, the distributions separate and overlap decreases.

Distribution Overlap by Effect Size

Effect Size	Magnitude	Overlap
d = 0.20	Small	~85%
d = 0.50	Medium	~67%
d = 0.80	Large	~53%
d = 1.20	Very Large	~40%

At d = 0.80, the average person in Group 1 scores above 79% of people in Group 2.
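The overlap percentages quoted in this guide appear to correspond to one minus Cohen's U₁ (the non-overlap of two equal-variance normal distributions), and the 79% figure to Cohen's U₃. The sketch below computes both under that assumption; the function names are illustrative.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def overlap_from_d(d: float) -> float:
    """Overlap as 1 - U1, where U1 = (2P - 1) / P and P = Phi(|d| / 2)."""
    p = normal_cdf(abs(d) / 2)
    return 1 - (2 * p - 1) / p

def share_below_other_mean(d: float) -> float:
    """Cohen's U3: share of the lower group scoring below the higher group's mean."""
    return normal_cdf(d)

for d in (0.20, 0.50, 0.80, 1.20):
    print(f"d = {d:.2f}: overlap ~{overlap_from_d(d):.0%}, "
          f"U3 ~{share_below_other_mean(d):.0%}")
```

Both quantities assume normally distributed scores with equal variances in the two groups, so they are approximations for real data.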

Real-World Applications of Effect Size

🏥

Clinical Research

Drug trials report effect sizes to distinguish statistical significance (driven by large n) from clinical significance. A d = 0.20 may be trivially small for pain reduction but clinically important for mortality risk.

🧠

Psychology

The replication crisis prompted psychology to mandate effect size reporting. Many classic effects (ego depletion, social priming) shrank dramatically when replication studies computed more accurate effect sizes.

📚

Education

John Hattie's Visible Learning project has synthesized 1,400+ meta-analyses using effect sizes. Findings like d = 0.73 for feedback and d = 0.52 for cooperative learning guide evidence-based teaching practice.

📊

A/B Testing

Product and marketing teams report effect sizes (often Cohen's d or relative risk) to prioritize which experiments to ship. An A/B test with p = 0.04 but d = 0.02 rarely justifies a full rollout.

⚽

Sports Science

Performance researchers use magnitude-based inference anchored to effect size, not just p-values. A d = 0.20 improvement in sprint time can meaningfully separate athletes at elite levels.

🔬

Meta-Analysis

Meta-analysts combine effect sizes from dozens of studies to estimate the overall effect of an intervention. Without a standardized effect size, studies measuring outcomes in different units cannot be meaningfully pooled.

John Hattie's Effect Size in Education

John Hattie's Visible Learning project, now spanning over 1,800 meta-analyses and 300 million students, is the largest synthesis of educational research ever conducted. Hattie uses Cohen's d as the universal currency for comparing educational interventions.

Hattie Effect Size Chart — Key Findings

What works best in education?

Hattie's "hinge point" is d = 0.40 — the average effect of schooling itself. Interventions above this threshold are considered worth adopting; those below are likely no better than standard teaching. The findings challenge many conventional assumptions.

Educational Intervention	Hattie's Effect Size (d)	Rank (approx.)
Collective teacher efficacy	1.57	Top 5
Self-reported grades (student expectations)	1.33	Top 5
Formative evaluation / feedback	0.73	High
Direct instruction	0.60	Above average
Cooperative learning	0.52	Above average
Problem-based learning	0.35	Below hinge point
Class size reduction	0.21	Small effect
Homework (secondary)	0.29	Small-medium

Hattie's work illustrates both the power and the limitations of effect size benchmarks. His classification uses d = 0.40 as "the hinge point" — meaning interventions with d < 0.40 may not justify their cost — which differs from Cohen's original small/medium/large framework. The right benchmark depends on the question you're asking.

Source: Hattie, J. (2009). Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. Routledge. | Updated data available at Visible Learning MetaX.

Effect Size Symbols and Notation

Each effect size measure uses a specific symbol. Knowing the correct notation matters for reading journal articles and writing up results correctly.

Symbol	Name	Used For	Range
d	Cohen's d	Two-group mean difference	−∞ to +∞ (absolute value for magnitude)
g	Hedges' g	Bias-corrected mean difference	Same as d
Δ	Glass's delta	Mean diff using control SD	Same as d
r	Pearson's r	Correlation / regression	−1 to +1
r²	Coefficient of determination	Variance explained (regression)	0 to 1
η²	Eta squared	ANOVA variance explained	0 to 1
ω²	Omega squared	ANOVA, less biased than η²	0 to 1 (sample estimates can fall slightly below 0)
φ	Phi coefficient	Chi-square 2×2 table	0 to 1 as √(χ²/n); −1 to +1 when computed as a correlation
V	Cramer's V	Chi-square larger tables	0 to 1
f	Cohen's f	ANOVA, related to η²	0 to +∞

Frequently Asked Questions About Effect Size

What is effect size in statistics?

Effect size is a standardized numerical measure of the magnitude or practical importance of a statistical result. It answers "how large is this effect?" rather than the yes/no question of statistical significance. Common measures include Cohen's d for mean differences and Eta squared for ANOVA results. Effect size is independent of sample size, making it a more stable indicator of practical importance than the p-value.

What is a good effect size?

Cohen's (1988) benchmarks define small = 0.20, medium = 0.50, and large = 0.80 for Cohen's d. However, "good" is context-dependent. In education, Hattie's work shows the average intervention produces d = 0.40, so that threshold is more meaningful for comparing teaching methods. In clinical medicine, a d = 0.20 may be highly clinically significant if the outcome is mortality. Always compare to published effect sizes in your specific field.

What does a small effect size mean?

A small effect size (Cohen's d ≈ 0.20) means the two groups' distributions overlap substantially — about 85% overlap. The difference exists but is subtle. In everyday terms, it is roughly the difference in height between 15- and 16-year-old girls in the same population. Small effects can still be practically important: a small reduction in mortality risk, applied to millions of people, has enormous population-level consequences.

What does a large effect size mean?

A large effect size (Cohen's d ≥ 0.80) means the groups are substantially separated — only about 53% distribution overlap. The average person in the higher-scoring group outperforms approximately 79% of people in the lower-scoring group. An example: the difference in IQ between college graduates and non-graduates in the general population is approximately d = 1.0 — a very large effect that is easily observed without statistical testing.

How does effect size differ from statistical significance?

Statistical significance (p-value) measures whether an effect is detectable given your sample size. Effect size measures how large the effect is, independent of sample size. A result can be statistically significant with a tiny effect size (when n is very large), or statistically non-significant with a large effect size (when n is very small). The p-value and effect size answer different questions — responsible research reports both.

Is effect size affected by sample size?

A correctly computed effect size is not directly affected by sample size; that is its primary advantage over the p-value. Whether you study 20 or 2,000 people, if the true population means and standard deviations are the same, Cohen's d is estimating the same underlying value. However, small samples produce less precise estimates, so confidence intervals around effect sizes are wider when n is small. Hedges' g corrects for a small upward bias that Cohen's d shows in small samples.
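To illustrate how precision changes with n while the estimate itself stays put, here is a sketch of an approximate 95% confidence interval for Cohen's d. It relies on a common large-sample approximation for the standard error, SE ≈ √[(n₁ + n₂)/(n₁n₂) + d²/(2(n₁ + n₂))], which is an assumption brought in for this example and is rough at very small n; the function name is illustrative.

```python
import math

def d_confidence_interval(d: float, n1: int, n2: int, z: float = 1.96):
    """Approximate CI for Cohen's d using a large-sample standard error."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

# Same point estimate (d = 0.5), different sample sizes
for n in (10, 100, 1000):
    low, high = d_confidence_interval(0.5, n, n)
    print(f"n per group = {n:>4}: 95% CI roughly [{low:.2f}, {high:.2f}]")
```

The interval narrows as n grows even though d is identical in every row, which is the distinction between the size of an effect and the precision of its estimate.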

Effect size connects to several other core statistical ideas. Understanding the relationships between these concepts deepens your ability to design studies, interpret results, and evaluate published research.
