Effect Size · Hypothesis Testing · Research Methods · 22 min read · June 9, 2026
BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

Cohen's d Explained: Effect Size, Formula, and Interpretation

A drug trial shows p < 0.05. A training program reports significant improvement. Both are "statistically significant" — but neither number tells you whether the effect was big enough to care about. That gap is exactly what Cohen's d fills. It converts a raw mean difference into a single number expressed in standard deviation units, so you can judge magnitude directly.

This guide covers Cohen's d from scratch: what it is, how to calculate it step by step, how to interpret the result in context, and how it compares to related metrics like Hedges' g and Glass's delta. An interactive calculator lets you compute it for your own data.

What You'll Learn
  • ✓ What Cohen's d measures and why it matters beyond the p-value
  • ✓ The independent-samples and paired-samples formulas, fully explained
  • ✓ How to calculate pooled standard deviation correctly
  • ✓ Worked examples from psychology, medicine, education, and A/B testing
  • ✓ The small/medium/large benchmarks — and when to ignore them
  • ✓ Cohen's d vs Hedges' g vs Glass's delta vs the p-value
  • ✓ An interactive Cohen's d calculator with interpretation output

What Is Cohen's d? (Definition)

Definition — Cohen's d
Cohen's d is a standardized effect size that expresses the difference between two group means in units of the pooled standard deviation. It answers the question: "How many standard deviations apart are these two groups?" A value of 1.0 means the groups differ by exactly one standard deviation; a value of 0.5 means half a standard deviation.
d = (M₁ − M₂) / SD_pooled

The statistic was popularized by psychologist Jacob Cohen in his textbook Statistical Power Analysis for the Behavioral Sciences (first published in 1969; the widely cited second edition appeared in 1988), where he argued that researchers needed a way to communicate effect magnitude separately from statistical significance. It became one of the most widely reported effect sizes in the behavioral sciences and is now common across medicine, education, and data science.

Understanding Cohen's d requires grasping one key concept first: raw mean differences are hard to compare across studies because they depend on the measurement scale. A mean difference of 10 points means something very different on a 100-point IQ scale versus a 20-point pain scale. Dividing by the standard deviation removes the units, making the number interpretable on its own.

⚡ Quick Reference — Cohen's d Key Facts
  • Formula: d = (M₁ − M₂) / SD_pooled
  • Benchmarks: 0.2 = small, 0.5 = medium, 0.8 = large (Cohen, 1988)
  • Range: Technically unbounded; values above 2 are rare in social science
  • Sign: Negative just means Group 2 scored higher; always interpret magnitude
  • Independence: Unlike p-values, Cohen's d does not systematically shrink or grow with sample size; a larger sample only makes the estimate more precise
  • Assumption: Designed for approximately normal distributions with similar spreads

Why Effect Size Matters Beyond the P-Value

A p-value answers one narrow question: "If the null hypothesis were true, how often would data this extreme arise by chance?" It says nothing about the size of the difference. With enough participants, any trivial difference eventually becomes statistically significant.

Consider two studies on the same intervention. Study A tests 40 participants (20 per group) and finds d = 0.80, p ≈ 0.02. Study B tests 100,000 participants (50,000 per group) and finds d = 0.03, p < 0.001. Study B is more statistically significant, but the effect is so small it almost certainly has no practical value. Cohen's d surfaces that distinction immediately.
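To make the sample-size dependence concrete, here is a minimal Python sketch that holds the effect size fixed at d = 0.03 and varies only the group size. It assumes equal group sizes and uses a normal approximation to the t distribution; the function name and the sample sizes are illustrative choices, not part of any library.

```python
import math

def two_sided_p_normal(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value implied by d and n.

    Assumes two equal-sized independent groups and a normal
    approximation to the t distribution (reasonable for large n).
    """
    z = d * math.sqrt(n_per_group / 2)       # test statistic implied by d and n
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal

# Hypothetical group sizes; the effect size never changes.
for n in (200, 2_000, 20_000, 200_000):
    print(f"n per group = {n:>7,}: p ~ {two_sided_p_normal(0.03, n):.3g}")
```

The effect size is identical in every row; only the p-value moves, which is why the two numbers have to be read together.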

  • d = 0.2: Small effect
  • d = 0.5: Medium effect
  • d = 0.8: Large effect
  • d = 1.0+: Very large
💡
APA Reporting Standard

The American Psychological Association's Publication Manual calls for researchers to report effect sizes alongside p-values for primary analyses. Cohen's d is a standard choice when comparing two group means on a continuous outcome variable.

The Core Idea: Standardized Difference

Think of Cohen's d as converting two overlapping distributions into a single number that captures how far apart their centers are, relative to how spread out they are. When d = 0, the distributions are centered on the same value. When d = 1.0, the means are separated by one full standard deviation, so (for normal distributions with equal spread) roughly 84% of the lower-scoring group falls below the mean of the higher-scoring group.
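The 84% figure comes from the standard normal cumulative distribution evaluated at d, a quantity often called Cohen's U₃. The sketch below assumes both groups are normal with equal standard deviations; the function name is an illustrative choice.

```python
from statistics import NormalDist

def cohens_u3(d: float) -> float:
    """Share of the lower-scoring group that falls below the mean of the
    higher-scoring group, assuming normal distributions with equal SDs."""
    return NormalDist().cdf(d)

for d in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(f"d = {d:.1f}: {cohens_u3(d):.1%} of the lower group is below the other mean")
```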

The standardization step is what makes comparison across studies possible. A meta-analysis can pool a pain study measured in centimeters on a visual analog scale with a mood study measured on a 7-point Likert scale, because both produce Cohen's d values on the same dimensionless scale.

Cohen's d Formula Reference

Independent Samples Formula

Use this when the two groups are separate — for example, a treatment group and a control group with different participants in each.

Cohen's d — Independent Samples
d = (M₁ − M₂) / SD_pooled
M₁ = mean of Group 1
M₂ = mean of Group 2
SD_pooled = pooled standard deviation

Pooled Standard Deviation

The pooled standard deviation combines the variability from both groups into one estimate. It gives more weight to whichever group has more observations.

Pooled Standard Deviation
SD_pooled = √[ ((n₁−1)·SD₁² + (n₂−1)·SD₂²) / (n₁+n₂−2) ]
n₁, n₂ = sample sizes
SD₁, SD₂ = standard deviations
n₁+n₂−2 = degrees of freedom
⚠️
Equal vs. Unequal Sample Sizes

When both groups are the same size (n₁ = n₂), the pooled SD simplifies to the square root of the average of the two variances: SD_pooled = √((SD₁² + SD₂²) / 2). Note that this is not the arithmetic mean of the two SDs. The full weighted formula is needed only when sample sizes differ; using the simplified version with unequal groups weights the two variances incorrectly and biases the estimate.
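A short Python sketch can show when the two pooled SD formulas agree and when they diverge. The function names and the input values are hypothetical, chosen only for illustration.

```python
import math

def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Weighted pooled SD: each variance is weighted by its degrees of freedom."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def simplified_pooled_sd(sd1: float, sd2: float) -> float:
    """Equal-n shortcut: square root of the mean of the two variances."""
    return math.sqrt((sd1**2 + sd2**2) / 2)

# Equal group sizes: the two formulas give the same answer.
print(pooled_sd(14, 30, 16, 30), simplified_pooled_sd(14, 16))

# Unequal group sizes: the weighted value moves toward the larger group's SD.
print(pooled_sd(14, 100, 16, 10), simplified_pooled_sd(14, 16))
```

With 100 observations at SD = 14 and only 10 at SD = 16, the weighted estimate sits close to 14, while the shortcut still reports the midpoint of the variances.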

Paired Samples Formula

Use this when the same participants are measured twice — before and after an intervention, or under two different conditions.

Cohen's d — Paired Samples (Within-Subjects)
d = M_diff / SD_diff
M_diff = mean of difference scores
SD_diff = standard deviation of difference scores

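Paired Cohen's d is often larger than the independent-samples version for the same data, because the SD of the difference scores removes stable individual differences. This holds when the two measurements correlate above roughly r = 0.5; with a weaker correlation, SD_diff exceeds the ordinary SD and the paired d comes out smaller. Be explicit in any report about which formula you used.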
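As a rough illustration of the paired formula, the sketch below computes d from before and after scores for the same participants. The data and the function name are made up for the example.

```python
from statistics import mean, stdev

def cohens_d_paired(pre: list[float], post: list[float]) -> float:
    """Paired-samples d: mean of the difference scores divided by their SD."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one score per participant")
    diffs = [after - before for before, after in zip(pre, post)]
    return mean(diffs) / stdev(diffs)  # stdev uses the n-1 denominator

# Hypothetical scores for six participants measured twice.
pre = [10, 12, 9, 14, 11, 13]
post = [12, 15, 10, 15, 14, 15]
d_paired = cohens_d_paired(pre, post)
print(f"paired d = {d_paired:.2f}")
```

The participants here improve by similar amounts, so SD_diff is small and the paired d is large even though the raw gain is only about two points.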

Variable Definitions Table

Symbol | Meaning | Notes
d | Cohen's d effect size | Dimensionless; units cancel out
M₁, M₂ | Sample means of Group 1 and Group 2 | Any continuous outcome variable
SD₁, SD₂ | Standard deviations of each group | Use sample SD (denominator n−1)
n₁, n₂ | Sample sizes | Can differ between groups
SD_pooled | Weighted combined standard deviation | Weights by degrees of freedom
M_diff | Mean of difference scores (paired) | Each participant's post minus pre
SD_diff | SD of difference scores (paired) | Not the same as SD_pooled

How to Calculate Cohen's d (Step-by-Step)

Calculating Cohen's d by hand takes five steps. The process is straightforward once you have the group means and standard deviations.

Step-by-Step Walkthrough

Calculation Guide

Generic Five-Step Process

1. Identify group means. Record M₁ and M₂. The order matters for the sign of d but not its magnitude. By convention, put the treatment or intervention group first.

2. Record the standard deviations. Use the sample standard deviation (divide by n−1, not n). If your software reports the population SD, convert it: multiply by √(n/(n−1)).

3. Calculate the pooled standard deviation. Use SD_pooled = √[((n₁−1)·SD₁² + (n₂−1)·SD₂²) / (n₁+n₂−2)]. If n₁ = n₂, you can use the simplified form √((SD₁² + SD₂²) / 2).

4. Apply the formula. Divide the mean difference (M₁ − M₂) by SD_pooled. The result is Cohen's d.

5. Interpret the result. Compare d against the small/medium/large benchmarks, but also consider what effect size is meaningful in your specific domain. Report both the value and its interpretation.
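The first four steps can be sketched in a few lines of Python that work from summary statistics. The function names are illustrative, and the input figures mirror the memory-training example in this guide; treat this as one possible implementation rather than a reference one.

```python
import math

def to_sample_sd(population_sd: float, n: int) -> float:
    """Step 2 helper: convert an SD computed with denominator n to the n-1 version."""
    return population_sd * math.sqrt(n / (n - 1))

def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Steps 3 and 4: pool the two sample SDs, then standardize the mean difference."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

# Step 1: treatment group first, so a positive d favors the treatment.
d = cohens_d(m1=72, sd1=12, n1=30, m2=60, sd2=11, n2=30)
print(f"d = {d:.2f}")

# If software had reported a population SD of 10 for a group of 25:
print(f"sample SD = {to_sample_sd(10, 25):.3f}")
```

Step 5, the interpretation, is deliberately left out of the function: the label depends on the field, not on the arithmetic.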

Common Calculation Mistakes

Mistake #1

Using the Wrong SD Type

Using the population standard deviation (÷ n) instead of the sample standard deviation (÷ n−1) shrinks the denominator and inflates Cohen's d. The distortion is small for large samples but noticeable for small ones. Most software reports the sample SD by default, but always verify.

Mistake #2

Ignoring Unequal Sample Sizes

The simplified pooled SD formula (the square root of the average of the two variances) only works when n₁ = n₂. With unequal groups, skipping the weighted formula introduces bias, especially when one group is much larger than the other and the SDs differ.

Mistake #3

Applying Paired Formula to Independent Data

The paired formula (M_diff / SD_diff) requires the same participants in both conditions. With two separate groups there is no meaningful way to pair observations, so any difference scores are arbitrary and the resulting d cannot be interpreted or compared with other studies.

Mistake #4

Treating Benchmarks as Universal

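Cohen's 0.2 / 0.5 / 0.8 thresholds were proposed for behavioral science research. A d of 0.2 can be clinically important in medicine (for example, on an outcome tied to mortality risk), while a d of 0.8 may be trivial in a high-precision physical measurement, where the SD is so small that even a large d corresponds to a negligible raw difference.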

Worked Examples

Example 1 — Psychology: Cognitive Training Study

Psychology Example

Does a memory training program improve recall scores?

A researcher tests two groups of 30 participants each. The training group completes six weeks of structured memory exercises; the control group does no training. Both groups complete the same recall test (maximum score 100) at the end of the study.

Group | n | Mean (M) | Standard Deviation (SD)
Training | 30 | 72 | 12
Control | 30 | 60 | 11

1. Mean difference: M₁ − M₂ = 72 − 60 = 12 points

2. Pooled SD (equal n, simplified): SD_pooled = √((12² + 11²) / 2) = √((144 + 121) / 2) = √(132.5) ≈ 11.51

3. Cohen's d: d = 12 / 11.51 ≈ 1.04

✓ d = 1.04 — a large effect. The training group scored, on average, about one full standard deviation higher than the control group. This is a substantial difference that would be practically meaningful in an educational context.

Example 2 — Medicine: Blood Pressure Drug Trial

Medical Example

Does a new antihypertensive drug reduce systolic blood pressure?

A clinical trial assigns 50 patients to a new drug and 40 patients to a placebo. Systolic blood pressure (mmHg) is measured after 12 weeks.

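Group | n | Mean (M) | SD
Drug | 50 | 128 | 14
Placebo | 40 | 142 | 16

1. Mean difference: 142 − 128 = 14 mmHg (drug group is lower). With the drug group entered first, M₁ − M₂ = 128 − 142 = −14 mmHg, so the signed d is negative; the steps below work with the magnitude.

2. Pooled SD (unequal n, full formula): SD_pooled = √[((50−1)·14² + (40−1)·16²) / (50+40−2)] = √[(49·196 + 39·256) / 88] = √[(9604 + 9984) / 88] = √[19588 / 88] = √222.6 ≈ 14.92

3. Cohen's d: |d| = 14 / 14.92 ≈ 0.94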

✓ d = 0.94 — a large effect by Cohen's benchmarks. In clinical terms, a 14 mmHg reduction in systolic pressure is also clinically significant, corroborating the effect size finding. Both the magnitude and the clinical context point to a meaningful intervention.

Example 3 — Education: Teaching Method Comparison

Education Example

Do students taught with active learning methods outscore those in traditional lecture courses?

Two sections of an introductory statistics course use different instructional formats. Scores are measured on a standardized final exam (0–100).

Section | n | Mean | SD
Active Learning | 35 | 78 | 9
Traditional Lecture | 35 | 73 | 10

1. Mean difference: 78 − 73 = 5 points

2. Pooled SD: √((9² + 10²) / 2) = √((81 + 100) / 2) = √90.5 ≈ 9.51

3. Cohen's d: 5 / 9.51 ≈ 0.53

✓ d = 0.53: a medium effect. In education research, where Hattie's d = 0.4 hinge point is a common benchmark for a worthwhile intervention, this result would be considered a meaningful pedagogical advantage for the active learning approach.

Example 4 — A/B Testing: Website Checkout Conversion

UX / Product Example

Does a redesigned checkout page increase average order value?

An e-commerce team runs an A/B test. Version A is the existing checkout; Version B adds a one-click upsell. Average order value (USD) is the metric.

Variant | n | Mean Order ($) | SD ($)
Version B (upsell) | 1,200 | 47.80 | 18.50
Version A (control) | 1,200 | 44.20 | 17.90

1. Mean difference: $47.80 − $44.20 = $3.60

2. Pooled SD: √((18.50² + 17.90²) / 2) = √((342.25 + 320.41) / 2) = √331.33 ≈ 18.20

3. Cohen's d: 3.60 / 18.20 ≈ 0.20

✓ d = 0.20: a small effect by academic benchmarks. With n = 1,200 per variant the test has very high power, and the $3.60 difference is statistically significant (t = d·√(n/2) ≈ 4.8, p < 0.001). In e-commerce, a small d can represent large revenue at scale. This example illustrates why business context matters as much as the benchmark thresholds.

Cohen's d Interpretation Scale

Effect Size Benchmarks (Cohen, 1988)

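Scale: 0 (no effect) · 0.2 (small) · 0.5 (medium) · 0.8 (large) · 1.2+ (very large; a later extension beyond Cohen's three labels)

  • d < 0.2 (Negligible): Groups are nearly indistinguishable
  • d ≈ 0.2 (Small): Subtle difference; ~58% of the lower-scoring group falls below the other group's mean
  • d ≈ 0.5 (Medium): Moderate gap; ~69% of the lower-scoring group falls below the other group's mean
  • d ≈ 0.8 (Large): Clear separation; ~79% of the lower-scoring group falls below the other group's mean
  • d ≥ 1.2 (Very large): The two distributions share relatively little area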

Understanding Overlap Between Distributions

Another way to interpret Cohen's d is through the overlap of the two group distributions. When d = 0, the groups are perfectly aligned. As d grows, the distributions pull apart and the overlap shrinks. The table below shows the percentage of the combined distribution that the two groups share at each benchmark level.

Cohen's d | Overlap (%) | Interpretation | Real-world analogy
0.0 | 100% | No effect | Two identical groups
0.2 | ~85% | Small | Height difference between 15- and 16-year-old girls (Cohen's example)
0.5 | ~67% | Medium | IQ difference between trained and untrained managers
0.8 | ~53% | Large | IQ difference between PhD holders and typical college freshmen (Cohen's example)
1.0 | ~45% | Very large | Group means one full standard deviation apart
2.0 | ~19% | Exceptional | Close to the height gap between adult men and women in many populations; rare in behavioral research, more common in physics or engineering
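The overlap percentages in the table match 100% minus Cohen's U₁ (the share of the combined area that the two distributions do not have in common). Assuming two normal distributions with equal standard deviations, they can be reproduced with the short sketch below; the function name is an illustrative choice.

```python
from statistics import NormalDist

def shared_area(d: float) -> float:
    """Share of the combined area covered by both distributions (1 - Cohen's U1).

    Assumes two normal distributions with equal SDs.
    """
    u2 = NormalDist().cdf(abs(d) / 2)  # Cohen's U2
    u1 = (2 * u2 - 1) / u2             # Cohen's U1: non-overlapping share
    return 1 - u1

for d in (0.0, 0.2, 0.5, 0.8, 1.0, 2.0):
    print(f"d = {d:.1f}: about {shared_area(d):.0%} shared")
```

Other overlap definitions exist and give different percentages for the same d, so it is worth stating which one a table uses.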

Context-Dependent Interpretation

Cohen himself acknowledged that his 0.2/0.5/0.8 benchmarks were arbitrary and should serve only as rough guides. Different disciplines have accumulated their own baseline expectations:

Field | Typical Range | Notes
Psychotherapy outcomes | d = 0.5–0.8 | Active treatment vs. waitlist control
Education interventions | d = 0.2–0.4 | Per Hattie's Visible Learning meta-analysis (threshold = 0.4)
Pharmaceutical trials | d = 0.2–0.5 | Smaller effects can still be clinically significant
Social psychology lab studies | d = 0.4–1.0 | Controlled conditions yield larger effects than field studies
A/B testing (digital products) | d = 0.1–0.3 | Small d with large n can still drive business value
Personnel selection (hiring) | d = 0.3–0.5 | Difference between high and average performers

Interactive Cohen's d Calculator

Cohen's d Calculator
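For readers working offline, a minimal Python sketch of what such a calculator computes might look like the following. It takes raw scores, uses the weighted pooled SD, and attaches a label based on Cohen's benchmarks. The band edges used for the labels, the function names, and the sample data are all assumptions made for illustration.

```python
import math
from statistics import mean, variance

def interpret(d: float) -> str:
    """Label |d| using Cohen's 0.2 / 0.5 / 0.8 benchmarks as band edges (a convention)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

def cohens_d_from_data(group1: list[float], group2: list[float]) -> float:
    """Independent-samples d from raw scores, using the weighted pooled SD."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)

# Hypothetical raw scores for two separate groups.
treatment = [78, 85, 69, 91, 74, 88, 80]
control = [72, 70, 65, 81, 68, 77]

d = cohens_d_from_data(treatment, control)
print(f"d = {d:.2f} ({interpret(d)} by Cohen's benchmarks)")
```

The label is only a starting point; the field-specific ranges in the table above should take priority when they are available.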

Cohen's d Compared to Related Metrics

Cohen's d vs. Hedges' g

Hedges' g applies a correction factor J to reduce the positive bias that Cohen's d has in small samples. The correction is: g = d × J, where J = 1 − (3 / (4·df − 1)) and df = n₁ + n₂ − 2. For samples of 20 or more per group, the difference between d and g is below 5% and generally negligible. For samples under 10 per group, Hedges' g is the better choice — and is the standard in meta-analyses.
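The correction is easy to apply in code. The sketch below uses the approximation for J quoted above; the function name and sample sizes are illustrative.

```python
def hedges_g(d: float, n1: int, n2: int) -> float:
    """Apply the small-sample correction J = 1 - 3 / (4*df - 1) to Cohen's d."""
    df = n1 + n2 - 2
    j = 1 - 3 / (4 * df - 1)
    return d * j

# How much the correction shrinks d = 0.80 at several hypothetical group sizes.
for n in (5, 10, 20, 50):
    g = hedges_g(0.80, n, n)
    print(f"n = {n:>2} per group: g = {g:.3f} ({1 - g / 0.80:.1%} smaller than d)")
```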

📊
Rule of Thumb

Use Cohen's d for individual studies with adequate sample sizes (n ≥ 20 per group). Use Hedges' g when conducting meta-analyses, reporting results from very small samples, or when your software computes it directly and you need to pool studies of differing sizes.

Cohen's d vs. Glass's Delta (Δ)

Glass's delta uses only the control group's standard deviation as the denominator, rather than a pooled estimate: Δ = (M_treatment − M_control) / SD_control. This is appropriate when you have strong reason to believe the treatment changed the variance — for example, if an intervention increases variability in outcomes alongside the mean. When group variances are roughly equal, Glass's delta and Cohen's d produce similar values.
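A small sketch can show how the two statistics diverge when the treatment widens the spread. The numbers are hypothetical, and the pooled SD uses the equal-n shortcut for brevity.

```python
import math

def glass_delta(m_treatment: float, m_control: float, sd_control: float) -> float:
    """Standardize the mean difference by the control group's SD only."""
    return (m_treatment - m_control) / sd_control

# Hypothetical trial with equal group sizes where treatment increased variability.
m_t, sd_t = 75.0, 15.0
m_c, sd_c = 70.0, 10.0

delta = glass_delta(m_t, m_c, sd_c)
d = (m_t - m_c) / math.sqrt((sd_t**2 + sd_c**2) / 2)  # pooled SD, equal-n shortcut
print(f"Glass's delta = {delta:.2f}, Cohen's d = {d:.2f}")
```

Here the inflated treatment SD pulls the pooled denominator up, so Cohen's d comes out noticeably smaller than Glass's delta for the same 5-point difference.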

Cohen's d vs. the P-Value

Property | Cohen's d | P-Value
What it measures | Magnitude of the effect | Probability of data at least this extreme under the null hypothesis
Affected by sample size? | Not systematically | Yes: for a nonzero effect, p shrinks as n grows
Tells you practical importance? | Yes | No
Tells you if the effect is real? | No | Yes (indirectly)
Can be misleading? | Without context, yes | With large n, yes
Recommended use | Alongside p-value | Alongside effect size

Cohen's d vs. Pearson's r

Both are standardized effect sizes, but they suit different study designs. Cohen's d compares two group means on a continuous outcome; Pearson's r measures the linear association between two continuous variables. The two can be converted: r = d / √(d² + 4), a formula that assumes the two groups are of equal size. A Cohen's d of 0.5 corresponds to r ≈ 0.24; d = 0.8 corresponds to r ≈ 0.37.
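The conversion can be checked with a couple of lines of Python. The function name is illustrative, and the formula is the equal-group-size version given above.

```python
import math

def d_to_r(d: float) -> float:
    """Convert Cohen's d to a correlation r, assuming equal group sizes."""
    return d / math.sqrt(d**2 + 4)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d:.1f} corresponds to r = {d_to_r(d):.2f}")
```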

Real-World Applications

Psychology Research

Cohen's d is nearly universal in experimental psychology. Studies on cognitive training, therapy outcomes, group differences in test performance, and treatment comparisons all routinely report it. Meta-analyses in clinical psychology — such as those examining the effectiveness of cognitive-behavioral therapy versus waitlist controls — aggregate Cohen's d values from dozens of primary studies to estimate the overall effect. The APA Publication Manual explicitly recommends reporting effect sizes for all primary analyses.

Medical and Clinical Trials

Regulatory agencies and clinical journals increasingly require effect size reporting alongside significance levels. In drug trials comparing a treatment to a placebo on continuous outcomes (blood pressure, biomarkers, quality-of-life scores), Cohen's d gives clinicians an intuitive sense of whether the group difference translates to meaningful patient benefit — something a p-value cannot communicate on its own.

Education Research

Researcher John Hattie's landmark meta-analysis Visible Learning synthesized over 800 meta-analyses covering millions of students and expressed all effects in Cohen's d units. He proposed d = 0.40 as the "hinge point": the threshold above which an educational intervention is considered worth implementing. This work helped make Cohen's d a familiar reporting metric in education research. Explore the foundations of hypothesis testing on Statistics Fundamentals.

A/B Testing in Tech and Product

Product teams at technology companies use Cohen's d to understand whether changes to interfaces, algorithms, or features produce meaningful user behavior differences. With very large sample sizes typical of web experiments, even a d of 0.05 can be statistically significant — making effect size reporting essential for separating business-relevant changes from statistical noise.

Meta-Analysis

Meta-analysis aggregates Cohen's d values across multiple independent studies to estimate the overall effect of an intervention or phenomenon. Because d is dimensionless, studies measuring outcomes on different scales can be meaningfully combined. The resulting pooled d and its confidence interval provide a much more reliable estimate of truth than any single study. See the confidence intervals guide for how uncertainty is quantified around effect size estimates.
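One common way to combine d values is a fixed-effect, inverse-variance weighted average, in which more precise studies count for more. The sketch below is a simplified illustration under that assumption (real meta-analyses often use random-effects models and Hedges' g). It relies on a standard large-sample approximation for the variance of d, and the study figures are invented.

```python
import math

def d_variance(d: float, n1: int, n2: int) -> float:
    """Large-sample approximation to the sampling variance of Cohen's d."""
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

def pooled_effect(studies: list[tuple[float, int, int]]) -> tuple[float, float]:
    """Fixed-effect pooled d and its standard error from (d, n1, n2) tuples."""
    weights = [1 / d_variance(d, n1, n2) for d, n1, n2 in studies]
    total = sum(weights)
    pooled = sum(w * d for w, (d, _, _) in zip(weights, studies)) / total
    return pooled, math.sqrt(1 / total)

# Hypothetical studies: (d, n in group 1, n in group 2).
studies = [(0.30, 40, 40), (0.55, 25, 25), (0.42, 100, 100)]
pooled_d, se = pooled_effect(studies)
print(f"pooled d = {pooled_d:.2f}, SE = {se:.3f}")
```

The largest study dominates the average, and the pooled standard error is smaller than that of any single study, which is the sense in which the combined estimate is more reliable.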

Cohen's d Formula Cheat Sheet

Metric | Formula | When to Use
Cohen's d (independent) | d = (M₁−M₂) / SD_pooled | Two separate groups, equal variance
Pooled SD (equal n) | √((SD₁²+SD₂²)/2) | n₁ = n₂ only
Pooled SD (unequal n) | √(((n₁−1)SD₁²+(n₂−1)SD₂²)/(n₁+n₂−2)) | Any sample sizes
Cohen's d (paired) | d = M_diff / SD_diff | Same participants, two conditions
Hedges' g correction | g = d × (1 − 3/(4df−1)) | Small samples or meta-analysis
Glass's delta | Δ = (M₁−M₂) / SD_control | Unequal variances expected
d to r conversion | r = d / √(d²+4) | Comparing to correlation-based effects

Entity & Terminology Glossary

Term | Formula / Symbol | Definition
Cohen's d | (M₁−M₂)/SD_pooled | Standardized mean difference between two groups; the primary effect size for group comparisons
Effect size | Various | A quantitative measure of the practical magnitude of a result, independent of sample size
Standard deviation | √variance | Measure of spread around the mean; the denominator in Cohen's d
Pooled SD | Weighted combination | A combined estimate of variability that accounts for both groups' sizes and spreads
Mean difference | M₁ − M₂ | The raw (unstandardized) distance between two group averages
Hedges' g | d × correction factor | A bias-corrected version of Cohen's d; preferred for small samples and meta-analyses
Glass's delta | (M₁−M₂)/SD_control | Uses only the control group's SD; appropriate when treatment alters the spread of scores
Statistical power | 1 − β | The probability of detecting a true effect; increases with larger n and larger d
Confidence interval | d ± margin | The range within which the true population effect size likely falls
Null hypothesis | H₀: d = 0 | The default assumption that there is no difference between the two groups
Practical significance | (none) | Whether a statistically significant result is large enough to matter in the real world

Common Misconceptions

Misconception | What's Wrong | Correct Understanding
"A small Cohen's d means the result is unimportant." | Conflates magnitude with importance | A d of 0.2 can save lives in medicine or represent millions in revenue at scale
"Statistical significance implies a large effect size." | p-value depends heavily on sample size | p < 0.001 can accompany d = 0.05 in large-n studies
"Negative Cohen's d means the study failed." | Sign reflects labeling, not quality | Negative just means Group 2 scored higher; relabel groups if direction matters
"0.2/0.5/0.8 always apply." | Context-free application of universal benchmarks | Field-specific norms exist; always anchor to domain baselines
"Cohen's d works for any outcome type." | Assumes approximately normal distributions | For skewed data or ordinal scales, consider rank-based or other effect sizes

Related Topics on Statistics Fundamentals

Cohen's d fits into a broader framework of inferential statistics. These guides on Statistics Fundamentals connect directly to what you've learned here:

  • Hypothesis Testing (Foundation): Understand the full framework that surrounds Cohen's d: null hypotheses, test statistics, and decision rules.
  • Statistical Test Selector (Test Selection): Not sure which test to use? The interactive selector guides you to the right test based on your data type and design.
  • Two-Sample T-Test (Comparison Tests): The t-test produces the p-value that Cohen's d accompanies. Understanding both gives the full picture of a group comparison.
  • Paired Samples T-Test (Paired Data): When you use the paired Cohen's d formula, the paired-samples t-test is the accompanying significance test.
  • Standard Deviation (Spread): The denominator of Cohen's d. Understanding standard deviation is essential for computing and interpreting pooled SD.
  • Confidence Intervals (Uncertainty): Effect sizes should be reported with confidence intervals. A point estimate of d = 0.5 means less without knowing whether its 95% CI is [0.1, 0.9] or [0.4, 0.6].
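To see how the same d = 0.5 can carry either of those intervals, the sketch below computes an approximate confidence interval from the group sizes. It uses a standard large-sample approximation for the standard error of d together with a normal critical value; exact intervals based on the noncentral t distribution differ slightly. The function name and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def d_confidence_interval(d: float, n1: int, n2: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate CI for Cohen's d using a large-sample standard error."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return d - z * se, d + z * se

# Same point estimate, two hypothetical study sizes.
for n in (50, 800):
    low, high = d_confidence_interval(0.5, n, n)
    print(f"n = {n:>3} per group: 95% CI = [{low:.2f}, {high:.2f}]")
```

With 50 participants per group the interval spans roughly 0.1 to 0.9; with 800 per group it narrows to roughly 0.4 to 0.6.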

Academic Sources

  • Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates. — The original text establishing the d formula and 0.2/0.5/0.8 benchmarks.
  • Hedges, L. V., & Olkin, I. (1985). Statistical Methods for Meta-Analysis. Academic Press. — Introduces the bias-corrected g statistic.
  • Hattie, J. (2009). Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. Routledge. — Establishes d = 0.40 as the education benchmark.
  • Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science. Frontiers in Psychology, 4, 863. — A practical guide to effect size reporting, freely available via doi.org.
  • Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141(1), 2–18.

Frequently Asked Questions

What is Cohen's d used for?

Cohen's d measures the practical magnitude of a difference between two group means. It tells you how large the effect is in standard deviation units — independent of sample size. Researchers use it in psychology, medicine, education, and A/B testing whenever a p-value alone is not enough to judge whether a result matters in practice.

How do you interpret Cohen's d?

Jacob Cohen's benchmarks treat d = 0.2 as a small effect, d = 0.5 as medium, and d = 0.8 as large. These thresholds are domain-dependent: a d of 0.2 can be clinically meaningful in medicine, while a d of 0.8 might be unremarkable in some laboratory settings. Always consider the specific field's norms and the practical stakes of the decision.

Can Cohen's d be negative?

Yes. A negative value simply means the second group mean is larger than the first. The sign reflects which group was placed in the numerator, not whether the result is good or bad. When reporting effect size magnitude, take the absolute value; keep the sign only when the direction of the difference carries theoretical meaning.

Is Cohen's d better than the p-value?

They answer different questions and work best together. The p-value indicates whether an effect is likely real given sampling variability; Cohen's d tells you how large that effect is. A study with a large sample can produce a statistically significant p-value for a Cohen's d of 0.03 — real but practically trivial. Reporting both is the current standard in most peer-reviewed journals.

What is a good effect size for Cohen's d?

"Good" depends entirely on the context. In education, Hattie's research suggests d ≥ 0.4 is needed to justify an intervention. In medicine, d = 0.2 can represent a clinically important difference if the outcome is mortality or serious morbidity. In technology A/B testing, a d as small as 0.1 can be worth implementing at sufficient scale. There is no single threshold for "good."

When should I use Hedges' g instead of Cohen's d?

Use Hedges' g when your sample size is below about 20 per group, or when conducting a meta-analysis that pools studies of varying sizes. For most individual studies with adequate samples, the two values differ by less than 5% and the distinction is rarely consequential in practice.