What Is Cohen's d? (Definition)
Cohen's d is a standardized effect size: the difference between two group means expressed in standard deviation units. The statistic was popularized by psychologist Jacob Cohen in Statistical Power Analysis for the Behavioral Sciences (first edition 1969; the widely cited second edition appeared in 1988), where he argued that researchers needed a way to communicate effect magnitude separately from statistical significance. It became the most widely reported effect size in the behavioral sciences and is now standard across medicine, education, and data science.
Understanding Cohen's d requires grasping one key concept first: raw mean differences are hard to compare across studies because they depend on the measurement scale. A mean difference of 10 points means something very different on an IQ scale (mean 100, SD 15) than on a 20-point pain scale. Dividing by the standard deviation removes the units, making the number interpretable on its own.
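The sketch below shows units cancelling out. It is written in Python using only the standard library. It computes Cohen's d from raw scores, converts every score from centimeters to millimeters, and then computes d again. The data and the helper name `cohens_d` are hypothetical. The pooled SD follows the weighted formula described later in this guide.

```python
from math import isclose, sqrt
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d for two independent samples, using the df-weighted pooled SD."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)  # sample SDs (n - 1 denominator)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled

# Hypothetical scores recorded in centimeters
treat_cm = [5.1, 6.3, 4.8, 7.0, 5.9]
ctrl_cm = [4.2, 5.0, 3.9, 4.7, 5.5]

# The same measurements expressed in millimeters
treat_mm = [x * 10 for x in treat_cm]
ctrl_mm = [x * 10 for x in ctrl_cm]

d_cm = cohens_d(treat_cm, ctrl_cm)
d_mm = cohens_d(treat_mm, ctrl_mm)
print(isclose(d_cm, d_mm))  # True: rescaling the data leaves d unchanged
```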
- Formula: d = (M₁ − M₂) / SD_pooled
- Benchmarks: 0.2 = small, 0.5 = medium, 0.8 = large (Cohen, 1988)
- Range: Technically unbounded; values above 2 are rare in social science
- Sign: Negative just means Group 2 scored higher; always interpret magnitude
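- Independence: Unlike p-values, Cohen's d does not systematically grow or shrink with sample size. Larger samples only make the estimate more precise, and very small samples bias it slightly upward.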
- Assumption: Designed for approximately normal distributions with similar spreads
Why Effect Size Matters Beyond the P-Value
A p-value answers one narrow question: "If the null hypothesis were true, how often would data at least this extreme arise by chance?" It says nothing about the size of the difference. With enough participants, any nonzero difference, however trivial, eventually becomes statistically significant.
Consider two studies on the same intervention. Study A tests 40 participants (20 per group) and finds d = 0.80, p ≈ 0.02. Study B tests 50,000 participants (25,000 per group) and finds d = 0.03, p < 0.001. Study B has the smaller p-value, but its effect is so small it almost certainly has no practical value. Cohen's d surfaces that distinction immediately.
The American Psychological Association's Publication Manual calls for effect sizes to be reported alongside p-values for primary analyses. For comparing two group means on a continuous outcome, Cohen's d (or its small-sample-corrected variant, Hedges' g) is the most common choice.
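The sketch below makes this contrast concrete. It holds d fixed at a trivial 0.03 and computes the independent-samples t statistic that d implies at several sample sizes, using t = d·√(n₁n₂/(n₁+n₂)). The p-values come from a normal approximation to the t distribution, which is only accurate for large samples. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def t_from_d(d, n1, n2):
    """Independent-samples t statistic implied by Cohen's d and the group sizes."""
    return d * sqrt(n1 * n2 / (n1 + n2))

def approx_p_two_sided(t):
    """Two-sided p-value from a normal approximation (reasonable for large df)."""
    return 2 * NormalDist().cdf(-abs(t))

d = 0.03  # a practically trivial effect, held fixed
for n_per_group in (100, 1_000, 10_000, 100_000):
    t = t_from_d(d, n_per_group, n_per_group)
    print(f"n = {n_per_group:>7} per group: t = {t:5.2f}, p ≈ {approx_p_two_sided(t):.2g}")
```

The effect size never changes across the loop. Only the p-value moves, falling from far above 0.05 to far below 0.001.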
The Core Idea: Standardized Difference
Think of Cohen's d as converting two overlapping distributions into a single number that captures how far apart their centers are, relative to how spread out they are. When d = 0, the two distributions are centered on the same value. When d = 1.0, the means are separated by one full standard deviation. Assuming normal distributions with equal spread, roughly 84% of the lower-scoring group then falls below the mean of the higher-scoring group.
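The 84% figure is Φ(d), the standard normal CDF evaluated at d. Cohen called this quantity U3. Below is a minimal Python sketch, assuming two normal distributions with equal standard deviations. The function name is illustrative.

```python
from statistics import NormalDist

def cohens_u3(d):
    """Cohen's U3: share of the lower group lying below the higher group's mean.

    Assumes two normal distributions with equal SDs whose means are |d| SDs apart.
    """
    return NormalDist().cdf(abs(d))

for d in (0.2, 0.5, 0.8, 1.0):
    print(f"d = {d:.1f}: {cohens_u3(d):.1%} of the lower group falls below the other mean")
```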
The standardization step is what makes comparison across studies possible. A meta-analysis can pool a pain study measured in centimeters on a visual analog scale with a mood study measured on a 7-point Likert scale, because both produce Cohen's d values on the same dimensionless scale.
Cohen's d Formula Reference
Independent Samples Formula
Use this when the two groups are separate — for example, a treatment group and a control group with different participants in each.
Formula: d = (M₁ − M₂) / SD_pooled
- M₁ = mean of Group 1
- M₂ = mean of Group 2
- SD_pooled = pooled standard deviation
Pooled Standard Deviation
The pooled standard deviation combines the variability from both groups into one estimate. It gives more weight to whichever group has more observations.
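Formula: SD_pooled = √[((n₁−1)·SD₁² + (n₂−1)·SD₂²) / (n₁+n₂−2)]
- n₁, n₂ = sample sizes
- SD₁, SD₂ = standard deviations
- n₁+n₂−2 = degrees of freedom
When both groups are the same size (n₁ = n₂), the pooled SD simplifies to the square root of the average of the two variances: SD_pooled = √((SD₁² + SD₂²) / 2). This is not the same as the plain average of the two SDs. The full weighted formula is needed only when sample sizes differ. Using the simplified version with unequal groups gives the smaller group's variance too much weight, so the result no longer matches the standard pooled estimate.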
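You can check the gap between the weighted formula and the equal-n shortcut numerically. In this Python sketch (function names hypothetical), the two formulas agree exactly when n₁ = n₂. They diverge sharply for a hypothetical pair of groups with 200 and 10 observations. The shortcut gives the small, highly variable group as much weight as the large one.

```python
from math import isclose, sqrt

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled SD: each group's variance weighted by its degrees of freedom (n - 1)."""
    return sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def simple_pooled_sd(sd1, sd2):
    """Equal-n shortcut: square root of the average variance (not the average SD)."""
    return sqrt((sd1**2 + sd2**2) / 2)

# With equal group sizes the two formulas agree
print(isclose(pooled_sd(12, 30, 11, 30), simple_pooled_sd(12, 11)))  # True

# Hypothetical unequal groups: the shortcut over-weights the small group
print(round(pooled_sd(10, 200, 20, 10), 2))  # 10.63
print(round(simple_pooled_sd(10, 20), 2))    # 15.81
```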
Paired Samples Formula
Use this when the same participants are measured twice — before and after an intervention, or under two different conditions.
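Formula: d = M_diff / SD_diff (often labeled d_z)
- M_diff = mean of difference scores
- SD_diff = standard deviation of difference scores
Paired Cohen's d is usually larger than the independent-samples version for the same data, because within-subject designs remove stable individual differences from the variance. Specifically, d_z exceeds the independent-samples d whenever the correlation between the two measurements is above 0.5. Be explicit in any report about which formula you used.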
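The paired version, often labeled d_z, divides the mean difference by the SD of the difference scores. The independent-samples version divides by the pooled SD of the raw scores instead. This Python sketch applies both to hypothetical pre/post scores for six participants, which shows how far apart the two results can be. The function name is illustrative.

```python
from statistics import mean, stdev

def paired_d(pre, post):
    """Cohen's d for paired data (d_z): mean of the differences / SD of the differences."""
    if len(pre) != len(post):
        raise ValueError("paired data need one post score per pre score")
    diffs = [b - a for a, b in zip(pre, post)]  # each participant's post minus pre
    return mean(diffs) / stdev(diffs)

# Hypothetical scores for six participants measured twice
pre = [10, 12, 9, 14, 11, 13]
post = [12, 15, 10, 16, 12, 16]
print(round(paired_d(pre, post), 2))  # 2.24

# For contrast: treating the same scores as two independent groups
pooled = ((stdev(pre) ** 2 + stdev(post) ** 2) / 2) ** 0.5  # equal-n pooled SD
print(round((mean(post) - mean(pre)) / pooled, 2))  # 0.9
```

Here the pre and post scores are strongly correlated, so d_z is more than twice the independent-samples value.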
Variable Definitions Table
| Symbol | Meaning | Notes |
|---|---|---|
| d | Cohen's d effect size | Dimensionless; units cancel out |
| M₁, M₂ | Sample means of Group 1 and Group 2 | Any continuous outcome variable |
| SD₁, SD₂ | Standard deviations of each group | Use sample SD (denominator n−1) |
| n₁, n₂ | Sample sizes | Can differ between groups |
| SD_pooled | Weighted combined standard deviation | Weights by degrees of freedom |
| M_diff | Mean of difference scores (paired) | Each participant's post minus pre |
| SD_diff | SD of difference scores (paired) | Not the same as SD_pooled |
How to Calculate Cohen's d (Step-by-Step)
Calculating Cohen's d by hand takes five steps. The process is straightforward once you have the group means, standard deviations, and sample sizes.
Step-by-Step Walkthrough
Generic Five-Step Process
1. Identify group means. Record M₁ and M₂. The order matters for the sign of d but not its magnitude. By convention, put the treatment or intervention group first.
2. Record the standard deviations. Use the sample standard deviation (divide by n−1, not n). If your software reports the population SD, convert it: multiply by √(n/(n−1)).
3. Calculate the pooled standard deviation. Use SD_pooled = √[((n₁−1)·SD₁² + (n₂−1)·SD₂²) / (n₁+n₂−2)]. If n₁ = n₂, you can use the simplified form √((SD₁² + SD₂²) / 2).
4. Apply the formula. Divide the mean difference (M₁ − M₂) by SD_pooled. The result is Cohen's d.
5. Interpret the result. Compare d against the small/medium/large benchmarks, but also consider what effect size is meaningful in your specific domain. Report both the value and its interpretation.
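The five steps translate directly into code. This minimal Python sketch works from raw scores rather than summary statistics. The helper name, the sample data, and the cut-offs behind the step-5 label are illustrative assumptions based on Cohen's rough benchmarks. They are not a substitute for field-specific judgment.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d_five_steps(group1, group2):
    """Follow the five steps for two independent samples; returns (d, label)."""
    # Step 1: group means (group1 first, e.g. the treatment group)
    m1, m2 = mean(group1), mean(group2)
    # Step 2: sample standard deviations (n - 1 denominator)
    s1, s2 = stdev(group1), stdev(group2)
    # Step 3: pooled SD, weighted by degrees of freedom
    n1, n2 = len(group1), len(group2)
    sd_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    # Step 4: standardized mean difference
    d = (m1 - m2) / sd_pooled
    # Step 5: rough label from Cohen's benchmarks (field norms may differ)
    size = abs(d)
    if size < 0.2:
        label = "below small"
    elif size < 0.5:
        label = "small"
    elif size < 0.8:
        label = "medium"
    else:
        label = "large"
    return d, label

# Hypothetical raw scores
treatment = [78, 85, 80, 90, 74, 83]
control = [72, 79, 70, 81, 68, 75]
d, label = cohens_d_five_steps(treatment, control)
print(f"d = {d:.2f} ({label})")
```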
Common Calculation Mistakes
Using the Wrong SD Type
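Using the population standard deviation (÷ n) instead of the sample standard deviation (÷ n−1) shrinks the denominator and inflates Cohen's d. When both groups have n observations, the inflation factor is √(n/(n−1)). Most software reports the sample SD by default, but NumPy's np.std is a notable exception (it defaults to ddof=0), so always verify.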
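To see how large this error is, the sketch below computes d twice on the same small, hypothetical samples. The first pass uses Python's `statistics.stdev` (sample SD, n − 1). The second uses `statistics.pstdev` (population SD, n). With five scores per group, the population version inflates d by √(5/4) ≈ 1.12.

```python
from statistics import mean, pstdev, stdev

# Hypothetical small samples, where the n vs n - 1 choice matters most
a = [12, 15, 11, 14, 13]
b = [10, 12, 9, 11, 13]

def d_with(sd_func):
    """Equal-n Cohen's d using the supplied SD function for both groups."""
    pooled = ((sd_func(a) ** 2 + sd_func(b) ** 2) / 2) ** 0.5
    return (mean(a) - mean(b)) / pooled

print(round(d_with(stdev), 3))   # 1.265: sample SD (n - 1), the correct input
print(round(d_with(pstdev), 3))  # 1.414: population SD (n) inflates d
```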
Ignoring Unequal Sample Sizes
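The simplified pooled SD formula (the square root of the average of the two variances) only works when n₁ = n₂. With unequal groups, skipping the weighted formula gives the smaller group's variance as much weight as the larger group's. The result then drifts away from the standard pooled estimate, especially when one group is much larger than the other.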
Applying Paired Formula to Independent Data
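The paired formula (M_diff / SD_diff) requires the same participants in both conditions. Two separate groups have no meaningful difference scores to compute. The reverse error is also common. The paired version (d_z) is not directly comparable to an independent-groups d. It overstates the effect relative to d whenever the correlation between conditions exceeds 0.5, because within-subject variance is then smaller.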
Treating Benchmarks as Universal
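Cohen's 0.2 / 0.5 / 0.8 thresholds were designed for behavioral science research. A d of 0.2 can be clinically important in medicine, for example on an outcome tied to mortality risk. Conversely, a d of 0.8 may be trivial for a tightly controlled, low-noise measurement, where even a negligible raw difference yields a large standardized value.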
Worked Examples
Example 1 — Psychology: Cognitive Training Study
Does a memory training program improve recall scores?
A researcher tests two groups of 30 participants each. The training group completes six weeks of structured memory exercises; the control group does no training. Both groups complete the same recall test (maximum score 100) at the end of the study.
| Group | n | Mean (M) | Standard Deviation (SD) |
|---|---|---|---|
| Training | 30 | 72 | 12 |
| Control | 30 | 60 | 11 |
Mean difference: M₁ − M₂ = 72 − 60 = 12 points
Pooled SD (equal n, simplified): SD_pooled = √((12² + 11²) / 2) = √((144 + 121) / 2) = √(132.5) ≈ 11.51
Cohen's d: d = 12 / 11.51 ≈ 1.04
✓ d = 1.04 — a large effect. The training group scored, on average, about one full standard deviation higher than the control group. This is a substantial difference that would be practically meaningful in an educational context.
Example 2 — Medicine: Blood Pressure Drug Trial
Does a new antihypertensive drug reduce systolic blood pressure?
A clinical trial assigns 50 patients to a new drug and 40 patients to a placebo. Systolic blood pressure (mmHg) is measured after 12 weeks.
| Group | n | Mean (M) | SD |
|---|---|---|---|
| Drug | 50 | 128 | 14 |
| Placebo | 40 | 142 | 16 |
Mean difference (placebo minus drug): 142 − 128 = 14 mmHg, so a positive d here means the drug group ended lower
Pooled SD (unequal n, full formula): SD_pooled = √[((50−1)·14² + (40−1)·16²) / (50+40−2)] = √[(49·196 + 39·256) / 88] = √[(9604 + 9984) / 88] = √[19588 / 88] = √222.6 ≈ 14.92
Cohen's d: d = 14 / 14.92 ≈ 0.94
✓ d = 0.94 — a large effect by Cohen's benchmarks. A 14 mmHg between-group difference in systolic pressure is also clinically significant on its own terms, corroborating the effect size finding. Both the magnitude and the clinical context point to a meaningful intervention.
Example 3 — Education: Teaching Method Comparison
Do students taught with active learning methods outscore those in traditional lecture courses?
Two sections of an introductory statistics course use different instructional formats. Scores are measured on a standardized final exam (0–100).
| Section | n | Mean | SD |
|---|---|---|---|
| Active Learning | 35 | 78 | 9 |
| Traditional Lecture | 35 | 73 | 10 |
Mean difference: 78 − 73 = 5 points
Pooled SD: √((9² + 10²) / 2) = √((81 + 100) / 2) = √90.5 ≈ 9.51
Cohen's d: 5 / 9.51 ≈ 0.53
✓ d = 0.53 — a medium effect. In education research, where typical intervention effects cluster around d = 0.4 or below, this result would be considered a meaningful pedagogical advantage for the active learning approach.
Example 4 — A/B Testing: Website Checkout Conversion
Does a redesigned checkout page increase average order value?
An e-commerce team runs an A/B test. Version A is the existing checkout; Version B adds a one-click upsell. Average order value (USD) is the metric.
| Variant | n | Mean Order ($) | SD ($) |
|---|---|---|---|
| Version B (upsell) | 1,200 | 47.80 | 18.50 |
| Version A (control) | 1,200 | 44.20 | 17.90 |
Mean difference: $47.80 − $44.20 = $3.60
Pooled SD: √((18.50² + 17.90²) / 2) = √((342.25 + 320.41) / 2) = √331.33 ≈ 18.20
Cohen's d: 3.60 / 18.20 ≈ 0.20
✓ d = 0.20 — a small effect by academic benchmarks. With n = 1,200 per variant, however, the test has very high power, and the $3.60 difference is very likely statistically significant. In e-commerce, a small d can represent large revenue at scale. This example illustrates why business context matters as much as the benchmark thresholds.
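The short Python sketch below checks the arithmetic in the four examples above. It recomputes each d from its summary statistics using the df-weighted pooled SD, which reduces to the simplified form when n₁ = n₂. For the blood pressure trial, the placebo group is entered first so that d comes out positive, matching the worked example.

```python
from math import sqrt

def pooled_sd(sd1, n1, sd2, n2):
    """Degrees-of-freedom-weighted pooled SD (equals the simple form when n1 == n2)."""
    return sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def cohens_d_summary(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d from summary statistics: (M1 - M2) / SD_pooled."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)

# (label, M1, SD1, n1, M2, SD2, n2) taken from the four worked examples
examples = [
    ("Cognitive training", 72, 12, 30, 60, 11, 30),
    ("Blood pressure (placebo - drug)", 142, 16, 40, 128, 14, 50),
    ("Active learning", 78, 9, 35, 73, 10, 35),
    ("Checkout A/B test", 47.80, 18.50, 1200, 44.20, 17.90, 1200),
]
for label, *stats in examples:
    print(f"{label}: d = {cohens_d_summary(*stats):.2f}")
```

Running it prints d = 1.04, 0.94, 0.53 and 0.20, in that order.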
Cohen's d Interpretation Scale
Effect Size Benchmarks (Cohen, 1988)
Understanding Overlap Between Distributions
Another way to interpret Cohen's d is through the overlap of the two group distributions. When d = 0, the groups coincide completely. As d grows, the distributions pull apart and the overlap shrinks. Assuming normal distributions with equal spread, the table below shows the percentage of the combined distribution that the two groups share at each benchmark level (100% minus Cohen's U₁ nonoverlap measure).
| Cohen's d | Overlap (%) | Interpretation | Real-world analogy |
|---|---|---|---|
| 0.0 | 100% | No effect | Two identical groups |
| 0.2 | ~85% | Small | Height difference between 15- and 16-year-old girls (Cohen's example) |
| 0.5 | ~67% | Medium | Height difference between 14- and 18-year-old girls (Cohen's example) |
| 0.8 | ~53% | Large | IQ difference between PhD holders and typical college freshmen (Cohen's example) |
| 1.0 | ~45% | Very large | Means one full standard deviation apart |
| 2.0 | ~19% | Exceptional | Height difference between adult men and women (in many populations); rare in behavioral research |
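The overlap column can be reproduced from d alone. Assume two normal distributions with equal SDs, and let Φ be the standard normal CDF. Cohen's nonoverlap measure is then U₁ = (2Φ(d/2) − 1) / Φ(d/2), and the shared percentage is 100 × (1 − U₁). The Python sketch below computes this value (function name hypothetical). Other overlap statistics, such as the overlapping coefficient 2Φ(−|d|/2), give larger percentages for the same d.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def shared_area_pct(d):
    """100 * (1 - U1): percentage of the combined distribution the two groups share.

    Assumes two normal distributions with equal SDs whose means differ by d SDs.
    """
    p = PHI(abs(d) / 2)
    u1 = (2 * p - 1) / p  # Cohen's U1, the non-overlapping proportion
    return 100 * (1 - u1)

for d in (0.0, 0.2, 0.5, 0.8, 1.0, 2.0):
    print(f"d = {d:.1f}: {shared_area_pct(d):.0f}% shared")
```

The rounded output matches the table: 100, 85, 67, 53, 45 and 19 percent.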
Context-Dependent Interpretation
Cohen himself acknowledged that his 0.2/0.5/0.8 benchmarks were arbitrary and should serve only as rough guides. Different disciplines have accumulated their own baseline expectations:
| Field | Typical Range | Notes |
|---|---|---|
| Psychotherapy outcomes | d = 0.5–0.8 | Active treatment vs. waitlist control |
| Education interventions | d = 0.2–0.4 | Per Hattie's Visible Learning meta-analysis (threshold = 0.4) |
| Pharmaceutical trials | d = 0.2–0.5 | Smaller effects can still be clinically significant |
| Social psychology lab studies | d = 0.4–1.0 | Controlled conditions yield larger effects than field studies |
| A/B testing (digital products) | d = 0.1–0.3 | Small d with large n can still drive business value |
| Personnel selection (hiring) | d = 0.3–0.5 | Difference between high and average performers |
Cohen's d Compared to Related Metrics
Cohen's d vs. Hedges' g
Hedges' g applies a correction factor J to reduce the tendency of Cohen's d to overestimate the population effect in small samples. The correction is g = d × J, where J = 1 − 3 / (4·df − 1) and df = n₁ + n₂ − 2. With 20 participants per group, g is only about 2% smaller than d, and even at 10 per group the gap is about 4%. Below roughly 10 per group the correction matters more (about 10% at 5 per group). Hedges' g is the better choice in that range, and it is the standard in meta-analyses.
Use Cohen's d for individual studies with adequate sample sizes (n ≥ 20 per group). Use Hedges' g when conducting meta-analyses, when reporting results from very small samples, or whenever you need to pool studies of differing sizes.
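The sketch below applies that correction to d = 0.5 at several group sizes, which shows how quickly the adjustment fades. It uses the common approximation for J given above. An exact version based on gamma functions differs only negligibly. The function name is hypothetical.

```python
def hedges_g(d, n1, n2):
    """Hedges' g via the common approximation J = 1 - 3 / (4 * df - 1)."""
    df = n1 + n2 - 2
    j = 1 - 3 / (4 * df - 1)
    return d * j

d = 0.5
for n in (5, 10, 20, 50):
    g = hedges_g(d, n, n)
    shrink = 1 - g / d  # relative size of the correction
    print(f"n = {n:>2} per group: g = {g:.3f} ({shrink:.1%} smaller than d)")
```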
Cohen's d vs. Glass's Delta (Δ)
Glass's delta uses only the control group's standard deviation as the denominator, rather than a pooled estimate: Δ = (M_treatment − M_control) / SD_control. This is appropriate when you have strong reason to believe the treatment changed the variance — for example, if an intervention increases variability in outcomes alongside the mean. When group variances are roughly equal, Glass's delta and Cohen's d produce similar values.
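The sketch below compares the two statistics on hypothetical summary data. In this data, the treatment group's SD (14) is much larger than the control group's (8). Glass's delta divides by the control SD alone, so it stays anchored to the untreated spread. Cohen's d, by contrast, is pulled toward the inflated treatment variance. Function names are illustrative.

```python
from math import sqrt

def glass_delta(m_treat, m_ctrl, sd_ctrl):
    """Glass's delta: standardize by the control group's SD only."""
    return (m_treat - m_ctrl) / sd_ctrl

def cohens_d(m_treat, sd_treat, n_treat, m_ctrl, sd_ctrl, n_ctrl):
    """Cohen's d with the df-weighted pooled SD."""
    pooled = sqrt(((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2)
                  / (n_treat + n_ctrl - 2))
    return (m_treat - m_ctrl) / pooled

# Hypothetical trial in which the treatment also widened the spread of scores
print(round(glass_delta(55, 50, 8), 3))           # 0.625
print(round(cohens_d(55, 14, 40, 50, 8, 40), 3))  # 0.439
```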
Cohen's d vs. the P-Value
| Property | Cohen's d | P-Value |
|---|---|---|
| What it measures | Magnitude of the effect | Probability under the null hypothesis |
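| Affected by sample size? | Not systematically; larger n only improves precision | Yes; for a nonzero effect, shrinks as n grows |
| Tells you practical importance? | Gives the magnitude; importance still depends on context | No |
| Tells you if the effect is real? | No | Only indirectly, as evidence against the null hypothesis |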
| Can be misleading? | Without context, yes | With large n, yes |
| Recommended use | Alongside p-value | Alongside effect size |
Cohen's d vs. Pearson's r
Both are standardized effect sizes, but they suit different study designs. Cohen's d compares two group means on a continuous outcome, while Pearson's r measures the linear association between two continuous variables. With equal group sizes, d converts to the point-biserial correlation r = d / √(d² + 4). A Cohen's d of 0.5 corresponds to r ≈ 0.24, and d = 0.8 corresponds to r ≈ 0.37. Learn more about correlation and scatter plots on Statistics Fundamentals.
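Both directions of the conversion fit in a few lines. This Python sketch assumes equal group sizes, as the formula above does. The inverse, d = 2r / √(1 − r²), follows from solving that formula for d. Function names are hypothetical.

```python
from math import isclose, sqrt

def d_to_r(d):
    """Point-biserial r from Cohen's d, assuming equal group sizes."""
    return d / sqrt(d**2 + 4)

def r_to_d(r):
    """Inverse conversion under the same equal-n assumption (|r| < 1)."""
    return 2 * r / sqrt(1 - r**2)

print(round(d_to_r(0.5), 2), round(d_to_r(0.8), 2))  # 0.24 0.37
print(isclose(r_to_d(d_to_r(0.5)), 0.5))             # True: the round trip recovers d
```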
Real-World Applications
Psychology Research
Cohen's d is nearly universal in experimental psychology. Studies on cognitive training, therapy outcomes, group differences in test performance, and treatment comparisons all routinely report it. Meta-analyses in clinical psychology — such as those examining the effectiveness of cognitive-behavioral therapy versus waitlist controls — aggregate Cohen's d values from dozens of primary studies to estimate the overall effect. The APA Publication Manual explicitly recommends reporting effect sizes for all primary analyses.
Medical and Clinical Trials
Regulatory agencies and clinical journals increasingly require effect size reporting alongside significance levels. In drug trials comparing a treatment to a placebo on continuous outcomes (blood pressure, biomarkers, quality-of-life scores), Cohen's d gives clinicians an intuitive sense of whether the group difference translates to meaningful patient benefit — something a p-value cannot communicate on its own.
Education Research
Researcher John Hattie's landmark meta-analysis Visible Learning synthesized over 800 meta-analyses covering millions of students and expressed all effects in Cohen's d units. He proposed d = 0.40 as the "hinge point": the threshold at which an educational intervention is worth implementing. This work helped make Cohen's d the common reporting metric in educational research. Explore the foundations of hypothesis testing on Statistics Fundamentals.
A/B Testing in Tech and Product
Product teams at technology companies use Cohen's d to understand whether changes to interfaces, algorithms, or features produce meaningful user behavior differences. With very large sample sizes typical of web experiments, even a d of 0.05 can be statistically significant — making effect size reporting essential for separating business-relevant changes from statistical noise.
Meta-Analysis
Meta-analysis aggregates Cohen's d values across multiple independent studies to estimate the overall effect of an intervention or phenomenon. Because d is dimensionless, studies measuring outcomes on different scales can be meaningfully combined. The resulting pooled d and its confidence interval give a more precise estimate of the population effect than any single study can. See the confidence intervals guide for how uncertainty is quantified around effect size estimates.
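As a rough illustration, the sketch below pools several d values with inverse-variance (fixed-effect) weights. It uses the common large-sample approximation Var(d) ≈ (n₁+n₂)/(n₁n₂) + d²/(2(n₁+n₂)). The three studies are hypothetical. This is a simplified sketch: published meta-analyses usually pool Hedges' g, and they often use random-effects models that allow the true effect to vary between studies.

```python
from math import sqrt

def d_variance(d, n1, n2):
    """Large-sample approximate sampling variance of Cohen's d."""
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

def fixed_effect_pool(studies):
    """Inverse-variance (fixed-effect) pooled d with an approximate 95% CI.

    studies: iterable of (d, n1, n2) tuples.
    """
    weights, weighted = [], []
    for d, n1, n2 in studies:
        w = 1 / d_variance(d, n1, n2)  # more precise studies get more weight
        weights.append(w)
        weighted.append(w * d)
    pooled = sum(weighted) / sum(weights)
    se = sqrt(1 / sum(weights))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical studies of one intervention, each measured on its own scale
studies = [(0.42, 30, 30), (0.55, 45, 50), (0.31, 120, 115)]
d_bar, (lo, hi) = fixed_effect_pool(studies)
print(f"pooled d = {d_bar:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```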
Cohen's d Formula Cheat Sheet
| Metric | Formula | When to Use |
|---|---|---|
| Cohen's d (independent) | d = (M₁−M₂) / SD_pooled | Two separate groups, equal variance |
| Pooled SD (equal n) | √((SD₁²+SD₂²)/2) | n₁ = n₂ only |
| Pooled SD (unequal n) | √(((n₁−1)SD₁²+(n₂−1)SD₂²)/(n₁+n₂−2)) | Any sample sizes |
| Cohen's d (paired) | d = M_diff / SD_diff | Same participants, two conditions |
| Hedges' g correction | g = d × (1 − 3/(4df−1)) | Small samples or meta-analysis |
| Glass's delta | Δ = (M₁−M₂) / SD_control | Unequal variances expected |
| d to r conversion | r = d / √(d²+4) | Comparing to correlation-based effects (assumes equal group sizes) |
Entity & Terminology Glossary
| Term | Formula / Symbol | Definition |
|---|---|---|
| Cohen's d | (M₁−M₂)/SD_pooled | Standardized mean difference between two groups; the primary effect size for group comparisons |
| Effect size | Various | A quantitative measure of the practical magnitude of a result, independent of sample size |
| Standard deviation | √variance | Measure of spread around the mean; the denominator in Cohen's d |
| Pooled SD | Weighted combination | A combined estimate of variability that accounts for both groups' sizes and spreads |
| Mean difference | M₁ − M₂ | The raw (unstandardized) distance between two group averages |
| Hedges' g | d × correction factor | A bias-corrected version of Cohen's d; preferred for small samples and meta-analyses |
| Glass's delta | (M₁−M₂)/SD_control | Uses only the control group's SD; appropriate when treatment alters the spread of scores |
| Statistical power | 1 − β | The probability of detecting a true effect; increases with larger n and larger d |
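| Confidence interval | d ± margin (approximate) | A range of plausible values for the population effect size; a 95% interval procedure captures the true value in 95% of repeated samples |
| Null hypothesis | H₀: μ₁ = μ₂ (population effect δ = 0) | The default assumption that there is no difference between the two population means |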
| Practical significance | — | Whether a statistically significant result is large enough to matter in the real world |
Common Misconceptions
| Misconception | What's Wrong | Correct Understanding |
|---|---|---|
| "A small Cohen's d means the result is unimportant." | Conflates magnitude with importance | A d of 0.2 can save lives in medicine or represent millions in revenue at scale |
| "Statistical significance implies a large effect size." | p-value depends heavily on sample size | p < 0.001 can accompany d = 0.05 in large-n studies |
| "Negative Cohen's d means the study failed." | Sign reflects labeling, not quality | Negative just means Group 2 scored higher; relabel groups if direction matters |
| "0.2/0.5/0.8 always apply." | Context-free application of universal benchmarks | Field-specific norms exist; always anchor to domain baselines |
| "Cohen's d works for any outcome type." | Assumes approximately normal distributions | For skewed data or ordinal scales, consider rank-based or other effect sizes |
Related Topics on Statistics Fundamentals
Cohen's d fits into a broader framework of inferential statistics. These guides on Statistics Fundamentals connect directly to what you've learned here:
Hypothesis Testing
Understand the full framework that surrounds Cohen's d — null hypotheses, test statistics, and decision rules.
Statistical Test Selector
Not sure which test to use? The interactive selector guides you to the right test based on your data type and design.
Two-Sample T-Test
The t-test produces the p-value that Cohen's d accompanies. Understanding both gives the full picture of a group comparison.
Paired Samples T-Test
When you use the paired Cohen's d formula, the paired-samples t-test is the accompanying significance test.
Standard Deviation
The denominator of Cohen's d. Understanding standard deviation is essential for computing and interpreting pooled SD.
Confidence Intervals
Effect sizes should be reported with confidence intervals. A point estimate of d = 0.5 means less without knowing whether its 95% CI is [0.1, 0.9] or [0.4, 0.6].
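Sample size largely determines how wide that interval is. The sketch below uses the large-sample standard error for d common in meta-analysis texts, together with a normal critical value. It shows how the 95% CI around d = 0.5 narrows as each group grows. The helper name is hypothetical. Exact intervals based on the noncentral t distribution will differ slightly in small samples.

```python
from math import sqrt
from statistics import NormalDist

def d_confidence_interval(d, n1, n2, level=0.95):
    """Approximate CI for Cohen's d from its large-sample standard error."""
    se = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return d - z * se, d + z * se

for n in (25, 100, 400):
    lo, hi = d_confidence_interval(0.5, n, n)
    print(f"d = 0.50, n = {n:>3} per group: 95% CI [{lo:.2f}, {hi:.2f}]")
```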
Academic Sources
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates. — The standard reference for the d formula and the 0.2/0.5/0.8 benchmarks (first edition published 1969).
- Hedges, L. V., & Olkin, I. (1985). Statistical Methods for Meta-Analysis. Academic Press. — Standard reference for the bias-corrected g statistic.
- Hattie, J. (2009). Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. Routledge. — Establishes d = 0.40 as the education benchmark.
- Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science. Frontiers in Psychology, 4, 863. — A practical guide to effect size reporting, freely available via doi.org.
- Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141(1), 2–18.
Frequently Asked Questions
What is Cohen's d used for?
Cohen's d measures the practical magnitude of a difference between two group means. It tells you how large the effect is in standard deviation units, and unlike a p-value, it does not systematically change as the sample grows. Researchers use it in psychology, medicine, education, and A/B testing whenever a p-value alone is not enough to judge whether a result matters in practice.
How do you interpret Cohen's d?
Jacob Cohen's benchmarks treat d = 0.2 as a small effect, d = 0.5 as medium, and d = 0.8 as large. These thresholds are rough guides whose relevance depends on the domain. A d of 0.2 can be clinically meaningful in medicine, while a d of 0.8 might be unremarkable in some laboratory settings. Always consider the specific field's norms and the practical stakes of the decision.
Can Cohen's d be negative?
Yes. A negative value simply means the second group mean is larger than the first. The sign reflects which group was placed in the numerator, not whether the result is good or bad. When reporting effect size magnitude, take the absolute value; keep the sign only when the direction of the difference carries theoretical meaning.
Is Cohen's d better than the p-value?
They answer different questions and work best together. The p-value indicates how surprising the observed data would be if there were truly no effect; Cohen's d tells you how large the effect is. A study with a large sample can produce a statistically significant p-value for a Cohen's d of 0.03, an effect that is real but practically trivial. Reporting both is the current standard in most peer-reviewed journals.
What is a good effect size for Cohen's d?
"Good" depends entirely on the context. In education, Hattie's research suggests d ≥ 0.4 is needed to justify an intervention. In medicine, d = 0.2 can represent a clinically important difference if the outcome is mortality or serious morbidity. In technology A/B testing, a d as small as 0.1 can be worth implementing at sufficient scale. There is no single threshold for "good."
When should I use Hedges' g instead of Cohen's d?
Use Hedges' g when your sample size is below about 20 per group, or when conducting a meta-analysis that pools studies of varying sizes. For most individual studies with adequate samples, the two values differ by less than 5% and the distinction is rarely consequential in practice.