What Is a Paired Samples t-test?
The core idea is straightforward. Instead of comparing two separate groups (which is what an independent samples t-test does), you compare two measurements from the same individuals. That pairing eliminates the noise caused by individual differences — the variation between people that has nothing to do with your treatment.
Every paired t-test fits one of two designs. The first is a before-and-after study: you measure something, apply a treatment, then measure again. The blood pressure example above is a before-and-after study. The second is a two-condition study: the same subjects complete both conditions, so each person serves as their own control. For a deeper look at how study structure shapes your statistical choices, see the study design guide at Statistics Fundamentals.
- Also called: Dependent t-test, matched pairs t-test, paired-difference t-test, repeated-measures t-test
- Formula: t = x̄_d / (s_d / √n), where x̄_d = mean of differences, s_d = SD of differences, n = number of pairs
- Degrees of freedom: df = n − 1 (n is number of pairs, not total observations)
- Null hypothesis: H₀: μ_d = 0 — the mean difference in the population equals zero
- Effect size: Cohen's d = x̄_d / s_d. Benchmarks: 0.2 small, 0.5 medium, 0.8 large
- Non-parametric alternative: Wilcoxon signed-rank test (use when normality assumption fails)
- Key advantage over independent t-test: Greater statistical power — it removes between-subject variability from the error term
The Paired t-test Formula: Every Symbol Defined
The paired t-test works by collapsing two columns of data into one column of difference scores. Once you have those differences, the math is identical to a one-sample t-test against zero. Here is the formula, rendered properly:

t = x̄_d / (s_d / √n)

where:
x̄_d = mean of all difference scores
s_d = standard deviation of differences
n = number of pairs (not observations)
s_d / √n = standard error (SE) of mean difference
df = n − 1
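In code, the whole computation is only a few lines. Here is a minimal sketch using just the Python standard library (the function and data are illustrative):

```python
from math import sqrt
from statistics import mean, stdev  # stdev uses the n - 1 denominator (sample SD)

def paired_t(before, after):
    """t = x̄_d / (s_d / √n), computed from the per-pair difference scores."""
    d = [b - a for b, a in zip(before, after)]  # one difference score per pair
    n = len(d)                                  # n = number of pairs
    se = stdev(d) / sqrt(n)                     # standard error of the mean difference
    return mean(d) / se, n - 1                  # t-statistic and df = n - 1

t, df = paired_t([148, 152, 144], [136, 140, 130])  # three illustrative pairs
```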
Why Difference Scores — Not Raw Scores?
Students sometimes wonder why we calculate a single difference score per pair rather than analyzing the raw "before" and "after" columns separately. The diagram below makes this concrete.
From Two Distributions to One: Why We Use Difference Scores
The pre- and post-test distributions overlap and carry wide individual-difference noise. Subtracting them collapses two distributions into one tighter distribution of changes — and the t-test simply asks whether that distribution's mean is significantly different from zero.
Standard Error of the Mean Difference
The denominator of the t-statistic is the standard error (SE) of the mean difference:

SE = s_d / √n

where:
s_d = SD of the difference scores
n = number of pairs
A larger t-statistic (in absolute value) means the observed mean difference is many standard errors away from zero — making it less likely to be a random result. The corresponding p-value converts that distance into a probability. For a broader look at how t-distributions work, see the t-distribution table.
Four Assumptions of the Paired Samples t-test
Before running the test, verify all four assumptions. Violating any one of them can produce misleading results. The good news: assumptions 1 and 2 are satisfied by your study design; only 3 and 4 require active checking.
1. Continuous dependent variable.
2. Randomly sampled, independent pairs.
3. No significant outliers in the difference scores.
4. Difference scores are approximately normally distributed.
| # | Assumption | What it means | How to check it |
|---|---|---|---|
| 1 | Continuous dependent variable | Your outcome must be measured on an interval or ratio scale (e.g., blood pressure, test scores, reaction time). Ordinal or categorical data do not qualify. | Inspect your measurement scale. No statistical test needed — this is a design decision. |
| 2 | Independent, randomly sampled pairs | Each pair of observations must be independent of all other pairs. One patient's before/after values should not influence another patient's values. | Verify your sampling method. This is satisfied by proper experimental design. |
| 3 | No significant outliers in differences | Extreme outliers in the difference scores can distort the mean and inflate the standard deviation, producing a misleading t-statistic. | Create a boxplot of d_i values. Flag any points more than 1.5 × IQR beyond Q1 or Q3. Investigate before excluding. |
| 4 | Normal distribution of differences | The difference scores (not the raw pre/post values) should be approximately normally distributed. You do not need to check normality in the two raw columns — only in their differences. | Shapiro-Wilk test (p > .05 = normality assumed) or Q-Q plot. For n ≥ 30, the Central Limit Theorem makes this assumption much less critical (see note below). |
Many students worry about normality regardless of sample size. For samples of 30 pairs or more, the Central Limit Theorem guarantees that the sampling distribution of x̄_d approaches normality even when the individual differences are not perfectly normal. The paired t-test is quite robust to this violation when n is large. For small samples (n < 30), check assumption 4 carefully and consider the Wilcoxon signed-rank test if normality is questionable.
The most frequent error students make is setting n equal to the total number of observations rather than the number of pairs. If 20 patients each produce a pre-score and a post-score, then n = 20 and df = 19 — not n = 40 and df = 39. Using n = 40 would underestimate your standard error and produce an inflated t-statistic, making results appear more significant than they are.
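A quick numerical sketch (illustrative numbers) shows how badly the wrong n inflates the statistic:

```python
from math import sqrt

mean_d, s_d = 3.0, 2.0        # illustrative mean and SD of 20 difference scores
se_correct = s_d / sqrt(20)   # n = 20 pairs: the correct standard error
se_wrong = s_d / sqrt(40)     # n = 40 "observations": SE is too small
t_correct = mean_d / se_correct
t_wrong = mean_d / se_wrong   # inflated by a factor of sqrt(2), with df = 39 not 19
```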
How to Perform a Paired t-test: 5-Step Method
The following worked example walks through a complete paired t-test calculation by hand. The dataset: 10 patients have their systolic blood pressure recorded before and after 8 weeks of a new medication.
Research question: Does 8 weeks of medication significantly reduce systolic blood pressure?
| Patient | Before (x₁) | After (x₂) | d = x₁ − x₂ | (d − x̄_d)² |
|---|---|---|---|---|
| 1 | 148 | 136 | 12 | 0.00 |
| 2 | 152 | 140 | 12 | 0.00 |
| 3 | 144 | 130 | 14 | 4.00 |
| 4 | 160 | 150 | 10 | 4.00 |
| 5 | 155 | 143 | 12 | 0.00 |
| 6 | 138 | 126 | 12 | 0.00 |
| 7 | 163 | 148 | 15 | 9.00 |
| 8 | 147 | 138 | 9 | 9.00 |
| 9 | 151 | 140 | 11 | 1.00 |
| 10 | 158 | 145 | 13 | 1.00 |
| Totals | | | Σd = 120 | Σ(d − x̄_d)² = 28.00 |
Step 1 — State hypotheses: H₀: μ_d = 0 (medication has no effect on blood pressure). H₁: μ_d ≠ 0 (two-tailed; medication changes blood pressure). Significance level: α = 0.05.
Step 2 — Calculate the mean difference: x̄_d = Σd / n = 120 / 10 = 12.0 mmHg. On average, blood pressure dropped 12 points after medication.
Step 3 — Calculate the standard deviation of differences: s_d = √[Σ(d − x̄_d)² / (n−1)] = √(28 / 9) ≈ 1.764. The SE = s_d / √n = 1.764 / √10 = 1.764 / 3.162 ≈ 0.558.
Step 4 — Calculate the t-statistic: t = x̄_d / SE = 12.0 / 0.558 ≈ 21.51. Degrees of freedom: df = n − 1 = 10 − 1 = 9.
Step 5 — Find the p-value and conclude: With t(9) = 21.51, the two-tailed p-value is p < .001. Since p < α = 0.05, we reject H₀. Cohen's d = x̄_d / s_d = 12.0 / 1.764 ≈ 6.80 — a very large effect.
✓ The medication produced a statistically significant reduction in systolic blood pressure of 12.0 mmHg on average, t(9) = 21.51, p < .001, d = 6.80. The 95% confidence interval for the mean difference, x̄_d ± t*(0.025, 9) × SE = 12.0 ± 2.262 × 0.558, is [10.74, 13.26] mmHg.
5 Real-World Examples of the Paired Samples t-test
Example 1 — Medical (above)
Blood pressure before and after 8 weeks of antihypertensive medication in the same 10 patients.
Example 2 — Education
Student math test scores before and after a 6-week tutoring program. Each student is measured twice.
Example 3 — Sports Science
Maximum vertical jump height before and after 8 weeks of plyometric training in 20 basketball players.
Example 4 — Audiology
Hearing loss measured in a patient's left versus right ear. The same patient provides both measurements — a matched-pair design.
Example 5 — Psychology
Perceived social support scored before and after completing an 8-week social skills program. Pre M = 32.83, Post M = 38.07, t(19) = −3.23, p = .004, d = 0.73.
Each example shares the same structure: a single group of subjects measured under two conditions, with the t-test applied to the differences. This within-subjects design gives the paired t-test considerably more statistical power than an equivalent independent-groups study, because between-subject variability — how different people are from each other — is removed from the error term entirely.
Paired t-test vs. Independent t-test: When to Use Each
The decision between the two t-tests comes down to one question: do the same subjects appear in both conditions? If yes, use the paired t-test. If no, use the independent samples t-test.
| Feature | Paired Samples t-test | Independent Samples t-test |
|---|---|---|
| Subjects | Same individuals measured twice, or matched pairs | Two completely separate, unrelated groups |
| Study design | Within-subjects (before/after, crossover, matched) | Between-subjects (treatment vs. control groups) |
| Degrees of freedom | n − 1 (n = number of pairs) | n₁ + n₂ − 2 |
| Statistical power | Higher — controls for individual differences | Lower — individual variation stays in error term |
| Key assumption | Difference scores normally distributed | Both groups normally distributed; equal variances (or Welch's) |
| SPSS path | Analyze → Compare Means → Paired-Samples T Test | Analyze → Compare Means → Independent-Samples T Test |
| R function | t.test(x, y, paired = TRUE) | t.test(x, y, paired = FALSE) |
| Python function | scipy.stats.ttest_rel(a, b) | scipy.stats.ttest_ind(a, b) |
If the same person or object contributes one score to each group, use the paired t-test. If every score comes from a different, unrelated individual, use the independent t-test.
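The power advantage of pairing is easy to see in a small sketch. In the made-up data below, subjects differ widely from each other but each improves by roughly the same amount, so the paired test finds a huge effect while the independent test finds almost nothing:

```python
from scipy import stats

# Made-up data: large between-subject spread, consistent within-subject change
before = [100, 120, 140, 160, 180]
after  = [ 98, 117, 138, 157, 178]

t_paired = stats.ttest_rel(before, after).statistic  # subject-to-subject noise removed
t_indep  = stats.ttest_ind(before, after).statistic  # that noise stays in the error term
print(round(t_paired, 2), round(t_indep, 2))
```

Same numbers, same mean change, yet only the within-subjects analysis detects it — exactly the point made above.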
Effect Size: Cohen's d for the Paired t-test
A statistically significant p-value tells you that the difference is unlikely to be chance. It says nothing about how large that difference is in practical terms. Effect size fills that gap. For the paired t-test, the standard measure is Cohen's d (Cohen, 1988):
x̄_d = mean difference
s_d = SD of differences
d = standardized effect (unitless)
| Cohen's d Value | Effect Size | Interpretation | Example in education research |
|---|---|---|---|
| 0.2 | Small | The groups differ by 0.2 standard deviations — often hard to see without careful measurement | Minor improvement in quiz scores after a single-lecture intervention |
| 0.5 | Medium | A noticeable, meaningful difference — visible to the naked eye in most contexts | Moderate score gains after a semester-long tutoring program |
| 0.8 | Large | A substantial difference — practically significant in almost every context | Major improvement after intensive one-on-one instruction |
| > 1.0 | Very large | Rare in behavioral research — likely a strong, well-controlled intervention | Mastery-based learning replacing traditional lecture format entirely |
For small samples (n < 50), Cohen's d tends to overestimate the true population effect. Hedges' g applies a small-sample correction: multiply d by a factor of approximately 1 − 3 / (4·df − 1), where df = n − 1. Most statistical software (including SPSS 27+) can output Hedges' g automatically alongside Cohen's d.
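Both statistics can be computed by hand from the difference scores. A sketch with illustrative data, using the common correction approximation J = 1 − 3/(4·df − 1):

```python
from statistics import mean, stdev

d_scores = [3, 1, 4, 2, 5, 3, 2, 4]           # illustrative difference scores, n = 8 pairs

cohens_d = mean(d_scores) / stdev(d_scores)   # d = x̄_d / s_d
df = len(d_scores) - 1
hedges_g = cohens_d * (1 - 3 / (4 * df - 1))  # small-sample correction shrinks d slightly
```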
A large sample can produce p < .001 for a difference of d = 0.08 — statistically significant but trivially small. Always report Cohen's d alongside your p-value. Most journals and APA guidelines now require both.
Running the Paired t-test: SPSS, R & Python
SPSS — Step-by-Step
How to run the paired t-test in IBM SPSS Statistics
Go to Analyze → Compare Means and Proportions → Paired-Samples T Test
Move your "before" variable to Variable 1 and your "after" variable to Variable 2 in the Paired Variables box. Each row is one pair.
Click Options to set confidence level (95% default) and handle missing values. Click OK.
In the output, read the Paired Samples Test table: Mean Difference, t-statistic, df, Sig. (2-tailed), and 95% CI of the difference.
For normality, run Analyze → Descriptive Statistics → Explore on your difference variable. Check the Shapiro-Wilk result — p > .05 confirms normality.
✓ SPSS generates three tables: Paired Samples Statistics, Paired Samples Correlations, and Paired Samples Test. Focus on the Paired Samples Test table for your inferential results.
R — Complete Code with Output
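The code listing for this section did not survive; a minimal base-R sketch using the worked blood-pressure data would look like this:

```r
# Worked example data: systolic BP before/after 8 weeks of medication
before <- c(148, 152, 144, 160, 155, 138, 163, 147, 151, 158)
after  <- c(136, 140, 130, 150, 143, 126, 148, 138, 140, 145)

t.test(before, after, paired = TRUE)  # reports t, df = 9, p-value, and 95% CI
shapiro.test(before - after)          # assumption 4: normality of the differences
```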
Python — scipy.stats
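A sketch of the full analysis with `scipy.stats.ttest_rel`, using the same worked data (the effect size and confidence interval are computed by hand, since older scipy versions do not report them directly):

```python
import numpy as np
from scipy import stats

before = np.array([148, 152, 144, 160, 155, 138, 163, 147, 151, 158])
after  = np.array([136, 140, 130, 150, 143, 126, 148, 138, 140, 145])

res = stats.ttest_rel(before, after)   # paired t-test on the before - after differences

d = before - after
n = len(d)
se = d.std(ddof=1) / np.sqrt(n)        # standard error of the mean difference
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-tailed 95% critical value, df = 9
ci = (d.mean() - t_crit * se, d.mean() + t_crit * se)
cohens_d = d.mean() / d.std(ddof=1)

print(f"t({n - 1}) = {res.statistic:.2f}, p = {res.pvalue:.3g}, d = {cohens_d:.2f}")
print(f"95% CI for the mean difference: [{ci[0]:.2f}, {ci[1]:.2f}] mmHg")
```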
How to Report Paired t-test Results in APA Format
APA 7 requires five pieces of information: means and standard deviations for both conditions, the t-statistic, the degrees of freedom in parentheses, the exact p-value, and Cohen's d. Here is a copy-paste template: A paired samples t-test revealed a significant difference between [condition 1] (M = __, SD = __) and [condition 2] (M = __, SD = __), t(df) = __, p = __, d = __.
Worked APA Write-Up — Psychology Example
Social Support Intervention Study
A paired samples t-test revealed a statistically significant increase in perceived social support from pre-program (M = 32.83, SD = 7.91) to post-program (M = 38.07, SD = 7.23), t(19) = −3.23, p = .004, d = 0.73. This represents a medium-to-large effect, indicating the 8-week social skills program produced a practically meaningful improvement beyond statistical significance alone.
Include: ① M and SD for both conditions ② t-statistic ③ df in parentheses, e.g. t(19) ④ exact p-value (not just p < .05) ⑤ Cohen's d ⑥ 95% confidence interval of the mean difference (increasingly required by journals).
Non-Parametric Alternative: Wilcoxon Signed-Rank Test
When assumption 4 (normality of differences) fails — and your sample is too small for the Central Limit Theorem to rescue you — the Wilcoxon signed-rank test is the appropriate substitute. It ranks the absolute differences and tests whether positive and negative differences are balanced, without assuming any particular distribution shape.
Switch to the Wilcoxon test when: your difference scores are clearly non-normal on a Shapiro-Wilk test (p < .05) and n < 30; your data is ordinal rather than continuous; or extreme outliers cannot be removed for substantive reasons. In R: wilcox.test(before, after, paired = TRUE). In Python: scipy.stats.wilcoxon(before, after).
The Wilcoxon test is somewhat less powerful than the paired t-test when normality holds, so do not default to it as a precaution — verify the assumption first. For a full overview of when to choose non-parametric methods, see hypothesis testing fundamentals.
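As a sketch, here is the Wilcoxon test run on the worked blood-pressure data. Every patient improved, so the sum of negative ranks — the test statistic — is zero:

```python
from scipy import stats

before = [148, 152, 144, 160, 155, 138, 163, 147, 151, 158]
after  = [136, 140, 130, 150, 143, 126, 148, 138, 140, 145]

res = stats.wilcoxon(before, after)  # ranks |before - after|, tests sign balance
print(res.statistic, res.pvalue)     # statistic 0.0: no negative differences
```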
Free Paired t-test Calculator
Enter your paired data below — one pair per line, comma-separated (before, after). The calculator computes the t-statistic, p-value, degrees of freedom, Cohen's d, and 95% confidence interval.
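The interactive widget itself cannot run in text, but the computation behind it can be sketched in a few lines of Python (the parsing matches the `before, after` line format described above; scipy is assumed for the p-value):

```python
from math import sqrt
from statistics import mean, stdev
from scipy import stats

def paired_t_from_text(text):
    """Parse 'before, after' lines and return (t, df, two-tailed p) — a sketch."""
    pairs = [tuple(map(float, line.split(","))) for line in text.strip().splitlines()]
    d = [b - a for b, a in pairs]         # one difference score per pair
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)  # two-tailed p from the t-distribution
    return t, n - 1, p

t, df, p = paired_t_from_text("150, 141\n160, 148\n145, 139\n155, 146")
```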
Formula & Entity Glossary
| Symbol / Entity | Name | Definition / Formula |
|---|---|---|
| t | t-statistic | t = x̄_d / (s_d / √n) — the test statistic compared to the t-distribution |
| x̄_d | Mean of differences | Average of all (x₁ᵢ − x₂ᵢ) difference scores across n pairs |
| s_d | SD of differences | Standard deviation of the n difference scores d_i |
| SE | Standard error | SE = s_d / √n — the precision of the mean difference estimate |
| df | Degrees of freedom | df = n − 1, where n is the number of pairs |
| H₀ | Null hypothesis | μ_d = 0 — the population mean difference equals zero |
| H₁ | Alternative hypothesis | μ_d ≠ 0 (two-tailed), μ_d < 0, or μ_d > 0 (one-tailed) |
| α | Significance level | Threshold for rejecting H₀, typically 0.05 |
| d | Cohen's d | d = x̄_d / s_d. Benchmarks: 0.2 small, 0.5 medium, 0.8 large (Cohen, 1988) |
| CI | Confidence interval | x̄_d ± t*(α/2, n−1) × SE — range likely to contain the true μ_d |
| CLT | Central Limit Theorem | For n ≥ 30, the sampling distribution of x̄_d is approximately normal regardless of the underlying difference distribution |
| Shapiro-Wilk | Normality test | Tests H₀: difference scores are normally distributed. p > .05 supports normality assumption. |
Related Statistical Tests
One-Sample t-test
Compare a sample mean to a known population value. The paired t-test is mathematically equivalent to a one-sample t-test on the difference scores.
Independent Samples t-test
Compare means from two unrelated groups. Choose this when subjects differ between conditions.
Hypothesis Testing Guide
The framework behind all t-tests — null hypotheses, p-values, Type I/II errors, and power.
Normal Distribution
The theoretical basis for the t-test's normality assumption and the shape of the t-distribution.
Confidence Intervals
The 95% CI around the mean difference is a key component of complete paired t-test reporting.
Sampling Distributions
Why the Central Limit Theorem makes the paired t-test robust to non-normality at n ≥ 30.
External references:
- Cohen, J. (1988). *Statistical power analysis for the behavioral sciences* (2nd ed.). Lawrence Erlbaum.
- Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. *Frontiers in Psychology*, 4, 863.
- scipy.stats.ttest_rel documentation
- R t.test() documentation