May 2, 2026
BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

Paired Samples t-test: Complete Guide with Formula and 5 Real Examples

Twelve patients take a blood pressure drug for 8 weeks. Before the trial, their average systolic pressure is 148 mmHg. After, it's 136 mmHg. Did the drug actually work — or could that 12-point drop be random noise? A paired samples t-test answers that question precisely, because the same 12 people appear in both groups.

This guide covers the full formula, all four assumptions, five worked examples with real numbers, and step-by-step code in SPSS, R, and Python. The interactive calculator at the bottom lets you test your own data right now.

What You'll Learn
  • ✓ The exact definition and all four alternative names for this test
  • ✓ The complete formula, with every symbol defined and LaTeX rendered
  • ✓ Four assumptions — and how to check each one before running the test
  • ✓ Five step-by-step worked examples across different fields
  • ✓ When to use paired vs. independent t-test (with a decision table)
  • ✓ Effect size (Cohen's d), APA reporting, and the non-parametric alternative
  • ✓ SPSS, R, and Python code with annotated output

What Is a Paired Samples t-test?

Definition — Dependent Samples t-test
A paired samples t-test is a parametric statistical test that compares the means of two related measurements taken from the same subjects or matched pairs. It tests whether the mean difference between paired observations is statistically significantly different from zero. The test is also called the dependent t-test, matched pairs t-test, paired-difference t-test, and repeated-samples t-test.
t = x̄_d / (s_d / √n)  |  df = n − 1

The core idea is straightforward. Instead of comparing two separate groups (which is what an independent samples t-test does), you compare two measurements from the same individuals. That pairing eliminates the noise caused by individual differences — the variation between people that has nothing to do with your treatment.

Every paired t-test fits one of two designs. The first is a before-and-after study: you measure something, apply a treatment, then measure again. The blood pressure example above is a before-and-after study. The second is a two-condition study: the same subjects complete both conditions, so each person serves as their own control. For a deeper look at how study structure shapes your statistical choices, see the study design guide at Statistics Fundamentals.

⚡ Quick Reference — Paired Samples t-test Key Facts
  • Also called: Dependent t-test, matched pairs t-test, paired-difference t-test, repeated-samples t-test
  • Formula: t = x̄_d / (s_d / √n), where x̄_d = mean of differences, s_d = SD of differences, n = number of pairs
  • Degrees of freedom: df = n − 1 (n is number of pairs, not total observations)
  • Null hypothesis: H₀: μ_d = 0 — the mean difference in the population equals zero
  • Effect size: Cohen's d = x̄_d / s_d. Benchmarks: 0.2 small, 0.5 medium, 0.8 large
  • Non-parametric alternative: Wilcoxon signed-rank test (use when normality assumption fails)
  • Key advantage over independent t-test: Greater statistical power — it removes between-subject variability from the error term
At a glance:
  • n − 1 — degrees of freedom
  • 0.05 — typical alpha level
  • 4 — assumptions to check
  • 0.2 / 0.5 / 0.8 — Cohen's d benchmarks

The Paired t-test Formula: Every Symbol Defined

The paired t-test works by collapsing two columns of data into one column of difference scores. Once you have those differences, the math is identical to a one-sample t-test against zero. Here is the formula, rendered properly:

Paired Samples t-test — t-Statistic
t = x̄_d / (s_d / √n)
In LaTeX: $$t = \dfrac{\bar{x}_d}{s_d / \sqrt{n}}$$
  • x̄_d = mean of all difference scores
  • s_d = standard deviation of differences
  • n = number of pairs (not observations)
  • s_d / √n = standard error (SE) of the mean difference
  • df = n − 1
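Because the paired test is just a one-sample test on the difference scores, the equivalence can be checked numerically. A minimal Python sketch (assuming NumPy and SciPy are available; the data are the blood-pressure values used later in this guide):

```python
import numpy as np
from scipy import stats

before = np.array([148, 152, 144, 160, 155, 138, 163, 147, 151, 158])
after = np.array([136, 140, 130, 150, 143, 126, 148, 138, 140, 145])

# Collapse the two columns into one column of difference scores
d = before - after

# Formula by hand: t = mean(d) / (sd(d) / sqrt(n))
n = len(d)
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(n))

# Same number from the paired test...
t_paired, _ = stats.ttest_rel(before, after)
# ...and from a one-sample test of the differences against zero
t_onesample, _ = stats.ttest_1samp(d, 0)

print(round(t_manual, 2), round(t_paired, 2), round(t_onesample, 2))
# All three agree: the paired t-test IS a one-sample t-test on d
```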

Why Difference Scores — Not Raw Scores?

Students sometimes wonder why we calculate a single difference score per pair rather than analyzing the raw "before" and "after" columns separately. The diagram below makes this concrete.

From Two Distributions to One: Why We Use Difference Scores


The pre- and post-test distributions overlap and carry wide individual-difference noise. Subtracting them collapses two distributions into one tighter distribution of changes — and the t-test simply asks whether that distribution's mean is significantly different from zero.

Standard Error of the Mean Difference

The denominator of the t-statistic is the standard error (SE) of the mean difference:

Standard Error of the Mean Difference
SE = s_d / √n
In LaTeX: $$SE = \dfrac{s_d}{\sqrt{n}}$$
As n increases, SE shrinks — larger samples produce more precise estimates.
  • s_d = SD of the difference scores
  • n = number of pairs

A larger t-statistic (in absolute value) means the observed mean difference is many standard errors away from zero — making it less likely to be a random result. The corresponding p-value converts that distance into a probability. For a broader look at how t-distributions work, see the t-distribution table.
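The t-to-p conversion itself is just a tail-area lookup on the t-distribution. A short sketch (Python with scipy; the t-value of 2.75 is an arbitrary illustration, not from the worked example):

```python
from scipy import stats

t_value = 2.75   # an example t-statistic
df = 9           # n = 10 pairs -> df = 9

# Two-tailed p: twice the area beyond |t| in the t-distribution
p_two_tailed = 2 * stats.t.sf(abs(t_value), df)
print(round(p_two_tailed, 4))
# With df = 9, t = 2.75 clears the alpha = .05 threshold
```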

Four Assumptions of the Paired Samples t-test

Before running the test, verify all four assumptions. Violating any one of them can produce misleading results. The good news: assumptions 1 and 2 are satisfied by your study design; only 3 and 4 require active checking.

📋
The Four Assumptions — Numbered List for Quick Reference

1. Continuous dependent variable.
2. Randomly sampled, independent pairs.
3. No significant outliers in the difference scores.
4. Difference scores are approximately normally distributed.

# | Assumption | What it means | How to check it
1 | Continuous dependent variable | Your outcome must be measured on an interval or ratio scale (e.g., blood pressure, test scores, reaction time). Ordinal or categorical data do not qualify. | Inspect your measurement scale. No statistical test needed — this is a design decision.
2 | Independent, randomly sampled pairs | Each pair of observations must be independent of all other pairs. One patient's before/after values should not influence another patient's values. | Verify your sampling method. This is satisfied by proper experimental design.
3 | No significant outliers in differences | Extreme outliers in the difference scores can distort the mean and inflate the standard deviation, producing a misleading t-statistic. | Create a boxplot of the d_i values. Flag any points more than 1.5 × IQR beyond Q1 or Q3. Investigate before excluding.
4 | Normal distribution of differences | The difference scores (not the raw pre/post values) should be approximately normally distributed. You only need to check the differences — not the two raw columns. | Shapiro-Wilk test (p > .05 = normality assumed) or Q-Q plot. For n ≥ 30, the Central Limit Theorem makes this assumption much less critical (see note below).
💡
The n ≥ 30 Rule and the Central Limit Theorem

Many students worry about normality regardless of sample size. For samples of 30 pairs or more, the Central Limit Theorem guarantees that the sampling distribution of x̄_d approaches normality even when the individual differences are not perfectly normal. The paired t-test is quite robust to this violation when n is large. For small samples (n < 30), check assumption 4 carefully and consider the Wilcoxon signed-rank test if normality is questionable.
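Assumptions 3 and 4 can both be checked on the difference scores in a few lines. A sketch in Python (scipy assumed; the 1.5 × IQR fence and the p > .05 cutoff are the conventions described above, not library defaults):

```python
import numpy as np
from scipy import stats

before = np.array([148, 152, 144, 160, 155, 138, 163, 147, 151, 158])
after = np.array([136, 140, 130, 150, 143, 126, 148, 138, 140, 145])
d = before - after  # check assumptions on the differences, not the raw columns

# Assumption 3: flag outliers beyond 1.5 * IQR from the quartiles
q1, q3 = np.percentile(d, [25, 75])
iqr = q3 - q1
outliers = d[(d < q1 - 1.5 * iqr) | (d > q3 + 1.5 * iqr)]
print("outliers:", outliers)

# Assumption 4: Shapiro-Wilk on the differences
w_stat, p_norm = stats.shapiro(d)
if p_norm > 0.05:
    print("normality not rejected -> paired t-test is appropriate")
else:
    print("differences look non-normal -> consider Wilcoxon signed-rank")
```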

⚠️
Common Mistake: Calculating n Incorrectly

The most frequent error students make is setting n equal to the total number of observations rather than the number of pairs. If 20 patients each produce a pre-score and a post-score, then n = 20 and df = 19 — not n = 40 and df = 39. Using n = 40 would underestimate your standard error and produce an inflated t-statistic, making results appear more significant than they are.
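A quick numeric sketch shows why this matters: with the mean difference and SD held fixed, doubling n shrinks the SE by √2 and inflates t by the same factor (Python; the values are illustrative, not from any particular study):

```python
import math

mean_d, sd_d = 12.0, 1.76  # illustrative mean difference and SD

# Correct: n = number of pairs
n_pairs = 10
t_correct = mean_d / (sd_d / math.sqrt(n_pairs))

# Wrong: n = total observations (20 scores from 10 patients)
n_wrong = 20
t_inflated = mean_d / (sd_d / math.sqrt(n_wrong))

print(round(t_correct, 2), round(t_inflated, 2))
# The wrong n inflates t by sqrt(2), overstating significance
```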

How to Perform a Paired t-test: 5-Step Method

The following worked example walks through a complete paired t-test calculation by hand. The dataset: 10 patients have their systolic blood pressure recorded before and after 8 weeks of a new medication.

Full Worked Example — Blood Pressure Study (n = 10)

Do 8 weeks of medication significantly reduce systolic blood pressure?

Patient | Before (x₁) | After (x₂) | d = x₁ − x₂ | (d − x̄_d)²
1 | 148 | 136 | 12 | 0.00
2 | 152 | 140 | 12 | 0.00
3 | 144 | 130 | 14 | 4.00
4 | 160 | 150 | 10 | 4.00
5 | 155 | 143 | 12 | 0.00
6 | 138 | 126 | 12 | 0.00
7 | 163 | 148 | 15 | 9.00
8 | 147 | 138 | 9 | 9.00
9 | 151 | 140 | 11 | 1.00
10 | 158 | 145 | 13 | 1.00
Totals | | | Σd = 120 | Σ = 28.00
Step 1 — State hypotheses: H₀: μ_d = 0 (medication has no effect on blood pressure). H₁: μ_d ≠ 0 (two-tailed; medication changes blood pressure). Significance level: α = 0.05.

Step 2 — Calculate the mean difference: x̄_d = Σd / n = 120 / 10 = 12.0 mmHg. On average, blood pressure dropped 12 points after medication.

Step 3 — Calculate the standard deviation of the differences: s_d = √[Σ(d − x̄_d)² / (n − 1)] = √(28 / 9) ≈ 1.764. The SE = s_d / √n = 1.764 / √10 = 1.764 / 3.162 ≈ 0.558.

Step 4 — Calculate the t-statistic: t = x̄_d / SE = 12.0 / 0.558 ≈ 21.51. Degrees of freedom: df = n − 1 = 10 − 1 = 9.

Step 5 — Find the p-value and conclude: With t(9) = 21.51, the two-tailed p-value is p < .001. Since p < α = 0.05, we reject H₀. Cohen's d = x̄_d / s_d = 12.0 / 1.764 ≈ 6.80 — a very large effect.

✓ The medication produced a statistically significant reduction in systolic blood pressure of 12.0 mmHg on average, t(9) = 21.51, p < .001, d = 6.80. The 95% confidence interval for the mean difference is [10.74, 13.26] mmHg.

5 Real-World Examples of the Paired Samples t-test

💊

Example 1 — Medical (above)

Blood pressure before and after 8 weeks of antihypertensive medication in the same 10 patients.

🎓

Example 2 — Education

Student math test scores before and after a 6-week tutoring program. Each student is measured twice.

🏃

Example 3 — Sports Science

Maximum vertical jump height before and after 8 weeks of plyometric training in 20 basketball players.

👂

Example 4 — Audiology

Hearing loss measured in a patient's left versus right ear. The same patient provides both measurements — a matched-pair design.

🧠

Example 5 — Psychology

Perceived social support scored before and after completing an 8-week social skills program. Pre M = 32.83, Post M = 38.07, t(19) = −3.23, p = .004, d = 0.73.

Each example shares the same structure: a single group of subjects measured under two conditions, with the t-test applied to the differences. This within-subjects design gives the paired t-test considerably more statistical power than an equivalent independent-groups study, because between-subject variability — how different people are from each other — is removed from the error term entirely.
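The power advantage is easy to demonstrate: run both tests on the same paired data and compare. Analyzing paired data with an independent test is statistically invalid — this sketch (Python/scipy, blood-pressure data from the worked example) does it only to show how much noise the pairing removes:

```python
import numpy as np
from scipy import stats

before = np.array([148, 152, 144, 160, 155, 138, 163, 147, 151, 158])
after = np.array([136, 140, 130, 150, 143, 126, 148, 138, 140, 145])

# Correct analysis: paired test on the differences
t_rel, p_rel = stats.ttest_rel(before, after)

# Wrong-by-design comparison: treat the columns as unrelated groups
t_ind, p_ind = stats.ttest_ind(before, after)

print(f"paired:      t = {t_rel:.2f}, p = {p_rel:.2e}")
print(f"independent: t = {t_ind:.2f}, p = {p_ind:.4f}")
# The paired test yields a far larger t because between-subject
# variability (some patients simply run higher) is removed
```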

Paired t-test vs. Independent t-test: When to Use Each

The decision between the two t-tests comes down to one question: do the same subjects appear in both conditions? If yes, use the paired t-test. If no, use the independent samples t-test.

Feature | Paired Samples t-test | Independent Samples t-test
Subjects | Same individuals measured twice, or matched pairs | Two completely separate, unrelated groups
Study design | Within-subjects (before/after, crossover, matched) | Between-subjects (treatment vs. control groups)
Degrees of freedom | n − 1 (n = number of pairs) | n₁ + n₂ − 2
Statistical power | Higher — controls for individual differences | Lower — individual variation stays in error term
Key assumption | Difference scores normally distributed | Both groups normally distributed; equal variances (or Welch's)
SPSS path | Analyze → Compare Means → Paired-Samples T Test | Analyze → Compare Means → Independent-Samples T Test
R function | t.test(x, y, paired = TRUE) | t.test(x, y, paired = FALSE)
Python function | scipy.stats.ttest_rel(a, b) | scipy.stats.ttest_ind(a, b)
🎯
Decision Rule — One Sentence

If the same person or object contributes one score to each group, use the paired t-test. If every score comes from a different, unrelated individual, use the independent t-test.

Effect Size: Cohen's d for the Paired t-test

A statistically significant p-value tells you that the difference is unlikely to be chance. It says nothing about how large that difference is in practical terms. Effect size fills that gap. For the paired t-test, the standard measure is Cohen's d (Cohen, 1988):

Cohen's d — Effect Size for Paired t-test
d = x̄_d / s_d
In LaTeX: $$d = \dfrac{\bar{x}_d}{s_d}$$
The mean of differences divided by the standard deviation of differences.
  • x̄_d = mean difference
  • s_d = SD of differences
  • d = standardized effect (unitless)
Cohen's d Value | Effect Size | Interpretation | Example in education research
0.2 | Small | The groups differ by 0.2 standard deviations — often hard to see without careful measurement | Minor improvement in quiz scores after a single-lecture intervention
0.5 | Medium | A noticeable, meaningful difference — visible to the naked eye in most contexts | Moderate score gains after a semester-long tutoring program
0.8 | Large | A substantial difference — practically significant in almost every context | Major improvement after intensive one-on-one instruction
> 1.0 | Very large | Rare in behavioral research — likely a strong, well-controlled intervention | Mastery-based learning replacing traditional lecture format entirely

For small samples (n < 50), Cohen's d tends to overestimate the true population effect. Hedges' g applies a small-sample correction: multiply d by a correction factor of approximately (n − 3) / (n − 2.25). Most statistical software (including SPSS 27+) can output Hedges' g automatically alongside Cohen's d.
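As a sketch of the correction in code, I use the common approximation g ≈ d × (1 − 3/(4·df − 1)) rather than the factor quoted above; both give nearly identical values at these sample sizes (Python; the difference scores are hypothetical):

```python
import numpy as np

# Hypothetical difference scores from a small paired study (n = 12)
d_scores = np.array([3, 1, 4, 2, 5, 2, 3, 1, 4, 3, 2, 4])

n = len(d_scores)
cohens_d = d_scores.mean() / d_scores.std(ddof=1)

# Small-sample bias correction: approx. 1 - 3/(4*df - 1), with df = n - 1
df = n - 1
correction = 1 - 3 / (4 * df - 1)
hedges_g = cohens_d * correction

print(round(cohens_d, 3), round(hedges_g, 3))
# g is slightly smaller than d, correcting the upward bias
```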

📌
Statistical Significance ≠ Practical Significance

A large sample can produce p < .001 for a difference of d = 0.08 — statistically significant but trivially small. Always report Cohen's d alongside your p-value. Most journals and APA guidelines now require both.

Running the Paired t-test: SPSS, R & Python

SPSS — Step-by-Step

SPSS Tutorial — Paired-Samples T Test

How to run the paired t-test in IBM SPSS Statistics

Step 1 — Go to Analyze → Compare Means and Proportions → Paired-Samples T Test.

Step 2 — Move your "before" variable to Variable 1 and your "after" variable to Variable 2 in the Paired Variables box. Each row is one pair.

Step 3 — Click Options to set the confidence level (95% default) and handle missing values. Click OK.

Step 4 — In the output, read the Paired Samples Test table: Mean Difference, t-statistic, df, Sig. (2-tailed), and 95% CI of the difference.

Step 5 — For normality, run Analyze → Descriptive Statistics → Explore on your difference variable. Check the Shapiro-Wilk result — p > .05 confirms normality.

✓ SPSS generates three tables: Paired Samples Statistics, Paired Samples Correlations, and Paired Samples Test. Focus on the Paired Samples Test table for your inferential results.

R — Complete Code with Output

```r
# Paired samples t-test in R

# Create before and after vectors
before <- c(148, 152, 144, 160, 155, 138, 163, 147, 151, 158)
after  <- c(136, 140, 130, 150, 143, 126, 148, 138, 140, 145)

# Run the paired t-test
result <- t.test(before, after, paired = TRUE, alternative = "two.sided")
print(result)
# Output (abridged):
#   Paired t-test
#   t = 21.514, df = 9, p-value far below .001
#   95 percent confidence interval: 10.74  13.26
#   mean difference: 12

# Calculate Cohen's d manually
d_scores <- before - after
cohens_d <- mean(d_scores) / sd(d_scores)
cat("Cohen's d =", round(cohens_d, 3))
# Cohen's d = 6.803

# Check normality of differences
shapiro.test(d_scores)
# p > .05 supports the normality assumption
```

Python — scipy.stats

```python
import numpy as np
from scipy import stats

# Define paired data
before = np.array([148, 152, 144, 160, 155, 138, 163, 147, 151, 158])
after = np.array([136, 140, 130, 150, 143, 126, 148, 138, 140, 145])

# Paired t-test (ttest_rel = related samples)
t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
# t = 21.51; p is far below .001

# Degrees of freedom and effect size
n = len(before)
diff = before - after
cohens_d = diff.mean() / diff.std(ddof=1)
print(f"df = {n - 1}, Cohen's d = {cohens_d:.3f}")
# df = 9, Cohen's d = 6.803

# 95% confidence interval for the mean difference
se = stats.sem(diff)
ci = stats.t.interval(0.95, df=n - 1, loc=diff.mean(), scale=se)
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
# 95% CI: [10.74, 13.26]
```

How to Report Paired t-test Results in APA Format

APA 7 requires five pieces of information: means and standard deviations for both conditions, the t-statistic, the degrees of freedom in parentheses, the exact p-value, and Cohen's d. Here is a copy-paste template:

📝 APA 7 Report Template
A paired samples t-test revealed a statistically significant [increase/decrease] in [DV] from [Condition 1] (M = [X], SD = [X]) to [Condition 2] (M = [X], SD = [X]), t([df]) = [t-value], p = [p-value], d = [Cohen's d].
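Filling in the template can be automated. A sketch of a helper function (Python/scipy; the name apa_paired and the four-pair dataset are my own illustration, not a standard API, and the "statistically significant" wording is hard-coded for brevity):

```python
import numpy as np
from scipy import stats

def apa_paired(before, after, label="scores"):
    """Format paired t-test results as an APA 7 sentence (hypothetical helper)."""
    before, after = np.asarray(before, float), np.asarray(after, float)
    d = before - after
    n = len(d)
    t, p = stats.ttest_rel(before, after)
    cohens_d = d.mean() / d.std(ddof=1)
    direction = "decrease" if after.mean() < before.mean() else "increase"
    # APA style: exact p to 3 decimals, no leading zero, "p < .001" floor
    p_text = "p < .001" if p < 0.001 else f"p = {p:.3f}".replace("0.", ".")
    return (
        f"A paired samples t-test revealed a statistically significant {direction} "
        f"in {label} from pre-test (M = {before.mean():.2f}, SD = {before.std(ddof=1):.2f}) "
        f"to post-test (M = {after.mean():.2f}, SD = {after.std(ddof=1):.2f}), "
        f"t({n - 1}) = {t:.2f}, {p_text}, d = {cohens_d:.2f}."
    )

sentence = apa_paired([148, 152, 144, 160], [136, 140, 130, 150], "systolic BP")
print(sentence)
```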

Worked APA Write-Up — Psychology Example

Social Support Intervention Study

A paired samples t-test revealed a statistically significant increase in perceived social support from pre-program (M = 32.83, SD = 7.91) to post-program (M = 38.07, SD = 7.23), t(19) = −3.23, p = .004, d = 0.73. This represents a medium-to-large effect, indicating the 8-week social skills program produced a practically meaningful improvement beyond statistical significance alone.

APA Reporting Checklist

Include: ① M and SD for both conditions ② t-statistic ③ df in parentheses, e.g. t(19) ④ exact p-value (not just p < .05) ⑤ Cohen's d ⑥ 95% confidence interval of the mean difference (increasingly required by journals).

Non-Parametric Alternative: Wilcoxon Signed-Rank Test

When assumption 4 (normality of differences) fails — and your sample is too small for the Central Limit Theorem to rescue you — the Wilcoxon signed-rank test is the appropriate substitute. It ranks the absolute differences and tests whether positive and negative differences are balanced, without assuming any particular distribution shape.

Switch to the Wilcoxon test when: your difference scores are clearly non-normal on a Shapiro-Wilk test (p < .05) and n < 30; your data is ordinal rather than continuous; or extreme outliers cannot be removed for substantive reasons. In R: wilcox.test(before, after, paired = TRUE). In Python: scipy.stats.wilcoxon(before, after).

The Wilcoxon test is somewhat less powerful than the paired t-test when normality holds, so do not default to it as a precaution — verify the assumption first. For a full overview of when to choose non-parametric methods, see hypothesis testing fundamentals.
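That decision can be scripted: check normality first, then pick the test. A sketch (Python/scipy; the n < 30 threshold and the p < .05 gate follow the rules above, and the function name paired_test is my own):

```python
import numpy as np
from scipy import stats

def paired_test(before, after, alpha=0.05):
    """Run a paired t-test, falling back to Wilcoxon for small non-normal samples."""
    d = np.asarray(before) - np.asarray(after)
    n = len(d)
    _, p_norm = stats.shapiro(d)
    if n < 30 and p_norm < alpha:
        # Normality rejected and the CLT cannot rescue us: rank-based test
        stat, p = stats.wilcoxon(before, after)
        return "wilcoxon", stat, p
    stat, p = stats.ttest_rel(before, after)
    return "paired t", stat, p

before = [148, 152, 144, 160, 155, 138, 163, 147, 151, 158]
after = [136, 140, 130, 150, 143, 126, 148, 138, 140, 145]
name, stat, p = paired_test(before, after)
print(name, round(stat, 3), p)
```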

Free Paired t-test Calculator

Enter your paired data below — one pair per line, comma-separated (before, after). The calculator computes the t-statistic, p-value, degrees of freedom, Cohen's d, and 95% confidence interval.

🧮 Paired Samples t-test Calculator

Enter pairs as before, after — one pair per line. Example: 148, 136
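For readers curious what such a calculator computes under the hood, here is a sketch of the same logic (Python/scipy; the parsing format is the comma-separated pairs described above, and paired_calculator is my own illustrative name):

```python
import numpy as np
from scipy import stats

def paired_calculator(text):
    """Parse 'before, after' lines and return a paired t-test summary dict."""
    pairs = [line.split(",") for line in text.strip().splitlines()]
    before = np.array([float(b) for b, a in pairs])
    after = np.array([float(a) for b, a in pairs])
    d = before - after
    n = len(d)
    t, p = stats.ttest_rel(before, after)
    se = d.std(ddof=1) / np.sqrt(n)
    ci = stats.t.interval(0.95, df=n - 1, loc=d.mean(), scale=se)
    return {
        "t": round(t, 3),
        "p": p,
        "df": n - 1,
        "cohens_d": round(d.mean() / d.std(ddof=1), 3),
        "ci_95": (round(ci[0], 2), round(ci[1], 2)),
    }

data = """148, 136
152, 140
144, 130"""
summary = paired_calculator(data)
print(summary)
```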

Formula & Entity Glossary

Symbol / Entity | Name | Definition / Formula
t | t-statistic | t = x̄_d / (s_d / √n) — the test statistic compared to the t-distribution
x̄_d | Mean of differences | Average of all (x₁ᵢ − x₂ᵢ) difference scores across n pairs
s_d | SD of differences | Standard deviation of the n difference scores d_i
SE | Standard error | SE = s_d / √n — the precision of the mean difference estimate
df | Degrees of freedom | df = n − 1, where n is the number of pairs
H₀ | Null hypothesis | μ_d = 0 — the population mean difference equals zero
H₁ | Alternative hypothesis | μ_d ≠ 0 (two-tailed), μ_d < 0, or μ_d > 0 (one-tailed)
α | Significance level | Threshold for rejecting H₀, typically 0.05
d | Cohen's d | d = x̄_d / s_d. Benchmarks: 0.2 small, 0.5 medium, 0.8 large (Cohen, 1988)
CI | Confidence interval | x̄_d ± t*(α/2, n−1) × SE — range likely to contain the true μ_d
CLT | Central Limit Theorem | For n ≥ 30, the sampling distribution of x̄_d is approximately normal regardless of the underlying difference distribution
Shapiro-Wilk | Normality test | Tests H₀: difference scores are normally distributed. p > .05 supports the normality assumption.

Frequently Asked Questions

Q: What is a paired samples t-test?
A paired samples t-test is a parametric statistical test that compares the means of two related measurements from the same subjects or matched pairs. It tests whether the mean difference between those paired observations is statistically significantly different from zero. Other names for the same test include the dependent t-test, matched pairs t-test, and paired-difference t-test.

Q: What is the paired t-test formula?
The formula is t = x̄_d / (s_d / √n), where x̄_d is the mean of the difference scores (before minus after for each pair), s_d is the standard deviation of those differences, and n is the number of pairs. The denominator is the standard error of the mean difference. Degrees of freedom = n − 1.

Q: What are the four assumptions of the paired t-test?
The four assumptions are: (1) The dependent variable must be continuous — measured on an interval or ratio scale. (2) The pairs must be randomly and independently sampled from the population. (3) There should be no significant outliers in the difference scores. (4) The differences should be approximately normally distributed — check with the Shapiro-Wilk test or a Q-Q plot, though this assumption relaxes for n ≥ 30 due to the Central Limit Theorem.

Q: When should I use a paired t-test instead of an independent t-test?
Use the paired t-test whenever the same subjects appear in both conditions — before-and-after designs, crossover trials, or matched-pair experiments. Use the independent samples t-test when the two groups consist of completely different, unrelated individuals. The paired test is more powerful because it removes between-subject variability from the error term.

Q: How do I calculate degrees of freedom for a paired t-test?
Degrees of freedom = n − 1, where n is the number of pairs — not the total number of observations. If 20 participants each contribute one before-measurement and one after-measurement, then n = 20 and df = 19, not 39. Using the wrong n is the most common error students make on this test.

Q: How do I interpret the p-value?
If p < your chosen α (typically 0.05), reject the null hypothesis — the mean difference is statistically significant. If p ≥ 0.05, you fail to reject H₀. Always report Cohen's d alongside the p-value; p-values alone say nothing about the magnitude of the difference.

Q: What is the non-parametric alternative to the paired t-test?
The Wilcoxon signed-rank test is the non-parametric alternative. Use it when the difference scores are clearly non-normal (Shapiro-Wilk p < .05) and n < 30, when the data is ordinal, or when extreme outliers cannot be removed. In R: wilcox.test(before, after, paired = TRUE). In Python: scipy.stats.wilcoxon(before, after).

Q: How do I report paired t-test results in APA format?
Report: means and SDs for both conditions, t-statistic, df in parentheses, exact p-value, and Cohen's d. Example: "A paired samples t-test revealed a statistically significant increase in scores from pre-test (M = 32.83, SD = 7.91) to post-test (M = 38.07, SD = 7.23), t(19) = −3.23, p = .004, d = 0.73."

Q: What effect size should I report?
Cohen's d is the standard effect size: d = x̄_d / s_d. Benchmarks from Cohen (1988): d = 0.2 is small, d = 0.5 is medium, d = 0.8 is large. For small samples (n < 50), use Hedges' g, which corrects for small-sample upward bias. SPSS 27+ outputs both automatically.

Q: Can I run a paired t-test on a small sample?
Yes, but you must verify the normality assumption (assumption 4) more carefully. Run a Shapiro-Wilk test on your difference scores — if p > .05, proceed with the paired t-test. If p < .05 and your differences are clearly non-normal, switch to the Wilcoxon signed-rank test. For n ≥ 30, the Central Limit Theorem makes this less of a concern.
📊

One-Sample t-test

Compare a sample mean to a known population value. The paired t-test is mathematically equivalent to a one-sample t-test on the difference scores.

⚖️

Independent Samples t-test

Compare means from two unrelated groups. Choose this when subjects differ between conditions.

🔬

Hypothesis Testing Guide

The framework behind all t-tests — null hypotheses, p-values, Type I/II errors, and power.

📈

Normal Distribution

The theoretical basis for the t-test's normality assumption and the shape of the t-distribution.

🎲

Confidence Intervals

The 95% CI around the mean difference is a key component of complete paired t-test reporting.

📉

Sampling Distributions

Why the Central Limit Theorem makes the paired t-test robust to non-normality at n ≥ 30.

External references: Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum. | Lakens, D. (2013). Calculating and reporting effect sizes. Frontiers in Psychology | scipy.stats.ttest_rel documentation | R t.test() documentation