Hypothesis Testing · T-Tests · Statistical Assumptions · 22 min read · June 9, 2026
BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

Equal vs Unequal Variance: Complete Guide

Before running a two-sample t-test, you face one decision that most students skip: do the two groups have the same spread? That single question — equal or unequal variance — determines which test formula is valid and how far to trust your p-value. Get it wrong and your conclusions can mislead.

This guide builds the concept from the ground up. It covers what equal and unequal variance mean, why they matter for inference, how to check them with Levene's and Bartlett's tests, and when to choose Welch's t-test over the pooled version — with full worked examples and an interactive decision tool.

What You'll Learn
  • The exact definitions of homoscedasticity and heteroscedasticity
  • Why variance assumptions change p-values, confidence intervals, and error rates
  • Step-by-step: visual inspection, Levene's test, and Bartlett's test
  • Pooled t-test vs Welch's t-test: formulas, differences, when each applies
  • Fully worked numerical examples for both equal and unequal variance cases
  • A/B testing and real-world applications across medicine, psychology, and data science
  • Interactive variance decision tool and entity glossary table

Equal vs Unequal Variance (Overview)

Core Distinction — Equal vs Unequal Variance
Equal variance (homoscedasticity) means two groups share the same population spread: σ₁² = σ₂². Unequal variance (heteroscedasticity) means their spreads differ: σ₁² ≠ σ₂². The choice between pooled t-test and Welch's t-test depends entirely on which condition holds in your data.
H₀ (equal variance): σ₁² = σ₂²  |  Hₐ (unequal): σ₁² ≠ σ₂²

Every two-sample hypothesis test rests on assumptions about the populations being compared. The variance assumption is the one that trips up practitioners most often, because it is invisible until you check for it — and the cost of ignoring it can be a badly miscalibrated p-value.

The concept maps onto a simple intuition: if one group's measurements are tightly clustered and another group's are wildly spread out, pooling those two spreads into a single estimate distorts the test. Welch's correction keeps them separate. This guide shows you exactly when and how to apply each approach.

⚡ Quick Reference — Equal vs Unequal Variance Key Facts
  • Equal variance: Both groups have the same σ² — use the pooled two-sample t-test
  • Unequal variance: Groups have different σ² — use Welch's t-test
  • Default recommendation: When in doubt, always use Welch's t-test (it is robust to both scenarios)
  • Testing for equality: Levene's test (robust default) or Bartlett's test (for normal data)
  • Where it matters: Two-sample t-tests, ANOVA, regression residuals, A/B testing
  • Key terms: Homoscedasticity = equal, Heteroscedasticity = unequal
  • Equal Variance Condition: σ₁² = σ₂²
  • Unequal Variance Condition: σ₁² ≠ σ₂²
  • Levene's Test Threshold: 0.05
  • Default Safe Choice: Welch

What Is Equal Variance? (Homoscedasticity)

Equal variance — formally called homoscedasticity — describes the condition where every group in your analysis has the same population variance. In a two-sample setting, this means the scatter around each group's mean is statistically indistinguishable: Group A and Group B both exhibit the same degree of spread.

Equal Variance Condition (Homoscedasticity)
σ₁² = σ₂²
σ₁² = population variance of Group 1
σ₂² = population variance of Group 2
= means the two spreads are equal

When this condition holds, you gain something valuable: both samples can be used together to estimate a single, shared population variance. The statistical name for that combined estimate is the pooled variance. Because the pooled estimate draws on all observations from both groups, it is more precise (lower standard error) than estimating each group's variance separately — provided the equality assumption is correct.

The Pooled Variance Formula

The pooled sample variance combines both groups while weighting each by its degrees of freedom:

Pooled Variance (Equal Variance Assumption)
s²_p = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)
s₁², s₂² = sample variances for each group
n₁, n₂ = sample sizes
n₁ + n₂ − 2 = total degrees of freedom

The pooled variance feeds directly into the standard error for the pooled t-test. Because the estimate uses all n₁ + n₂ − 2 degrees of freedom, it is more stable than two separate estimates — but only when σ₁² = σ₂² is a reasonable assumption. If the groups actually differ in variance and you pool anyway, your standard error is biased and your t-statistic misfires.
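As a minimal sketch of the formula above, the weighting by degrees of freedom can be written in a few lines of Python. The function name `pooled_variance` and the input values are illustrative assumptions, not part of any library.

```python
def pooled_variance(s1, n1, s2, n2):
    """Pooled sample variance from two sample standard deviations and sizes."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Weight each sample variance by its degrees of freedom (n - 1).
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


# Hypothetical inputs: groups of 10 and 30 with SDs of 2.0 and 4.0.
# The larger group carries more weight in the pooled value.
sp2 = pooled_variance(2.0, 10, 4.0, 30)
print(round(sp2, 3))
```

When the two groups are the same size, the result reduces to the simple average of the two sample variances.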

When Does Equal Variance Actually Hold?

Equal variance is most defensible in controlled experimental settings. Randomized controlled trials often produce groups with similar spreads, because randomisation distributes sources of variability evenly. Tightly designed laboratory experiments, quality control settings where a single process generates both groups, and educational studies comparing students from the same curriculum are classic examples where homoscedasticity is plausible.

Remember: Equal Variance Is an Assumption, Not a Given

The pooled t-test does not verify that variances are equal — it assumes they are. You must check this assumption with a formal test (Levene's or Bartlett's) before using the pooled method. Skipping the check is one of the most common errors in applied statistics.

✓ Equal Variance (Homoscedasticity)
Both groups have the same width (spread). Safe to pool variances.
✗ Unequal Variance (Heteroscedasticity)
One group is tight, the other is wide. Pooling variances would distort the test.

What Is Unequal Variance? (Heteroscedasticity)

Unequal variance — formally called heteroscedasticity — occurs when the population variances of two groups differ. One group's data points cluster tightly around their mean; the other group's scatter is wider. These two spreads cannot be meaningfully averaged into a single pooled estimate without biasing the result.

Heteroscedasticity is not unusual. It is the default state in many real-world datasets. Groups formed by different demographic segments, patient populations receiving different treatments, or conversion rate data in A/B tests frequently exhibit unequal spreads. Assuming equality where none exists systematically inflates or deflates your t-statistic.

Why Heteroscedasticity Causes Problems

When you pool two unequal variances, the result is neither σ₁² nor σ₂² — it is a weighted average that misrepresents both. The standard error used in the t-test formula is then wrong, which cascades into a biased t-statistic and a p-value that does not accurately reflect the probability of observing your data under the null hypothesis.

The Cost of Ignoring Unequal Variance

Simulations show that when the variance ratio between groups is 4:1 or greater, sample sizes differ, and the smaller sample has the larger variance, the true Type I error rate with a pooled t-test can reach 0.10–0.20, even when the nominal α is 0.05. That means two to four times as many false positives as you intended.
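A simulation of this kind can be sketched with NumPy and SciPy. The setup below is an illustrative assumption (normal data, a group of 10 with four times the variance of a group of 40, and 20,000 replications), not the study design behind the figures quoted above. Both groups share the same true mean, so every rejection counts as a false positive.

```python
import numpy as np
from scipy import stats


def type1_rates(n1, sd1, n2, sd2, reps=20000, alpha=0.05, seed=0):
    """Estimate false-positive rates of pooled and Welch t-tests when H0 is true."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated experiment; both groups have true mean 0.
    a = rng.normal(0.0, sd1, size=(reps, n1))
    b = rng.normal(0.0, sd2, size=(reps, n2))
    p_pooled = stats.ttest_ind(a, b, axis=1, equal_var=True).pvalue
    p_welch = stats.ttest_ind(a, b, axis=1, equal_var=False).pvalue
    return float(np.mean(p_pooled < alpha)), float(np.mean(p_welch < alpha))


# Small group (n=10) has SD 2.0; large group (n=40) has SD 1.0.
pooled_rate, welch_rate = type1_rates(n1=10, sd1=2.0, n2=40, sd2=1.0)
print(f"pooled: {pooled_rate:.3f}  Welch: {welch_rate:.3f}")
```

Swapping the standard deviations, so that the larger group is the more variable one, makes the pooled test conservative instead of liberal.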

Heteroscedasticity in Real-World Data

Several mechanisms produce unequal variances in practice. When one group has a wider range of true scores — for example, a control group that includes both high- and low-responders while a treatment group converges on a consistent response — the variances will differ. Income data, where one group includes extreme outliers and another does not, is a textbook case. So is comparing standardised test scores across schools with very different resources: the variance within high-resource schools may be tight while the variance within lower-resource schools is wide.

Real-World Example

Medical Trials: Treatment vs Control Variance

In a drug trial, the control group's blood pressure measurements may span a wide range (high between-patient variability) while the treatment group converges on a narrower band as the drug brings outliers toward the target range. The two variances differ structurally, not randomly. Running a pooled t-test here can give an artificially narrow standard error, particularly if the high-variance control arm is also the smaller one, inflating the t-statistic and leading to overconfident conclusions about the drug's effect.

Why Variance Assumptions Matter in Hypothesis Testing

The variance assumption is not a technicality — it is embedded in the denominator of the t-test formula. The standard error of the difference between means is what the test uses to decide whether an observed difference is large relative to sampling variability. Every term in that standard error depends on the variance structure of your data.

Impact on p-Values

A t-statistic equals the observed difference divided by the standard error. If the standard error is too small (from incorrect pooling), the t-statistic inflates and the p-value shrinks below where it should be. The data appear to support rejection of the null hypothesis with more confidence than is warranted. This is false precision — a direct consequence of a violated assumption.

Impact on Confidence Intervals

The same standard error that distorts the p-value also distorts the confidence interval. A confidence interval built on an underestimated standard error is too narrow: it excludes the true parameter value more often than the stated confidence level implies. If you report a 95% CI that is actually 90%, your inferential claims are systematically overstated.

Impact on Type I and Type II Errors

Type I error (false positive) rates rise when equal variance is assumed but not present, especially when the group with larger variance also has the smaller sample size. Type II error (false negative) is the mirror problem: if the larger sample belongs to the group with greater spread, the pooled standard error overshoots and the test loses statistical power, missing real effects.

Equal Variance — Correct Use

Pooled T-Test

SE = s_p × √(1/n₁ + 1/n₂)

Uses combined variance estimate. More efficient when σ₁² = σ₂². All n₁ + n₂ − 2 degrees of freedom available.

Unequal Variance — Required

Welch's T-Test

SE = √(s₁²/n₁ + s₂²/n₂)

Keeps variances separate. Adjusts degrees of freedom downward via Welch–Satterthwaite equation. More conservative, controls α accurately.

When Uncertain

Default to Welch

Always safe

If you are unsure whether variances are equal, Welch's t-test is the conservative and widely recommended default. Loss of power is minimal when variances happen to be equal.

Equal vs Unequal Variance: Key Differences

Feature | Equal Variance (Homoscedastic) | Unequal Variance (Heteroscedastic)
Formal Condition | σ₁² = σ₂² | σ₁² ≠ σ₂²
Group Spread | Same variability across groups | Different variability across groups
Variance Estimate | Pooled (s²_p), shared across groups | Separate for each group
Recommended Test | Pooled two-sample t-test | Welch's t-test
Degrees of Freedom | n₁ + n₂ − 2 (maximum) | Welch–Satterthwaite approximation (reduced)
Statistical Power | Higher when assumption holds | Welch loses little power under equality
Real-World Frequency | Common in controlled experiments | Common in observational/business data
Effect on α if Misused | Pooled test fine | Pooled test inflates α; use Welch
ANOVA Assumption | Required for standard one-way ANOVA | Welch's ANOVA or transformation required
Regression Term | Homoscedastic residuals | Heteroscedastic residuals; use robust SE

The Safe Rule of Thumb

Much modern methodological guidance, Ruxton (2006) among others, recommends using Welch's t-test as the default for two-sample comparisons. The cost of using Welch when variances are equal is small (slightly wider confidence intervals); the cost of using the pooled test when variances are unequal can be a badly distorted p-value. Default to Welch.

How to Check for Equal vs Unequal Variance (Step-by-Step)

Assessing variance equality involves three layers: informal visual inspection, a formal statistical test, and a decision rule. Work through them in order before choosing your t-test variant.

Step 1 — Visual Inspection

Before running any formal test, plot your data. Side-by-side boxplots are the fastest diagnostic: if the boxes and whiskers are roughly the same length for both groups, equal variance is plausible. If one box is noticeably taller or the whiskers extend much further, heteroscedasticity is likely. Histograms and dotplots reinforce this picture by showing the raw distribution of each group's values.

Visual inspection is not sufficient on its own — it will miss subtle differences and can be misleading with small samples — but it sets expectations before you interpret formal test results.
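Plots remain the primary tool at this step, but the quantities a boxplot makes visible can also be printed as numbers. The sketch below uses only the standard library; the helper name `spread_summary` and both data sets are hypothetical.

```python
import statistics


def spread_summary(values):
    """Return the sample SD and interquartile range of one group."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {"sd": statistics.stdev(values), "iqr": q3 - q1}


# Hypothetical measurements for two groups.
group_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
group_b = [10.5, 14.2, 9.8, 13.9, 12.0, 15.1, 8.7, 11.6]

for name, data in (("A", group_a), ("B", group_b)):
    s = spread_summary(data)
    print(f"group {name}: sd={s['sd']:.2f}, iqr={s['iqr']:.2f}")
```

A box that is several times taller in one group shows up here as an IQR several times larger, which is a prompt to run a formal test rather than a conclusion in itself.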

Step 2 — Run Levene's Test

Levene's test is the standard default for testing variance equality. It works by converting each observation to its distance from the group mean (using absolute deviations), then testing whether those distances differ across groups with a standard F-test. Because it uses absolute values rather than squared values, it is robust to non-normal distributions.

Levene's Test Statistic
W = [(N − k) / (k − 1)] × [Σᵢ nᵢ(Z̄ᵢ − Z̄)²] / [Σᵢ Σⱼ (Zᵢⱼ − Z̄ᵢ)²]
Zᵢⱼ = |Yᵢⱼ − Ȳᵢ| (absolute deviation from group mean)
Z̄ᵢ = mean of the Zᵢⱼ in group i; Z̄ = mean of all Zᵢⱼ
N = total observations; k = number of groups
W ~ F(k−1, N−k) under H₀

Interpret Levene's test output with the standard threshold: if the p-value exceeds 0.05, you do not reject the assumption of equal variance and the pooled t-test is appropriate. If the p-value falls below 0.05, variances differ significantly and you should use Welch's t-test.
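To make the statistic concrete, here is a minimal standard-library sketch that follows the W formula above term by term, using deviations from each group mean. The function name `levene_w` and the toy data are assumptions for illustration. Library routines may centre on the median instead, in which case their statistic will differ.

```python
def levene_w(*groups):
    """Levene's W using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z_ij = |Y_ij - mean_i| for every observation.
    z = []
    for g in groups:
        m = sum(g) / len(g)
        z.append([abs(y - m) for y in g])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the Z values.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Toy data: the second group is three times as spread out as the first.
w = levene_w([1, 2, 3, 4, 5], [0, 3, 6, 9, 12])
print(round(w, 4))
```

The p-value is the upper-tail area of F(k − 1, N − k) at W, which can be obtained from any F-distribution routine.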

Step 3 — Consider Bartlett's Test (for Normal Data)

Bartlett's test is more powerful than Levene's when data are truly normally distributed, but it is sensitive to departures from normality. A single non-normal group can cause Bartlett's to flag inequality even when variances are actually equal. Use Bartlett's only when you have strong evidence of normality in both groups; otherwise default to Levene's.

Step 4 — Apply the Decision Rule

Decision Rule Summary

Choosing the Right Test

Levene's p > 0.05: Do not reject equal variance → use pooled t-test.
Levene's p < 0.05: Reject equal variance → use Welch's t-test.
Unsure or no test run: Default to Welch's t-test — it is safe in both scenarios.
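The rule above is simple enough to encode directly. This is a hypothetical helper, not a library function. Treating a p-value exactly equal to α as a rejection is an added assumption, since the rule above leaves that case open.

```python
def choose_t_test(levene_p=None, alpha=0.05):
    """Map a Levene's test p-value to a t-test variant."""
    if levene_p is None:
        # No variance test was run: fall back to the safe default.
        return "welch"
    if not 0.0 <= levene_p <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    # Failing to reject equal variance permits, but does not require, pooling.
    return "pooled" if levene_p > alpha else "welch"


print(choose_t_test(0.81))   # Levene's p well above alpha
print(choose_t_test(0.003))  # Levene's p below alpha
print(choose_t_test())       # no test run
```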

Welch's T-Test vs Pooled T-Test (Deep Comparison)

Both tests compare the means of two independent groups, but they handle the variance structure differently. Understanding the mechanism of each helps you interpret software output and defend your choice to reviewers.

The Pooled T-Test Formula

Pooled Two-Sample T-Test (Equal Variance)
t = (x̄₁ − x̄₂) / [s_p × √(1/n₁ + 1/n₂)]
s_p = pooled standard deviation
df = n₁ + n₂ − 2

The pooled standard deviation s_p is the square root of the pooled variance. Because the test uses all n₁ + n₂ − 2 degrees of freedom, its reference t-distribution has lighter tails and smaller critical values, so the test is more powerful when the equal variance assumption holds.

Welch's T-Test Formula

Welch's T-Test (Unequal Variance)
t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)
s₁², s₂² = separate sample variances
df = Welch–Satterthwaite approximation

Welch's t-statistic has the same numerator but a different denominator. The standard error is computed from each group's variance independently, and the degrees of freedom are approximated by the Welch–Satterthwaite equation, which never produces a value larger than the pooled df (and never smaller than the smaller of n₁ − 1 and n₂ − 1). This reduction in df widens the t-distribution, making the critical value more conservative and correctly accounting for additional uncertainty from estimating two separate variances.

Welch–Satterthwaite Degrees of Freedom Approximation
df ≈ (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁−1) + (s₂²/n₂)²/(n₂−1)]
df is always ≤ n₁ + n₂ − 2
More unequal variances → smaller df → more conservative test

The practical difference between the two methods is usually small when sample sizes are large and variances are only moderately unequal. It becomes consequential when sample sizes differ and variance ratios exceed 4:1 — exactly the scenario where choosing incorrectly matters most.
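The Welch standard error and the Welch–Satterthwaite df can be computed from summary statistics alone. The sketch below is a direct transcription of the two formulas above; the function name `welch_se_df` and the inputs are illustrative assumptions.

```python
import math


def welch_se_df(s1, n1, s2, n2):
    """Welch standard error and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # each group's variance of the mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df


# Equal SDs and equal sizes: df reaches its maximum, n1 + n2 - 2.
se_eq, df_eq = welch_se_df(5.0, 12, 5.0, 12)
# Very unequal SDs: df falls towards the smaller group's n - 1.
se_uneq, df_uneq = welch_se_df(1.0, 12, 10.0, 12)
print(round(df_eq, 2), round(df_uneq, 2))
```

Software usually keeps the fractional df when computing the p-value; rounding down is a convenience for reading printed t-tables.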

Worked Examples

Example 1 — Equal Variance: Pooled T-Test

Worked Example — Equal Variance

Comparing Exam Scores: Two Teaching Methods

A researcher compares exam scores for two groups of students. Group A (traditional lecture): n₁ = 12, x̄₁ = 74, s₁ = 8.2. Group B (flipped classroom): n₂ = 12, x̄₂ = 79, s₂ = 7.9. Levene's test returns p = 0.81. Test whether the means differ at α = 0.05.

1. Check variance assumption: Levene's p = 0.81 > 0.05 → do not reject equal variance. Proceed with pooled t-test.

2. Calculate pooled variance: s²_p = [(12−1)(8.2²) + (12−1)(7.9²)] / (12+12−2) = [(11)(67.24) + (11)(62.41)] / 22 = [739.64 + 686.51] / 22 = 1426.15 / 22 = 64.82

3. Calculate pooled standard deviation and standard error: s_p = √64.82 = 8.05 | SE = 8.05 × √(1/12 + 1/12) = 8.05 × √(0.1667) = 8.05 × 0.4082 = 3.29

4. Compute t-statistic: t = (74 − 79) / 3.29 = −5 / 3.29 = −1.52. Degrees of freedom: df = 12 + 12 − 2 = 22.

5. Find critical value and decide: Critical t (two-tailed, df = 22, α = 0.05) ≈ ±2.074. |t| = 1.52 < 2.074. Also p ≈ 0.143 > 0.05.

✓ Conclusion: Fail to reject H₀. There is not enough evidence at α = 0.05 to conclude the two teaching methods differ in mean exam score. The 5-point observed difference is within the range expected from sampling variability alone.
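The hand calculation can be cross-checked in software. A minimal sketch, assuming SciPy is available: `scipy.stats.ttest_ind_from_stats` takes the summary statistics directly, and `equal_var=True` selects the pooled test.

```python
from scipy import stats

# Summary statistics from the example above.
res = stats.ttest_ind_from_stats(
    mean1=74, std1=8.2, nobs1=12,
    mean2=79, std2=7.9, nobs2=12,
    equal_var=True,  # pooled two-sample t-test
)
# Two-tailed critical value at alpha = 0.05 with df = n1 + n2 - 2.
t_crit = stats.t.ppf(1 - 0.05 / 2, df=12 + 12 - 2)
print(f"t = {res.statistic:.3f}, p = {res.pvalue:.3f}, critical t = {t_crit:.3f}")
```

Small differences in the last decimal place relative to the hand calculation come from rounding intermediate values.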

Example 2 — Unequal Variance: Welch's T-Test

Worked Example — Unequal Variance

Comparing Response Times: Two App Versions

A product team compares user response times (seconds) for App v1 and App v2. App v1: n₁ = 20, x̄₁ = 3.8 s, s₁ = 0.6. App v2: n₂ = 15, x̄₂ = 3.1 s, s₂ = 1.9. Levene's test returns p = 0.003. Test at α = 0.05.

1. Check variance assumption: Levene's p = 0.003 < 0.05 → reject equal variance. Use Welch's t-test. Note: s₂/s₁ ≈ 3.2, a large ratio consistent with the test result.

2. Compute Welch standard error: SE = √(s₁²/n₁ + s₂²/n₂) = √(0.36/20 + 3.61/15) = √(0.018 + 0.2407) = √0.2587 = 0.5086

3. Compute t-statistic: t = (3.8 − 3.1) / 0.5086 = 0.7 / 0.5086 = 1.376

4. Welch–Satterthwaite df: df ≈ (0.018 + 0.2407)² / [(0.018²/19) + (0.2407²/14)] ≈ 0.06691 / [0.0000171 + 0.004137] ≈ 0.06691 / 0.004154 ≈ 16.1 → round down to 16.

5. Decision: Critical t (two-tailed, df = 16, α = 0.05) ≈ ±2.120. |t| = 1.376 < 2.120. p ≈ 0.187 > 0.05.

✓ Conclusion: Fail to reject H₀ using Welch's test. Despite an apparent 0.7-second difference, the unequal variance (and resulting adjusted df = 16 rather than 33) means there is insufficient evidence to conclude the app versions differ significantly in response time at α = 0.05.

Note: If we had incorrectly used the pooled t-test here, the pooled standard error would be about 0.450 rather than 0.509, giving t ≈ 1.55 on df = 33, with a critical value of ≈ 2.035. The pooled result appears closer to significance than Welch's t = 1.376 on df ≈ 16, illustrating how ignoring unequal variance can distort your conclusion when the higher-variance group is also the smaller one.
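Both versions of the test can be run side by side from the summary statistics of this example. A minimal sketch, assuming SciPy is available; the only switch between the two calls is `equal_var`.

```python
from scipy import stats

v1 = dict(mean1=3.8, std1=0.6, nobs1=20)  # App v1
v2 = dict(mean2=3.1, std2=1.9, nobs2=15)  # App v2

welch = stats.ttest_ind_from_stats(**v1, **v2, equal_var=False)
pooled = stats.ttest_ind_from_stats(**v1, **v2, equal_var=True)

print(f"Welch : t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
print(f"Pooled: t = {pooled.statistic:.3f}, p = {pooled.pvalue:.3f}")
```

SciPy uses the fractional Welch–Satterthwaite df for the p-value, so its result can differ slightly from a table lookup at df = 16.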

Example 3 — A/B Testing Scenario

Case Study

Conversion Rate Variance in E-Commerce A/B Testing

An e-commerce team runs an A/B test on two checkout page designs. Control (n=500): mean conversion 4.2%, s = 0.8%. Variant (n=500): mean conversion 5.1%, s = 2.3%.

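The large difference in standard deviations (2.3% vs 0.8%) flags potential heteroscedasticity. Levene's test confirms it: p = 0.001. Welch's t-test is the appropriate choice. Because the two groups are the same size, the pooled and Welch standard errors are identical here, so the practical difference is limited to the degrees of freedom (about 618 for Welch versus 998 pooled). With unequal group sizes, the pooled test could misstate the standard error and the p-value in either direction.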

This situation is common in conversion data: the control group's behavior is stable and predictable, while a new variant creates two subpopulations — users who strongly respond and users who ignore the change. The result is higher variance in the variant, not lower performance. Welch's test correctly accounts for this structure.
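A quick numerical check shows how the two standard errors and degrees of freedom compare for the summary statistics in this case study. The sketch uses only the standard library and applies the pooled and Welch formulas given earlier; variable names are illustrative.

```python
import math

n = 500                          # per-group sample size from the case study
s_control, s_variant = 0.8, 2.3  # standard deviations, in percentage points

# Welch standard error: each variance divided by its own n.
se_welch = math.sqrt(s_control**2 / n + s_variant**2 / n)

# Pooled standard error: one shared variance estimate.
sp2 = ((n - 1) * s_control**2 + (n - 1) * s_variant**2) / (2 * n - 2)
se_pooled = math.sqrt(sp2 * (1 / n + 1 / n))

# Welch-Satterthwaite df versus pooled df.
v1, v2 = s_control**2 / n, s_variant**2 / n
df_welch = (v1 + v2) ** 2 / (v1**2 / (n - 1) + v2**2 / (n - 1))
df_pooled = 2 * n - 2

print(f"SE: Welch = {se_welch:.4f}, pooled = {se_pooled:.4f}")
print(f"df: Welch = {df_welch:.0f}, pooled = {df_pooled}")
```

Changing `n` to different values for the two groups is a simple way to see how imbalance drives the two standard errors apart.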

Testing Methods: Levene's Test and Bartlett's Test

Two formal tests dominate variance equality assessment. Each has a different sensitivity profile, and knowing when to prefer one over the other prevents you from getting misleading results.

Levene's Test

Levene's test converts raw observations to their absolute deviations from the group mean, then applies a standard one-way ANOVA to those deviations. Because it works with absolute rather than squared deviations, Levene's is far less sensitive to non-normality than Bartlett's or the F-test, and non-normal data are the common case in practice. It is widely available in statistical packages: SPSS reports Levene's in the Independent Samples T-Test output; in R, var.test runs the variance-ratio F-test, while leveneTest from the car package gives Levene's directly.

Levene's Test in R and Python

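R: car::leveneTest(outcome ~ group, data=df). Python (SciPy): scipy.stats.levene(group1, group2). Both return a test statistic and p-value. Note that both functions centre on the group median by default (the Brown–Forsythe variant); pass center=mean in R or center='mean' in SciPy for the mean-based version described above. If p < 0.05, switch to Welch's t-test.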

Bartlett's Test

Bartlett's test uses a chi-squared statistic based on the likelihood ratio of variances. It is more powerful than Levene's when data are genuinely normal, extracting more information from the distribution shape. However, that sensitivity to shape cuts both ways: even mild skewness or a few outliers can cause Bartlett's to reject equal variance when the variances are in fact equal. Reserve Bartlett's for situations where you have strong independent evidence of normality — for example, after a Q-Q plot confirms the data follow a normal distribution closely.

F-Test of Variance Ratio

The simplest variance test computes F = s₁²/s₂² and compares it to the F-distribution with (n₁ − 1, n₂ − 1) degrees of freedom. A ratio far from 1 (high or low) flags inequality. Like Bartlett's, the F-test is sensitive to non-normality. It is most useful as a quick sanity check or for understanding the magnitude of any variance ratio, not as a standalone decision tool.

Test | Statistic | Sensitivity to Non-Normality | Best Used When
Levene's Test | F (on absolute deviations) | Low (robust default) | Most practical situations; default choice
Bartlett's Test | χ² approximation | High (sensitive to outliers) | Confirmed normal distributions only
F-Test (ratio) | F = s₁²/s₂² | High (assumes normality) | Quick check; two-group normal data
Fligner–Killeen | χ² on ranks | Very low (non-parametric) | Heavily skewed or ordinal data
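The four tests in the table can be run on the same data for comparison. The sketch below assumes NumPy and SciPy and uses simulated, hypothetical samples with standard deviations of 1 and 3. SciPy has no dedicated variance-ratio F-test function, so that p-value is built from the F distribution directly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated samples: same mean, SDs of 1 and 3.
g1 = rng.normal(50, 1.0, size=40)
g2 = rng.normal(50, 3.0, size=40)

lev = stats.levene(g1, g2, center="mean")   # mean-centred Levene
bf = stats.levene(g1, g2, center="median")  # Brown-Forsythe variant (SciPy default)
bart = stats.bartlett(g1, g2)
flig = stats.fligner(g1, g2)

# Two-sided F-test of the variance ratio.
f_ratio = np.var(g1, ddof=1) / np.var(g2, ddof=1)
dfn, dfd = len(g1) - 1, len(g2) - 1
p_f = 2 * min(stats.f.cdf(f_ratio, dfn, dfd), stats.f.sf(f_ratio, dfn, dfd))

results = [("Levene (mean)", lev.pvalue), ("Levene (median)", bf.pvalue),
           ("Bartlett", bart.pvalue), ("Fligner-Killeen", flig.pvalue),
           ("F-ratio", p_f)]
for name, p in results:
    print(f"{name:16s} p = {p:.4g}")
```

On clean normal data like this the tests tend to agree; they diverge mainly on skewed or heavy-tailed samples, which is where the choice of test matters.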

Interactive Variance Decision Tool

Variance Assumption Decision Tool

Enter your sample statistics to get an instant recommendation.
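A recommendation of this kind can also be sketched offline. The helper below is hypothetical and is not the page's tool. With only summary statistics available, Levene's test cannot be run, so the sketch flags a variance ratio above 4 as a rule of thumb and always reports Welch's test as the safe default. It assumes SciPy is available.

```python
from scipy import stats


def recommend(mean1, sd1, n1, mean2, sd2, n2, ratio_flag=4.0):
    """Suggest a t-test variant from summary statistics and run Welch's test."""
    if min(sd1, sd2) <= 0:
        raise ValueError("standard deviations must be positive")
    # Larger variance over smaller variance, so the ratio is always >= 1.
    ratio = max(sd1, sd2) ** 2 / min(sd1, sd2) ** 2
    welch = stats.ttest_ind_from_stats(mean1, sd1, n1, mean2, sd2, n2,
                                       equal_var=False)
    flagged = ratio > ratio_flag
    return {
        "variance_ratio": ratio,
        "flag_unequal": flagged,
        "advice": "use Welch" if flagged else "Welch is safe; pooled is also defensible",
        "t": float(welch.statistic),
        "p": float(welch.pvalue),
    }


# Inputs taken from the response-time example (App v1 vs App v2).
out = recommend(3.8, 0.6, 20, 3.1, 1.9, 15)
print(out)
```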

Variance Assumptions Beyond the T-Test

Equal variance is not just a t-test concern. It appears as a core assumption in several widely used methods across statistics, and the consequences of violating it vary by context.

One-Way ANOVA

Standard ANOVA assumes that all k groups have the same population variance. When this assumption is violated, the F-statistic is biased. The solution depends on the severity: mild heteroscedasticity may be tolerable with balanced designs, but large imbalances in both sample sizes and variances require Welch's one-way ANOVA (oneway.test in R with var.equal = FALSE) or a variance-stabilising transformation such as a log or square-root transform.
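For readers working outside R, Welch's one-way ANOVA can be computed from group means, standard deviations, and sizes. The sketch below is a hand-written transcription of Welch's F statistic with precision weights nᵢ/sᵢ², not a library routine; the function name and the three-group inputs are hypothetical. With two groups it reduces to the square of Welch's t, with the Welch–Satterthwaite df as the denominator df.

```python
def welch_anova(means, sds, ns):
    """Welch's one-way ANOVA: F statistic, numerator df, denominator df."""
    k = len(means)
    w = [n / s**2 for n, s in zip(ns, sds)]  # precision weights n_i / s_i^2
    w_sum = sum(w)
    grand = sum(wi * m for wi, m in zip(w, means)) / w_sum  # weighted grand mean
    num = sum(wi * (m - grand) ** 2 for wi, m in zip(w, means)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (n - 1) for wi, n in zip(w, ns))
    den = 1 + 2 * (k - 2) / (k**2 - 1) * lam
    return num / den, k - 1, (k**2 - 1) / (3 * lam)


# Hypothetical three-group summary with unequal spreads and sizes.
f_stat, df1, df2 = welch_anova([10.0, 12.0, 15.0], [1.0, 3.0, 6.0], [20, 15, 10])
print(round(f_stat, 3), df1, round(df2, 2))
```

The p-value is the upper-tail area of the F distribution with (df1, df2) degrees of freedom, for example `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available.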

Simple and Multiple Linear Regression

In simple linear regression and multiple linear regression, homoscedasticity means that the variance of the residuals is constant across all fitted values. When residuals fan out (or fan in) as X increases, heteroscedasticity is present. The standard errors of the regression coefficients are biased, making hypothesis tests and confidence intervals unreliable. Remedies include using heteroscedasticity-consistent (HC) robust standard errors (White's correction), transforming the outcome variable, or modelling the variance structure with weighted least squares.

Regression Diagnostic: Residual Plot

The residuals-versus-fitted plot is the primary diagnostic for heteroscedasticity in regression. A random horizontal scatter of residuals across all fitted values is consistent with homoscedasticity. A funnel shape, where residuals spread out as fitted values increase, is the classic heteroscedastic signature. Scale-location plots (square root of the absolute standardised residuals against fitted values) and the Breusch–Pagan test provide more formal diagnostics.
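Robust standard errors can be illustrated without a regression package. The sketch below, assuming only NumPy, fits ordinary least squares and computes both the classical standard errors and the simplest heteroscedasticity-consistent version (HC0, White's sandwich estimator). The function name and the simulated funnel-shaped data are illustrative assumptions.

```python
import numpy as np


def ols_with_hc0(x, y):
    """OLS of y on x with intercept; returns coefficients, classical SEs, HC0 SEs."""
    X = np.column_stack([np.ones_like(x), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    n, p = X.shape
    # Classical SEs assume one constant residual variance.
    sigma2 = resid @ resid / (n - p)
    se_classical = np.sqrt(np.diag(sigma2 * xtx_inv))
    # HC0 sandwich: each observation contributes its own squared residual.
    meat = X.T @ (X * resid[:, None] ** 2)
    se_hc0 = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_classical, se_hc0


rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=500)
# Simulated funnel: the noise SD grows in proportion to x.
y = 2.0 + 0.5 * x + rng.normal(0, 0.3 * x)

beta, se_c, se_r = ols_with_hc0(x, y)
print("slope:", round(float(beta[1]), 3))
print("classical SE:", round(float(se_c[1]), 4), "HC0 SE:", round(float(se_r[1]), 4))
```

The coefficient estimates are the same under both approaches; only the standard errors, and therefore the tests and intervals built on them, change. Regression packages typically offer small-sample refinements (HC1 to HC3) of the same idea.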

Heteroscedasticity in Logistic Regression

Logistic regression models a binary outcome and does not assume homoscedasticity of residuals in the same sense. The variance of a Bernoulli outcome is p(1−p), which varies by design. However, overdispersion — where residual variance exceeds what the model predicts — is an analogous problem addressed through quasi-binomial models or mixed effects models.

Real-World Applications of Variance Assumptions

The equal vs unequal variance question appears across every discipline that compares groups. Here are the most common application contexts.

Medical Research

Clinical Trials

Treatment vs control groups often develop different variances as a drug reduces both mean response and its variability. Welch's t-test is a sensible default for simple two-arm comparisons of a continuous endpoint. Protocol pre-specification should state which test will be used.

Psychology

Experimental Studies

Two experimental conditions may trigger different levels of between-subject variability. Standard practice is to test homogeneity of variance before any independent-samples analysis, reporting Levene's F and p alongside the t-test output.

Education Research

Achievement Comparisons

Comparing test scores across schools, teaching methods, or demographic groups routinely produces heteroscedastic data. Welch's correction is widely recommended for such comparisons.

Data Science / Tech

A/B Testing

Conversion rates, session durations, and revenue per user frequently violate equal variance assumptions. Many tech companies default to Welch's t-test or bootstrap confidence intervals for online experiments.

Manufacturing

Quality Control (SPC)

When two production lines are compared, unequal process variances can mask or exaggerate mean differences. Formal variance tests precede any capability study or two-sample comparison in ISO-aligned QC workflows.

Common Mistakes and Misconceptions

Mistake | Wrong Approach | Correct Approach
Skipping variance check entirely | Always use pooled t-test without checking | Run Levene's test first; default to Welch if unsure
Equating similar sample sizes with equal variance | n₁ ≈ n₂ means variances are probably equal | Sample sizes and variances are independent properties; always test
Using Bartlett's test on non-normal data | Apply Bartlett's regardless of distribution | Use Levene's as default; only Bartlett's when normality is confirmed
Thinking Welch's always gives a different result | Avoid Welch's because it changes the conclusion | Welch's and pooled often agree when variances are similar; use Welch's anyway as the safe default
Assuming equal variance in regression residuals | Trust standard errors without checking residual plots | Plot residuals vs fitted values; use robust standard errors if heteroscedasticity is detected
Using the F-test ratio as the only variance check | F = s₁²/s₂², that's the variance test done | F-test is sensitive to non-normality; Levene's is more appropriate for general use

Entity and Formula Glossary

Term | Formula / Symbol | Definition | Test Context
Equal variance | σ₁² = σ₂² | Same spread across groups | Supports pooled t-test
Unequal variance | σ₁² ≠ σ₂² | Different group spreads | Use Welch's t-test
Homoscedasticity | Var(ε) = constant | Constant residual variance across fitted values | ANOVA / regression assumption
Heteroscedasticity | Var(ε) not constant | Residual variance changes with X or group | Use robust SE or transform
Variance | σ² = Σ(x−μ)²/N | Average squared deviation from the mean | Core measure of spread
Standard deviation | σ = √σ² | Square root of variance; same units as data | Reported alongside mean
Pooled variance | s²_p = [(n₁−1)s₁²+(n₂−1)s₂²]/(n₁+n₂−2) | Weighted mean of two sample variances | Used in pooled t-test
Welch's t-test | t = (x̄₁−x̄₂)/√(s₁²/n₁+s₂²/n₂) | T-test that does not assume equal variance | Unequal or unknown variance
Levene's test | W ~ F(k−1, N−k) | Tests equality of variances using absolute deviations | Robust default for any distribution
Bartlett's test | χ² approximation | Tests variance equality under normality | More powerful but normality-sensitive
F-test (ratio) | F = s₁²/s₂² | Direct ratio of sample variances | Quick check, sensitive to non-normality
Welch–Satterthwaite df | See formula above | Approximate degrees of freedom for Welch's test | Always ≤ pooled df
Independent t-test | t = (x̄₁−x̄₂)/SE | Two-sample test for independent groups | Pooled or Welch variant
Type I error (α) | P(reject H₀ | H₀ true) | False positive rate; inflated by violated variance assumption | Target: 0.05

Frequently Asked Questions

What is the difference between equal and unequal variance t-test?

The pooled (equal variance) t-test combines both groups' sample variances into a single pooled estimate and uses n₁ + n₂ − 2 degrees of freedom. Welch's (unequal variance) t-test keeps each group's variance separate and calculates adjusted degrees of freedom, never larger than the pooled value, using the Welch–Satterthwaite approximation. The pooled test is slightly more powerful when variances are truly equal; Welch's test controls Type I error correctly when they are not.

Is Welch's t-test always better?

Not strictly "better" in every scenario, but it is safer as a default. When variances are equal, the pooled test has a slight power advantage because it uses more degrees of freedom. When variances are unequal, Welch's test is unambiguously superior because it controls the actual error rate. Since the power cost of using Welch's when variances are equal is small, most methodologists now recommend it as the universal default.

What is homoscedasticity?

Homoscedasticity means constant variance. In a two-group context, it means σ₁² = σ₂². In regression, it means the variance of the residuals does not change systematically with the fitted values or any predictor. It is a core assumption of ordinary least squares regression, the pooled t-test, and standard ANOVA.

How do you know if variances are unequal?

There are three complementary approaches: (1) visually, using side-by-side boxplots or overlaid histograms — if one group's spread is noticeably wider, variances likely differ; (2) with Levene's test — if p < 0.05, reject equal variance; (3) with a variance ratio F = s₁²/s₂² — ratios above 4 or below 0.25 are a practical flag, though formal tests are more reliable.

Does sample size affect which test to use?

Sample size interacts with variance inequality in two ways. Large, balanced samples (n₁ ≈ n₂) make the pooled t-test relatively robust to modest heteroscedasticity. Small or unequal sample sizes amplify the impact of unequal variances on Type I error. The most dangerous scenario is a small sample in the high-variance group and a large sample in the low-variance group — this combination badly inflates false positive rates with the pooled test.

Can Levene's test itself be wrong?

Yes. Like any hypothesis test, Levene's test has Type I error (falsely concluding variances differ when they do not, in 5% of cases at α = 0.05) and Type II error (missing real variance differences, especially with small samples). With very small samples (<10 per group), Levene's test lacks power and may miss real heteroscedasticity. In that case, a conservative default to Welch's t-test is wise regardless of Levene's p-value.

Summary: Equal vs Unequal Variance

The distinction between equal and unequal variance determines which two-sample t-test formula gives trustworthy results. Homoscedasticity (σ₁² = σ₂²) allows pooling, which is more efficient. Heteroscedasticity (σ₁² ≠ σ₂²) requires Welch's correction, which adjusts both the standard error and degrees of freedom to prevent inflated Type I errors.

The practical workflow is straightforward: inspect data visually with boxplots, confirm with Levene's test, and choose accordingly — or simply default to Welch's t-test throughout. The cost of Welch's when variances happen to be equal is negligible. The cost of the pooled test when variances are unequal can be substantial.

Beyond t-tests, the same logic extends to ANOVA (where Welch's one-way ANOVA replaces standard ANOVA under heteroscedasticity) and linear regression (where heteroscedastic residuals call for robust standard errors or transformation).

Key Takeaway

Equal variance means same spread across groups (σ₁² = σ₂²) — use the pooled t-test. Unequal variance means different spreads (σ₁² ≠ σ₂²) — use Welch's t-test. When unsure, always use Welch's. Check with Levene's test before deciding. This single habit eliminates one of the most common sources of inflated false positive rates in applied statistics.

Key references:
  • Welch, B.L. (1947). The generalization of "Student's" problem when several different population variances are involved. Biometrika, 34(1–2), 28–35.
  • Levene, H. (1960). Robust tests for equality of variances. In Contributions to Probability and Statistics.
  • Ruxton, G.D. (2006). The unequal variance t-test is an underused alternative to Student's t-test and the Mann-Whitney U test. Behavioral Ecology, 17(4), 688–690.
See also: statsmodels documentation and the R car package for implementation details.