Equal vs Unequal Variance (Overview)
Every two-sample hypothesis test rests on assumptions about the populations being compared. The variance assumption is the one that trips up practitioners most often, because it is invisible until you check for it — and the cost of ignoring it can be a badly miscalibrated p-value.
The concept maps onto a simple intuition: if one group's measurements are tightly clustered and another group's are wildly spread out, pooling those two spreads into a single estimate distorts the test. Welch's correction keeps them separate. This guide shows you exactly when and how to apply each approach.
- Equal variance: Both groups have the same σ² — use the pooled two-sample t-test
- Unequal variance: Groups have different σ² — use Welch's t-test
- Default recommendation: When in doubt, always use Welch's t-test (it is robust to both scenarios)
- Testing for equality: Levene's test (robust default) or Bartlett's test (for normal data)
- Where it matters: Two-sample t-tests, ANOVA, regression residuals, A/B testing
- Key terms: Homoscedasticity = equal, Heteroscedasticity = unequal
What Is Equal Variance? (Homoscedasticity)
Equal variance, formally called homoscedasticity, describes the condition where every group in your analysis has the same population variance. In a two-sample setting, this means the scatter around each group's mean is the same in the population. Group A and Group B exhibit the same degree of spread, even though their sample variances will differ somewhat by chance.
$$\sigma_1^2 = \sigma_2^2$$
σ₁² = population variance of Group 1
σ₂² = population variance of Group 2
The equality means the two groups' spreads are the same.
When this condition holds, you gain something valuable: both samples can be used together to estimate a single, shared population variance. The statistical name for that combined estimate is the pooled variance. Because the pooled estimate draws on all observations from both groups, it estimates the common σ² more precisely (with less sampling variability) than either group's variance estimated separately. This holds only if the equality assumption is correct.
The Pooled Variance Formula
The pooled sample variance combines both groups while weighting each by its degrees of freedom:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
s₁², s₂² = sample variances for each group
n₁, n₂ = sample sizes
n₁ + n₂ − 2 = total degrees of freedom
The pooled variance feeds directly into the standard error for the pooled t-test. Because the estimate uses all n₁ + n₂ − 2 degrees of freedom, it is more stable than two separate estimates, but only when σ₁² = σ₂² is a reasonable assumption. Suppose the groups actually differ in variance and you pool anyway. The standard error is then wrong, the t-statistic no longer follows the t-distribution the test assumes, and the resulting p-value is miscalibrated.
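A minimal Python sketch of the pooled-variance formula, working from each group's sample standard deviation and size. `pooled_variance` is a hypothetical helper for illustration. The inputs are the exam-score figures from Example 1 (s₁ = 8.2, s₂ = 7.9, 12 students per group).

```python
def pooled_variance(s1: float, n1: int, s2: float, n2: int) -> float:
    """Pooled sample variance: each group's variance is weighted by its
    degrees of freedom (n - 1), and the sum is divided by n1 + n2 - 2."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


# Exam-score groups: two classes of 12 with similar spreads
sp2 = pooled_variance(8.2, 12, 7.9, 12)
print(f"pooled variance: {sp2:.2f}")
```

With equal group sizes, the pooled variance is simply the average of the two sample variances. With unequal sizes, the larger group's variance dominates the weighted average.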
When Does Equal Variance Actually Hold?
Equal variance is most defensible in controlled experimental settings. Randomized controlled trials often produce groups with similar spreads, because randomisation distributes sources of variability evenly. Tightly designed laboratory experiments, quality control settings where a single process generates both groups, and educational studies comparing students from the same curriculum are classic examples where homoscedasticity is plausible.
The pooled t-test does not verify that variances are equal — it assumes they are. You must check this assumption with a formal test (Levene's or Bartlett's) before using the pooled method. Skipping the check is one of the most common errors in applied statistics.
[Figure: side-by-side illustration, ✓ Equal Variance (Homoscedasticity) vs ✗ Unequal Variance (Heteroscedasticity)]
What Is Unequal Variance? (Heteroscedasticity)
Unequal variance — formally called heteroscedasticity — occurs when the population variances of two groups differ. One group's data points cluster tightly around their mean; the other group's scatter is wider. These two spreads cannot be meaningfully averaged into a single pooled estimate without biasing the result.
Heteroscedasticity is not unusual. It is the default state in many real-world datasets. Groups formed by different demographic segments, patient populations receiving different treatments, or conversion rate data in A/B tests frequently exhibit unequal spreads. Assuming equality where none exists systematically inflates or deflates your t-statistic.
Why Heteroscedasticity Causes Problems
When you pool two unequal variances, the result is neither σ₁² nor σ₂² — it is a weighted average that misrepresents both. The standard error used in the t-test formula is then wrong, which cascades into a biased t-statistic and a p-value that does not accurately reflect the probability of observing your data under the null hypothesis.
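The sketch below makes the mis-weighting concrete. It computes both standard errors for the response-time figures from Example 2 (group 1: n = 20, s = 0.6; group 2: n = 15, s = 1.9). The pooled variance comes out near 1.74, which is neither 0.36 nor 3.61. Because the smaller group is the more variable one, the pooled standard error is too small. The function names are illustrative only.

```python
import math


def pooled_se(s1, n1, s2, n2):
    """SE of the mean difference under the pooled (equal-variance) model."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))


def welch_se(s1, n1, s2, n2):
    """SE of the mean difference with each group's variance kept separate."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


# Group 1: larger and tighter; group 2: smaller and more spread out
se_pooled = pooled_se(0.6, 20, 1.9, 15)
se_welch = welch_se(0.6, 20, 1.9, 15)
print(f"pooled SE = {se_pooled:.3f}, Welch SE = {se_welch:.3f}")
```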
Simulations show what happens under two conditions: the variance ratio between groups is 4:1 or greater, and the smaller sample comes from the more variable group. In that case the true Type I error rate of a pooled t-test can reach 0.10–0.20, even when the nominal α is 0.05. That means two to four times as many false positives as you intended.
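The following rough Monte Carlo sketch illustrates that effect using only Python's standard library. The settings are assumptions chosen for illustration, not taken from any particular published simulation. Group one has 10 observations with SD 2 and group two has 40 observations with SD 1, giving a 4:1 variance ratio where the smaller group is more variable. Both means are 0, so H₀ is true and every rejection is a false positive. The two-tailed 5% critical value for df = 48 (about 2.011) is hard-coded rather than computed.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Pooled two-sample t statistic (assumes equal population variances)."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * statistics.variance(a)
           + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def pooled_type1_rate(n_small=10, sd_small=2.0, n_large=40, sd_large=1.0,
                      t_crit=2.0106, reps=4000, seed=1):
    """Share of simulated datasets in which the pooled test rejects a true H0.

    t_crit is assumed to be the 5% two-tailed critical value for
    df = n_small + n_large - 2 = 48; change it if you change the sizes.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, sd_small) for _ in range(n_small)]
        b = [rng.gauss(0.0, sd_large) for _ in range(n_large)]
        if abs(pooled_t(a, b)) > t_crit:
            rejections += 1
    return rejections / reps


rate = pooled_type1_rate()
print(f"pooled-test false positive rate: {rate:.3f} (nominal 0.05)")
```

With these settings, the rejection rate lands well above the nominal 0.05. Calling `pooled_type1_rate(sd_small=1.0, sd_large=2.0)` puts the larger spread in the larger group. That pushes the rate below 0.05 instead, which is the power-loss case.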
Heteroscedasticity in Real-World Data
Several mechanisms produce unequal variances in practice. When one group has a wider range of true scores — for example, a control group that includes both high- and low-responders while a treatment group converges on a consistent response — the variances will differ. Income data, where one group includes extreme outliers and another does not, is a textbook case. So is comparing standardised test scores across schools with very different resources: the variance within high-resource schools may be tight while the variance within lower-resource schools is wide.
Medical Trials: Treatment vs Control Variance
In a drug trial, the control group's blood pressure measurements may span a wide range (high between-patient variability) while the treatment group converges on a narrower band as the drug brings outliers toward the target range. The two variances differ structurally — not randomly. Running a pooled t-test here gives an artificially narrow standard error, inflating the t-statistic and leading to overconfident conclusions about the drug's effect.
Why Variance Assumptions Matter in Hypothesis Testing
The variance assumption is not a technicality — it is embedded in the denominator of the t-test formula. The standard error of the difference between means is what the test uses to decide whether an observed difference is large relative to sampling variability. Every term in that standard error depends on the variance structure of your data.
Impact on p-Values
A t-statistic equals the observed difference divided by the standard error. If the standard error is too small (from incorrect pooling), the t-statistic inflates and the p-value shrinks below where it should be. The data appear to support rejection of the null hypothesis with more confidence than is warranted. This is false precision — a direct consequence of a violated assumption.
Impact on Confidence Intervals
The same standard error that distorts the p-value also distorts the confidence interval. A confidence interval built on an underestimated standard error is too narrow: it excludes the true parameter value more often than the stated confidence level implies. If you report a 95% CI that is actually 90%, your inferential claims are systematically overstated.
Impact on Type I and Type II Errors
Type I error (false positive) rates rise when equal variance is assumed but not present, especially when the group with the larger variance also has the smaller sample size. Type II error (false negative) is the mirror problem. If the larger sample belongs to the group with the greater spread, the pooled standard error overshoots. The test then becomes overly conservative and loses statistical power, missing real effects.
Pooled T-Test
Uses combined variance estimate. More efficient when σ₁² = σ₂². All n₁ + n₂ − 2 degrees of freedom available.
Welch's T-Test
Keeps variances separate. Adjusts degrees of freedom downward via Welch–Satterthwaite equation. More conservative, controls α accurately.
Default to Welch
If you are unsure whether variances are equal, Welch's t-test is the conservative and widely recommended default. Loss of power is minimal when variances happen to be equal.
Equal vs Unequal Variance: Key Differences
| Feature | Equal Variance (Homoscedastic) | Unequal Variance (Heteroscedastic) |
|---|---|---|
| Formal Condition | σ₁² = σ₂² | σ₁² ≠ σ₂² |
| Group Spread | Same variability across groups | Different variability across groups |
| Variance Estimate | Pooled (s²_p), shared across groups | Separate for each group |
| Recommended Test | Pooled two-sample t-test | Welch's t-test |
| Degrees of Freedom | n₁ + n₂ − 2 (maximum) | Welch–Satterthwaite approximation (reduced) |
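| Statistical Power | Pooled test slightly more powerful; Welch loses little | Welch controls α; pooled test's apparent power is unreliable |
| Real-World Frequency | Common in controlled experiments | Common in observational/business data |
| Effect on α if Misused | Using Welch anyway: α still controlled, minor power loss | Using pooled anyway: α distorted (inflated when the smaller group is more variable); use Welch |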
| ANOVA Assumption | Required for standard one-way ANOVA | Welch's ANOVA or transformation required |
| Regression Term | Homoscedastic residuals | Heteroscedastic residuals — use robust SE |
Much modern methodological guidance, including many statistics textbooks, recommends Welch's t-test as the default for two-sample comparisons. The cost of using Welch when variances are equal is small (slightly wider confidence intervals). The cost of using the pooled test when variances are unequal can be a badly distorted p-value. Default to Welch.
How to Check for Equal vs Unequal Variance (Step-by-Step)
Assessing variance equality involves three layers: informal visual inspection, a formal statistical test, and a decision rule. Work through them in order before choosing your t-test variant.
Step 1 — Visual Inspection
Before running any formal test, plot your data. Side-by-side boxplots are the fastest diagnostic: if the boxes and whiskers are roughly the same length for both groups, equal variance is plausible. If one box is noticeably taller or the whiskers extend much further, heteroscedasticity is likely. Histograms and dotplots reinforce this picture by showing the raw distribution of each group's values.
Visual inspection is not sufficient on its own — it will miss subtle differences and can be misleading with small samples — but it sets expectations before you interpret formal test results.
Step 2 — Run Levene's Test
Levene's test is the standard default for testing variance equality. It converts each observation to its absolute distance from the group mean. It then tests whether the mean distance differs across groups using a one-way ANOVA F-test. Because it works with absolute rather than squared deviations, it is much less sensitive to non-normal distributions than Bartlett's test or the F-test of variance ratios. The Brown–Forsythe variant, which measures distances from the group median, is more robust still.
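$$W = \frac{N-k}{k-1}\cdot\frac{\sum_{i=1}^{k} n_i\,(\bar{Z}_{i\cdot} - \bar{Z}_{\cdot\cdot})^2}{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Z_{ij} - \bar{Z}_{i\cdot})^2}$$
Zᵢⱼ = |Yᵢⱼ − Ȳᵢ| absolute deviation from group mean
Z̄ᵢ· = mean of the Zᵢⱼ in group i; Z̄·· = mean of all Zᵢⱼ
nᵢ = observations in group i; N = total observations
k = number of groups
Under H₀, W approximately follows F(k−1, N−k)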
Interpret Levene's test output with the standard threshold. If the p-value exceeds 0.05, you do not reject the assumption of equal variance, and the pooled t-test is defensible. A non-significant result is not proof that the variances are equal. If the p-value falls below 0.05, variances differ significantly and you should use Welch's t-test.
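The W formula translates almost line for line into a standard-library sketch of the classic mean-centred Levene statistic. `levene_w` is a hypothetical helper. It returns only the statistic; a p-value would need the F(k−1, N−k) distribution.

```python
import statistics


def levene_w(groups: list[list[float]]) -> float:
    """Classic (mean-centred) Levene W for k groups.

    Each observation is replaced by Z_ij = |Y_ij - group mean|, and W is the
    one-way ANOVA F statistic computed on those Z values. Assumes the Z values
    are not constant within every group (otherwise the denominator is zero).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [statistics.fmean(g) for g in groups]
    # Absolute deviations from each group's own mean
    z = [[abs(y - m) for y in g] for g, m in zip(groups, means)]
    z_means = [statistics.fmean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the Z values
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((zij - zm) ** 2 for zi, zm in zip(z, z_means) for zij in zi)
    return (n_total - k) / (k - 1) * between / within


w = levene_w([[1, 2, 3], [1, 3, 5]])
print(f"W = {w:.3f}")
```

If SciPy is available, `scipy.stats.levene(a, b, center='mean')` should return the same statistic together with its p-value. SciPy's default `center='median'` gives the Brown–Forsythe variant instead.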
Step 3 — Consider Bartlett's Test (for Normal Data)
Bartlett's test is more powerful than Levene's when data are truly normally distributed, but it is sensitive to departures from normality. A single non-normal group can cause Bartlett's to flag inequality even when variances are actually equal. Use Bartlett's only when you have strong evidence of normality in both groups; otherwise default to Levene's.
Step 4 — Apply the Decision Rule
Choosing the Right Test
Levene's p > 0.05: Do not reject equal variance → use pooled t-test.
Levene's p < 0.05: Reject equal variance → use Welch's t-test.
Unsure or no test run: Default to Welch's t-test — it is safe in both scenarios.
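The decision rule above can be sketched as a small function. `recommend_t_test` is a hypothetical helper. Its optional small-sample override is an assumption drawn from the FAQ's advice that Levene's test lacks power with fewer than 10 observations per group. A p-value exactly equal to α is treated conservatively as "use Welch".

```python
from typing import Optional


def recommend_t_test(levene_p: Optional[float] = None, alpha: float = 0.05,
                     n1: Optional[int] = None, n2: Optional[int] = None) -> str:
    """Return 'pooled' or 'welch' following the decision rule above."""
    # No variance test run (or result unknown): default to Welch.
    if levene_p is None:
        return "welch"
    # Assumed override: with fewer than 10 observations in a group, Levene's
    # test has too little power to trust, so default to Welch regardless.
    if n1 is not None and n2 is not None and min(n1, n2) < 10:
        return "welch"
    # p above alpha: no evidence against equal variance, so pooling is defensible.
    return "pooled" if levene_p > alpha else "welch"


print(recommend_t_test(0.81, n1=12, n2=12))   # exam-score example
print(recommend_t_test(0.003, n1=20, n2=15))  # response-time example
```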
Welch's T-Test vs Pooled T-Test (Deep Comparison)
Both tests compare the means of two independent groups, but they handle the variance structure differently. Understanding the mechanism of each helps you interpret software output and defend your choice to reviewers.
The Pooled T-Test Formula
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
s_p = pooled standard deviation
df = n₁ + n₂ − 2
The pooled standard deviation s_p is the square root of the pooled variance. Because the test uses all n₁ + n₂ − 2 degrees of freedom, the reference t-distribution has lighter tails. Its critical values are therefore smaller, and the test is more powerful, provided the equal variance assumption holds.
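Here is a minimal sketch of the pooled t-statistic computed from summary statistics. It reproduces the Example 1 arithmetic. `pooled_t_test` is a hypothetical helper that returns only t and df. For a p-value, SciPy's `scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=True)` accepts the same inputs.

```python
import math


def pooled_t_test(mean1, s1, n1, mean2, s2, n2):
    """Pooled two-sample t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)  # pooled SD
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, df


# Example 1: lecture (74, 8.2, 12) vs flipped classroom (79, 7.9, 12)
t, df = pooled_t_test(74, 8.2, 12, 79, 7.9, 12)
print(f"t = {t:.2f}, df = {df}")
```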
Welch's T-Test Formula
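$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad \nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$
s₁², s₂² = separate sample variances
ν = df from the Welch–Satterthwaite approximation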
Welch's t-statistic has the same numerator but a different denominator. The standard error is computed from each group's variance independently. The degrees of freedom are approximated by the Welch–Satterthwaite equation. Its result never exceeds the pooled df and is usually smaller; it equals n₁ + n₂ − 2 only when n₁ = n₂ and s₁ = s₂, and it is never less than min(n₁, n₂) − 1. Fewer degrees of freedom give the t-distribution heavier tails. That makes the critical value more conservative and correctly accounts for the extra uncertainty from estimating two separate variances.
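The following sketch computes Welch's t and the Welch–Satterthwaite df from summary statistics, applied to the Example 2 figures. `welch_t_test` is a hypothetical helper. SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` should give the same statistic plus a p-value based on the fractional df.

```python
import math


def welch_t_test(mean1, s1, n1, mean2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from summary statistics."""
    v1 = s1**2 / n1  # squared standard error of mean 1
    v2 = s2**2 / n2  # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Example 2: App v1 (3.8 s, 0.6, 20) vs App v2 (3.1 s, 1.9, 15)
t, df = welch_t_test(3.8, 0.6, 20, 3.1, 1.9, 15)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The df is kept fractional here, as software does. The worked example rounds it down only for a table lookup.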
The practical difference between the two methods is usually small when sample sizes are large and variances are only moderately unequal. It becomes consequential when sample sizes differ and variance ratios exceed 4:1 — exactly the scenario where choosing incorrectly matters most.
Worked Examples
Example 1 — Equal Variance: Pooled T-Test
Comparing Exam Scores: Two Teaching Methods
A researcher compares exam scores for two groups of students. Group A (traditional lecture): n₁ = 12, x̄₁ = 74, s₁ = 8.2. Group B (flipped classroom): n₂ = 12, x̄₂ = 79, s₂ = 7.9. Levene's test returns p = 0.81. Test whether the means differ at α = 0.05.
Check variance assumption: Levene's p = 0.81 > 0.05 → do not reject equal variance. Proceed with pooled t-test.
Calculate pooled variance: s²_p = [(12−1)(8.2²) + (12−1)(7.9²)] / (12+12−2) = [(11)(67.24) + (11)(62.41)] / 22 = [739.64 + 686.51] / 22 = 1426.15 / 22 = 64.82
Calculate pooled standard deviation and standard error: s_p = √64.82 = 8.05 | SE = 8.05 × √(1/12 + 1/12) = 8.05 × √(0.1667) = 8.05 × 0.4082 = 3.29
Compute t-statistic: t = (74 − 79) / 3.29 = −5 / 3.29 = −1.52. Degrees of freedom: df = 12 + 12 − 2 = 22.
Find critical value and decide: Critical t (two-tailed, df = 22, α = 0.05) ≈ ±2.074. |t| = 1.52 < 2.074. Also p ≈ 0.143 > 0.05.
✓ Conclusion: Fail to reject H₀. There is not enough evidence at α = 0.05 to conclude the two teaching methods differ in mean exam score. The 5-point observed difference is within the range expected from sampling variability alone.
Example 2 — Unequal Variance: Welch's T-Test
Comparing Response Times: Two App Versions
A product team compares user response times (seconds) for App v1 and App v2. App v1: n₁ = 20, x̄₁ = 3.8 s, s₁ = 0.6. App v2: n₂ = 15, x̄₂ = 3.1 s, s₂ = 1.9. Levene's test returns p = 0.003. Test at α = 0.05.
Check variance assumption: Levene's p = 0.003 < 0.05 → reject equal variance. Use Welch's t-test. Note: s₂/s₁ ≈ 3.2 — a large ratio consistent with the test result.
Compute Welch standard error: SE = √(s₁²/n₁ + s₂²/n₂) = √(0.36/20 + 3.61/15) = √(0.018 + 0.2407) = √0.2587 = 0.5086
Compute t-statistic: t = (3.8 − 3.1) / 0.5086 = 0.7 / 0.5086 = 1.376
Welch–Satterthwaite df: df ≈ (0.018 + 0.2407)² / [(0.018²/19) + (0.2407²/14)] ≈ (0.2587)² / [(0.0000171) + (0.004138)] ≈ 0.06693 / 0.004155 ≈ 16.1 → round down to 16 for a table lookup (software uses the fractional value).
Decision: Critical t (two-tailed, df = 16, α = 0.05) ≈ ±2.120. |t| = 1.376 < 2.120. p ≈ 0.187 > 0.05.
✓ Conclusion: Fail to reject H₀ using Welch's test. The observed difference is 0.7 seconds, but the large spread in the App v2 group inflates the standard error. The adjusted df (16 rather than the pooled 33) also raises the critical value. There is therefore insufficient evidence to conclude the app versions differ in mean response time at α = 0.05.
Example 3 — A/B Testing Scenario
Case Study
Conversion Rate Variance in E-Commerce A/B Testing
An e-commerce team runs an A/B test on two checkout page designs. Control (n=500): mean conversion 4.2%, s = 0.8%. Variant (n=500): mean conversion 5.1%, s = 2.3%.
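The large difference in standard deviations (2.3% vs 0.8%) flags potential heteroscedasticity. Levene's test confirms it (p = 0.001), so Welch's t-test is the appropriate choice. Note, however, that with equal group sizes (500 each) the pooled and Welch standard errors are algebraically identical. The two tests differ only in degrees of freedom (998 vs roughly 620), so here the choice barely moves the p-value. Had the variant received less traffic than the control, the pooled test would have understated the standard error and exaggerated confidence in the variant's superiority.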
This situation is common in conversion data: the control group's behavior is stable and predictable, while a new variant creates two subpopulations — users who strongly respond and users who ignore the change. The result is higher variance in the variant, not lower performance. Welch's test correctly accounts for this structure.
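The equal-sample-size point can be checked numerically with the case-study figures. The sketch treats the reported standard deviations as percentage-point values and compares both standard errors and both df. `se_and_df` is a hypothetical helper.

```python
import math


def se_and_df(s1, n1, s2, n2):
    """Return (pooled SE, Welch SE, pooled df, Welch df) for the mean difference."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    v1, v2 = s1**2 / n1, s2**2 / n2
    welch = math.sqrt(v1 + v2)
    welch_df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return pooled, welch, n1 + n2 - 2, welch_df


# Control: s = 0.8, Variant: s = 2.3 (percentage points), 500 per arm
pooled, welch, df_pooled, df_welch = se_and_df(0.8, 500, 2.3, 500)
print(f"SE pooled = {pooled:.4f}, SE Welch = {welch:.4f}")
print(f"df pooled = {df_pooled}, df Welch = {df_welch:.0f}")
```

With n₁ = n₂, the pooled formula reduces to the Welch standard error. Only the reference distribution changes, and at several hundred degrees of freedom that change is negligible.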
Testing Methods: Levene's Test and Bartlett's Test
Two formal tests dominate variance equality assessment. Each has a different sensitivity profile, and knowing when to prefer one over the other prevents you from getting misleading results.
Levene's Test
Levene's test converts raw observations to their absolute deviations from the group mean, then applies a standard one-way ANOVA to those deviations. Absolute deviations are far less affected by the shape of the distribution than squared deviations. As a result, Levene's test remains approximately valid under non-normal data, which is the common case in practice. It is widely available. SPSS reports it in the Independent Samples T-Test output. In R, base var.test() performs the F-test of variance ratios, while leveneTest() from the car package gives Levene's test directly.
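R: car::leveneTest(outcome ~ group, data = df). Python (SciPy): scipy.stats.levene(group1, group2). Both return an F-statistic and p-value. Both functions centre on the group median by default (the Brown–Forsythe variant). To reproduce the classic mean-based statistic, pass center = mean in R or center='mean' in SciPy. If p < 0.05, switch to Welch's t-test.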
Bartlett's Test
Bartlett's test uses a chi-squared statistic derived from a likelihood-ratio comparison of the group variances under normality. Because it is built on the normal likelihood, it is more powerful than Levene's when data are genuinely normal. However, that reliance on normality cuts both ways. Even mild skewness or a few outliers can cause Bartlett's to reject equal variance when the variances are in fact equal. Reserve Bartlett's for situations where you have strong independent evidence of normality, for example after a Q-Q plot confirms the data follow a normal distribution closely.
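For reference, here is a standard-library sketch of Bartlett's statistic in its usual corrected form. `bartlett_t` is a hypothetical helper. `scipy.stats.bartlett(a, b)` should return the same statistic with a p-value from the chi-squared distribution with k − 1 degrees of freedom.

```python
import math
import statistics


def bartlett_t(groups: list[list[float]]) -> float:
    """Bartlett's statistic; approximately chi-squared with k - 1 df under H0
    (equal variances) when every group is normally distributed."""
    k = len(groups)
    ns = [len(g) for g in groups]
    n_total = sum(ns)
    variances = [statistics.variance(g) for g in groups]  # n - 1 denominators
    sp2 = sum((n - 1) * v for n, v in zip(ns, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(sp2) - sum(
        (n - 1) * math.log(v) for n, v in zip(ns, variances))
    # Bartlett's correction factor for the chi-squared approximation
    correction = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction


print(bartlett_t([[1, 2, 3], [4, 5, 6]]))  # identical sample variances
print(bartlett_t([[1, 2, 3], [1, 3, 5]]))  # variances 1 and 4
```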
F-Test of Variance Ratio
The simplest variance test computes F = s₁²/s₂² and compares it to the F-distribution with (n₁ − 1, n₂ − 1) degrees of freedom. A ratio far from 1 (high or low) flags inequality. Like Bartlett's, the F-test is sensitive to non-normality. It is most useful as a quick sanity check or for understanding the magnitude of any variance ratio, not as a standalone decision tool.
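The sketch below computes the variance ratio and applies the common 4:1 rule of thumb (ratio above 4 or below 0.25). The helper names are hypothetical. For a formal p-value, the ratio would be compared with an F(n₁ − 1, n₂ − 1) distribution, for example via `scipy.stats.f.cdf` and `scipy.stats.f.sf`.

```python
import statistics


def variance_ratio(sample1: list[float], sample2: list[float]) -> float:
    """F = s1^2 / s2^2 using sample variances (n - 1 denominators)."""
    return statistics.variance(sample1) / statistics.variance(sample2)


def ratio_flag(f: float, lower: float = 0.25, upper: float = 4.0) -> bool:
    """Rule-of-thumb flag for a large imbalance: ratio above 4 or below 0.25."""
    return f > upper or f < lower


f = variance_ratio([1, 2, 3], [0, 4, 8])  # variances 1 and 16
print(f, ratio_flag(f))
```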
| Test | Statistic | Sensitivity to Non-Normality | Best Used When |
|---|---|---|---|
| Levene's Test | F (on absolute deviations) | Low — robust default | Most practical situations; default choice |
| Bartlett's Test | χ² approximation | High — sensitive to outliers | Confirmed normal distributions only |
| F-Test (ratio) | F = s₁²/s₂² | High — assumes normality | Quick check; two-group normal data |
| Fligner–Killeen | χ² on ranks | Very low — non-parametric | Heavily skewed or ordinal data |
Variance Assumptions Beyond the T-Test
Equal variance is not just a t-test concern. It appears as a core assumption in several widely used methods across statistics, and the consequences of violating it vary by context.
One-Way ANOVA
Standard ANOVA assumes that all k groups have the same population variance. When this assumption is violated, the F-test's actual Type I error rate can drift well away from α. The solution depends on the severity. Mild heteroscedasticity may be tolerable with balanced designs. Large imbalances in both sample sizes and variances, however, require one of two remedies. The first is Welch's one-way ANOVA (oneway.test in R, whose default is var.equal = FALSE). The second is a variance-stabilising transformation such as a log or square-root transform.
Simple and Multiple Linear Regression
In simple linear regression and multiple linear regression, homoscedasticity means that the variance of the residuals is constant across all fitted values. When residuals fan out (or fan in) as X increases, heteroscedasticity is present. The coefficient estimates themselves remain unbiased, but their conventional standard errors are biased, making hypothesis tests and confidence intervals unreliable. Remedies include three options:
- using heteroscedasticity-consistent (HC) robust standard errors (White's correction)
- transforming the outcome variable
- modelling the variance structure with weighted least squares
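The following is a minimal sketch of White's HC0 robust standard error for the slope of a simple linear regression, alongside the classical standard error. `slope_with_robust_se` is a hypothetical helper, and the four-point dataset is a toy chosen only to make the arithmetic checkable. Statistical libraries also offer small-sample refinements (HC1 to HC3), for example through the `cov_type` argument of a statsmodels OLS fit or R's sandwich package.

```python
import math


def slope_with_robust_se(x: list[float], y: list[float]):
    """Fit y = b0 + b1*x by OLS; return (b1, classical SE, HC0 robust SE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Classical SE assumes one common residual variance, estimated as SSE / (n - 2)
    classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC0 (White) SE lets each observation carry its own squared residual
    hc0 = math.sqrt(sum(d * d * e * e for d, e in zip(dx, resid)) / sxx**2)
    return b1, classical, hc0


b1, se_classical, se_hc0 = slope_with_robust_se([0, 1, 2, 3], [0, 2, 1, 5])
print(f"slope = {b1:.2f}, classical SE = {se_classical:.3f}, HC0 SE = {se_hc0:.3f}")
```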
Regression Diagnostic: Residual Plot
The residuals-versus-fitted plot is the primary diagnostic for heteroscedasticity in regression. A random horizontal scatter of residuals across all fitted values is consistent with homoscedasticity. A funnel shape — where residuals spread out as fitted values increase — is the classic heteroscedastic signature. Scale-location plots (square root of standardised residuals against fitted values) and the Breusch–Pagan test provide more formal diagnostics.
Logistic regression models a binary outcome and does not assume homoscedasticity of residuals in the same sense. The variance of a Bernoulli outcome is p(1−p), which varies by design. However, overdispersion — where residual variance exceeds what the model predicts — is an analogous problem addressed through quasi-binomial models or mixed effects models.
Real-World Applications of Variance Assumptions
The equal vs unequal variance question appears across every discipline that compares groups. Here are the most common application contexts.
Clinical Trials
Treatment vs control groups often develop different variances as a drug reduces both mean response and its variability. Welch's t-test is a common, safe choice for two-group continuous endpoints. Protocol pre-specification should state which test will be used.
Experimental Studies
Two experimental conditions may trigger different levels of between-subject variability. Standard practice is to test homogeneity of variance before any independent-samples analysis, reporting Levene's F and p alongside the t-test output.
Achievement Comparisons
Comparing test scores across schools, teaching methods, or demographic groups routinely produces heteroscedastic data. Welch's correction is widely used in educational measurement.
A/B Testing
Conversion rates, session durations, and revenue per user frequently violate equal variance assumptions. Many tech companies default to Welch's t-test or bootstrap confidence intervals for online experiments.
Quality Control (SPC)
When two production lines are compared, unequal process variances can mask or exaggerate mean differences. Formal variance tests precede any capability study or two-sample comparison in ISO-aligned QC workflows.
Common Mistakes and Misconceptions
| Mistake | Wrong Approach | Correct Approach |
|---|---|---|
| Skipping variance check entirely | Always use pooled t-test without checking | Run Levene's test first; default to Welch if unsure |
| Equating similar sample sizes with equal variance | n₁ ≈ n₂ means variances are probably equal | Sample sizes and variances are independent properties — always test |
| Using Bartlett's test on non-normal data | Apply Bartlett's regardless of distribution | Use Levene's as default; only Bartlett's when normality is confirmed |
| Thinking Welch's always gives a different result | Avoid Welch's because it changes the conclusion | Welch's and pooled often agree when variances are similar; use Welch's anyway as the safe default |
| Assuming equal variance in regression residuals | Trust standard errors without checking residual plots | Plot residuals vs fitted values; use robust standard errors if heteroscedasticity is detected |
| Using the F-test ratio as the only variance check | F = s₁²/s₂², that's the variance test done | F-test is sensitive to non-normality; Levene's is more appropriate for general use |
Entity and Formula Glossary
| Term | Formula / Symbol | Definition | Test Context |
|---|---|---|---|
| Equal variance | σ₁² = σ₂² | Same spread across groups | Supports pooled t-test |
| Unequal variance | σ₁² ≠ σ₂² | Different group spreads | Use Welch's t-test |
| Homoscedasticity | Var(ε) = constant | Constant residual variance across fitted values | ANOVA / regression assumption |
| Heteroscedasticity | Var(ε) not constant | Residual variance changes with X or group | Use robust SE or transform |
| Variance | σ² = Σ(x−μ)²/N | Average squared deviation from the mean | Core measure of spread |
| Standard deviation | σ = √σ² | Square root of variance — same units as data | Reported alongside mean |
| Pooled variance | s²_p = [(n₁−1)s₁²+(n₂−1)s₂²]/(n₁+n₂−2) | Weighted mean of two sample variances | Used in pooled t-test |
| Welch's t-test | t = (x̄₁−x̄₂)/√(s₁²/n₁+s₂²/n₂) | T-test that does not assume equal variance | Unequal or unknown variance |
| Levene's test | W ~ F(k−1, N−k) | Tests equality of variances using absolute deviations | Robust default for any distribution |
| Bartlett's test | χ² approximation | Tests variance equality under normality | More powerful but normality-sensitive |
| F-test (ratio) | F = s₁²/s₂² | Direct ratio of sample variances | Quick check, sensitive to non-normality |
| Welch–Satterthwaite df | See formula above | Approximate degrees of freedom for Welch's test | Always ≤ pooled df |
| Independent t-test | t = (x̄₁−x̄₂)/SE | Two-sample test for independent groups | Pooled or Welch variant |
| Type I error (α) | P(reject H₀ \| H₀ true) | False positive rate, inflated by a violated variance assumption | Target: 0.05 |
Frequently Asked Questions
What is the difference between equal and unequal variance t-test?
The pooled (equal variance) t-test combines both groups' sample variances into a single pooled estimate and uses n₁ + n₂ − 2 degrees of freedom. Welch's (unequal variance) t-test keeps each group's variance separate and calculates an adjusted, smaller degrees of freedom using the Welch–Satterthwaite approximation. The pooled test is more powerful when variances are truly equal; Welch's test controls Type I error correctly when they are not.
Is Welch's t-test always better?
Not strictly "better" in every scenario, but it is safer as a default. When variances are equal, the pooled test has a slight power advantage because it uses more degrees of freedom. When variances are unequal, Welch's test is unambiguously superior because it controls the actual error rate. Since the power cost of using Welch's when variances are equal is small, most methodologists now recommend it as the universal default.
What is homoscedasticity?
Homoscedasticity means constant variance. In a two-group context, it means σ₁² = σ₂². In regression, it means the variance of the residuals does not change systematically with the fitted values or any predictor. It is a core assumption of ordinary least squares regression, the pooled t-test, and standard ANOVA.
How do you know if variances are unequal?
There are three complementary approaches: (1) visually, using side-by-side boxplots or overlaid histograms — if one group's spread is noticeably wider, variances likely differ; (2) with Levene's test — if p < 0.05, reject equal variance; (3) with a variance ratio F = s₁²/s₂² — ratios above 4 or below 0.25 are a practical flag, though formal tests are more reliable.
Does sample size affect which test to use?
Sample size interacts with variance inequality in two ways. Large, balanced samples (n₁ ≈ n₂) make the pooled t-test relatively robust to modest heteroscedasticity. Small or unequal sample sizes amplify the impact of unequal variances on Type I error. The most dangerous scenario is a small sample in the high-variance group and a large sample in the low-variance group — this combination badly inflates false positive rates with the pooled test.
Can Levene's test itself be wrong?
Yes. Like any hypothesis test, Levene's test has Type I error (falsely concluding variances differ when they do not, in 5% of cases at α = 0.05) and Type II error (missing real variance differences, especially with small samples). With very small samples (<10 per group), Levene's test lacks power and may miss real heteroscedasticity. In that case, a conservative default to Welch's t-test is wise regardless of Levene's p-value.
Summary: Equal vs Unequal Variance
The distinction between equal and unequal variance determines which two-sample t-test formula gives trustworthy results. Homoscedasticity (σ₁² = σ₂²) allows pooling, which is more efficient. Heteroscedasticity (σ₁² ≠ σ₂²) requires Welch's correction, which adjusts both the standard error and degrees of freedom to prevent inflated Type I errors.
The practical workflow is straightforward: inspect data visually with boxplots, confirm with Levene's test, and choose accordingly — or simply default to Welch's t-test throughout. The cost of Welch's when variances happen to be equal is negligible. The cost of the pooled test when variances are unequal can be substantial.
Beyond t-tests, the same logic extends to ANOVA (where Welch's one-way ANOVA replaces standard ANOVA under heteroscedasticity) and linear regression (where heteroscedastic residuals call for robust standard errors or transformation).
Equal variance means same spread across groups (σ₁² = σ₂²) — use the pooled t-test. Unequal variance means different spreads (σ₁² ≠ σ₂²) — use Welch's t-test. When unsure, always use Welch's. Check with Levene's test before deciding. This single habit eliminates one of the most common sources of inflated false positive rates in applied statistics.