What Are Statistical Assumptions?
Every statistical method is a mathematical machine built for a specific type of data. A t-test assumes your observations come from a normally distributed population. A linear regression assumes a straight-line relationship between your variables. A chi-square test assumes expected cell counts are large enough. These aren't arbitrary rules — they're the conditions under which the underlying math is provably correct.
Think of assumptions as the "terms and conditions" of a statistical test. Ignoring them doesn't stop the test from running: your software will still produce a p-value. It just means that p-value may not mean what you think it does. A significant result obtained under badly violated assumptions can lead you to reject a true null hypothesis far more often than your chosen α would suggest.
The importance of checking assumptions before drawing conclusions is emphasized across all major statistical frameworks, from classical frequentist inference to modern machine learning diagnostics. For a broader grounding in how probability and inference work, the statistics and probability section on Statistics Fundamentals provides the necessary foundation. When assumptions hold, they protect five properties of your results:
- Unbiasedness: Violated assumptions often cause estimates to systematically over- or underestimate the true parameter
- Valid p-values: The actual false-positive (Type I error) rate matches your chosen α only when assumptions are met; violations can push a nominal 5% rate to 15–20%
- Reliable confidence intervals: A "95% CI" may cover the true parameter only 80% of the time when homoscedasticity is violated
- Predictive accuracy: A model whose assumptions are violated, especially one with the wrong functional form, can perform poorly on new data
- Interpretability: Regression coefficients only have their intended meaning when the linearity assumption holds and the errors have mean zero at every value of the predictors
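To make the p-value point concrete, here is a minimal simulation sketch in plain Python (standard library only). It assumes both groups share the same true mean, so every rejection is a false positive, and it compares |t| with 1.96, a large-sample normal approximation that is reasonable here only because the degrees of freedom exceed 200. The function names and the scenario (a large low-variance group against a small high-variance group) are illustrative choices, not taken from any particular study.

```python
import math
import random


def mean_and_var(sample):
    """Sample mean and unbiased (n - 1) variance."""
    n = len(sample)
    m = sum(sample) / n
    return m, sum((v - m) ** 2 for v in sample) / (n - 1)


def pooled_t(a, b):
    """Student's two-sample t statistic using a pooled variance estimate."""
    (ma, va), (mb, vb) = mean_and_var(a), mean_and_var(b)
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))


def type_i_error_rate(n_a, sd_a, n_b, sd_b, reps=1000, crit=1.96, seed=1):
    """Fraction of simulated null datasets (both true means = 0) with |t| > crit.

    crit=1.96 is a normal approximation to the t critical value (large df only).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(0, sd_a) for _ in range(n_a)]
        b = [rng.gauss(0, sd_b) for _ in range(n_b)]
        if abs(pooled_t(a, b)) > crit:
            rejections += 1
    return rejections / reps


# Assumptions met: equal variances and equal group sizes.
print(type_i_error_rate(110, 1.0, 110, 1.0))
# Equal-variance assumption violated, and the small group has the larger spread.
print(type_i_error_rate(200, 1.0, 20, 4.0))
```

With these settings the balanced case should land near the nominal 0.05, while the second scenario rejects the true null far more often: the pooled variance is dominated by the large, low-variance group, so the standard error is badly understated.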
Types of Statistical Assumptions
Assumptions can be organized into three broad categories. Distributional assumptions concern the shape of the data's probability distribution. Structural assumptions describe the mathematical relationship between variables. Data quality assumptions concern how observations were collected and whether they are independent of one another.
Distributional Assumptions
Specify the probability distribution the data or residuals should follow. The most common is normality. These assumptions make it possible to derive exact sampling distributions for test statistics.
Structural Assumptions
Describe the mathematical form of the relationship between variables — for instance, that the relationship is linear, or that variance is constant across all values of a predictor.
Data Quality Assumptions
Concern how observations were obtained. Independence means each data point carries unique information. Random sampling means the sample represents the population without systematic bias.
Parametric tests (t-test, ANOVA, linear regression) make explicit distributional assumptions — usually normality. Non-parametric tests (Mann-Whitney U, Kruskal-Wallis, Spearman correlation) replace distributional assumptions with weaker rank-based ones. The trade-off: non-parametric tests are more flexible but have less statistical power when parametric assumptions actually hold.
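The rank-based idea is easy to see in code. The sketch below computes the Mann-Whitney U statistic in plain Python by pooling both samples, ranking them (tied values share the average rank), and comparing one group's rank sum with its smallest possible value. The helper names are illustrative, and the sketch stops at the statistic: turning U into a p-value needs exact tables or a normal approximation, which a statistics library would normally supply.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(a, b):
    """U for sample a: its rank sum in the pooled data minus the smallest possible rank sum."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2


print(average_ranks([10, 20, 20, 30]))           # [1.0, 2.5, 2.5, 4.0]
print(mann_whitney_u([1, 3, 5, 7], [2, 3, 6]))   # 6.5
print(mann_whitney_u([2, 3, 6], [1, 3, 5, 7]))   # 5.5; the two U values sum to 4 * 3 = 12
```

Only the ordering of the values enters the statistic, which is why the test needs no normality assumption, and also why it discards some information when the data really are normal.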
Linear Regression Assumptions (OLS / Gauss-Markov)
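Linear regression is one of the most widely used statistical models, and its assumptions are among the most studied. Under the Gauss-Markov assumptions (linearity in the parameters, errors with mean zero given the predictors, uncorrelated errors, constant error variance, and no perfect multicollinearity), the Ordinary Least Squares (OLS) estimator is the Best Linear Unbiased Estimator (BLUE), meaning it has the lowest variance among all linear unbiased estimators. Add normality of errors and you get exact p-values and confidence intervals as well. The five assumptions below are the form most often checked in practice.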
$$Y = X\beta + \varepsilon$$
Y = response variable
β = coefficients to estimate
X = predictor variables
ε = error term (estimated by the residuals)
Assumption 1: Linearity
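The relationship between each predictor and the outcome must be linear, meaning a straight line captures it adequately. Strictly, the model must be linear in its coefficients, which is why adding terms such as X² (see the fix below) still counts as linear regression. This is an assumption about the mean of Y given X, not about the distribution of X or Y individually.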
How to check: Plot the residuals against the fitted values (residual vs. fitted plot) and, in multiple regression, against each predictor. A random scatter around zero indicates linearity. A U-shape or other systematic curve signals a non-linear relationship that the model is missing.
Fix if violated: Add a squared or higher-order term (polynomial regression), apply a log or square-root transformation to the predictor, or use a non-linear model. The full guide to simple linear regression covers polynomial extensions.
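A minimal sketch of the residual vs. fitted check, assuming a single predictor and a plain-Python OLS fit (the function names are illustrative). Fitting a straight line to data that actually follow Y = X² leaves residuals that are positive at both ends and negative in the middle, the U-shape described above.

```python
def ols_line(x, y):
    """Intercept and slope of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def residuals_vs_fitted(x, y):
    """(fitted, residual) pairs: the coordinates of a residual vs. fitted plot."""
    b0, b1 = ols_line(x, y)
    return [(b0 + b1 * xi, yi - (b0 + b1 * xi)) for xi, yi in zip(x, y)]


x = list(range(11))
y = [xi ** 2 for xi in x]  # a truly curved relationship
for fitted, resid in residuals_vs_fitted(x, y):
    print(f"fitted={fitted:6.1f}  residual={resid:6.1f}")
```

Here the printed residuals run from +15 down to -10 and back up to +15. Adding an X² term to the model would absorb that curve and leave residuals scattered around zero.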
Assumption 2: Independence of Errors
Each observation's error term must be independent of every other observation's error term. In practice, this means the residuals should not be correlated with each other. Violations occur most often with time-series data (where yesterday's error predicts today's) or clustered data (students within classrooms, patients within hospitals).
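How to check: Plot residuals in collection order and look for runs or oscillations. The Durbin-Watson statistic provides a formal test: it ranges from 0 to 4, values near 2 indicate no first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation.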
Fix if violated: For time-series: include lagged variables, use generalized least squares (GLS), or an ARIMA model. For clustered data: use multilevel (mixed) models or cluster-robust standard errors.
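The Durbin-Watson statistic is a single ratio: the sum of squared differences between consecutive residuals divided by the sum of squared residuals. A plain-Python sketch, with an illustrative function name, assuming the residuals are already listed in collection order:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in collection order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


print(durbin_watson([2, 1, 0, -1, -2]))  # 0.4: a smooth drift, strong positive autocorrelation
print(durbin_watson([1, -1, 1, -1]))     # 3.0: sign flips every step, negative autocorrelation
```

Residuals that drift slowly produce small consecutive differences and a value near 0; residuals that alternate in sign produce large differences and a value toward 4.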
Assumption 3: Homoscedasticity (Constant Variance)
The variance of the errors must be constant across all levels of the predictor variables. When variance changes with the level of a predictor (larger residuals at higher fitted values, for example), the errors are heteroscedastic. OLS estimates remain unbiased under heteroscedasticity, but standard errors are wrong, making p-values and confidence intervals unreliable.
How to check: A scale-location plot (square root of |standardized residuals| vs. fitted values) should show a roughly flat horizontal trend. Formally, the Breusch-Pagan test or White test detects heteroscedasticity.
Fix if violated: Apply a log or square-root transformation to Y. Alternatively, use heteroscedasticity-consistent (HC) robust standard errors (White's sandwich estimator) or weighted least squares (WLS).
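The idea behind the Breusch-Pagan test can be sketched in a few lines: regress the squared residuals on the predictor and ask whether they trend. The plain-Python sketch below implements the studentized (Koenker) variant, LM = n·R², for a single predictor only; under constant variance LM is approximately chi-square with one degree of freedom. The original Breusch-Pagan statistic is scaled differently, and the function names and demo data are illustrative assumptions.

```python
import math


def simple_ols(x, y):
    """Intercept and slope of the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1


def r_squared(x, y):
    """R^2 of a one-predictor regression, i.e. the squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


def breusch_pagan_koenker(x, y):
    """Koenker's studentized Breusch-Pagan test with one predictor: returns (LM, p)."""
    b0, b1 = simple_ols(x, y)
    squared_resid = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    lm = len(x) * r_squared(x, squared_resid)
    # Chi-square(1) upper tail: P(chi2 > LM) = erfc(sqrt(LM / 2)).
    return lm, math.erfc(math.sqrt(lm / 2))


xs = list(range(1, 41))
signs = [1 if i % 2 == 0 else -1 for i in range(40)]   # deterministic +/- noise pattern
funnel = [2 + 3 * x + s * x for x, s in zip(xs, signs)]  # spread grows with x
flat = [2 + 3 * x + s for x, s in zip(xs, signs)]        # constant spread
print(breusch_pagan_koenker(xs, funnel))
print(breusch_pagan_koenker(xs, flat))
```

The funnel-shaped data should give a very small p-value and the constant-spread data a large one. For several predictors the auxiliary regression uses all of them and the degrees of freedom equal the number of predictors, which is where a statistics library is the better tool.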
Assumption 4: Normality of Residuals
For exact p-values and confidence intervals, residuals (not raw data) should be normally distributed. This assumption becomes less critical as sample size grows because the Central Limit Theorem ensures the sampling distribution of the coefficients approaches normality regardless. With n > 100, moderate departures rarely matter in practice.
How to check: A Q-Q plot of residuals should follow the diagonal reference line closely. The Shapiro-Wilk test provides a formal normality check and is among the most powerful normality tests at small and moderate sample sizes. See the normal distribution guide for background on the normal curve itself.
Fix if violated: Log or Box-Cox transformation of Y. If the distribution is heavily skewed or has extreme outliers, consider a generalized linear model (GLM) with an appropriate distribution family.
Assumption 5: No Perfect Multicollinearity
In multiple regression, no predictor should be a perfect linear combination of others. When two predictors are highly correlated (but not perfectly), the OLS estimates become unstable — small changes in the data produce large swings in coefficients — and standard errors inflate. This is the practical form of the assumption that matters most.
How to check: Compute the Variance Inflation Factor (VIF) for each predictor. VIF > 10 (some use VIF > 5) signals a problem worth investigating. A correlation matrix among predictors provides an informal visual check.
Fix if violated: Remove one of the correlated predictors. Combine them using Principal Component Analysis (PCA). Use ridge regression, which is designed to handle multicollinearity by adding a penalty term. See the guide on multiple linear regression for a full treatment.
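VIF can be computed without fitting a separate regression for each predictor: the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, which is the same quantity as 1 / (1 - R_j^2). The plain-Python sketch below uses that identity with a small Gauss-Jordan inversion. The function names are illustrative, and a singular correlation matrix (perfect multicollinearity) is reported as an error.

```python
import math


def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for small correlation matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfect multicollinearity: correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * w for v, w in zip(aug[row], aug[col])]
    return [r[n:] for r in aug]


def vif(predictors):
    """VIF for each predictor (given as columns): diagonal of the inverse correlation matrix."""
    k = len(predictors)
    corr = [[pearson_r(predictors[i], predictors[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]


x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
print(vif([x1, x2]))  # both equal 1 / (1 - r^2) when there are only two predictors
```

With two predictors both VIFs reduce to 1 / (1 - r²), so a correlation of about 0.95 already gives a VIF near 10.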
Checking OLS Assumptions: Salary vs. Experience Data
You have salary data for 80 employees regressed on years of experience. Here's the four-step diagnostic sequence (three plots and one test statistic):
Residuals vs. Fitted: The residuals follow a systematic curve instead of scattering randomly around zero. This suggests non-linearity: experience may have diminishing returns. Add a quadratic term: Experience².
Scale-Location Plot: The spread of residuals increases steadily as fitted values rise. Salary variance grows with seniority — classic heteroscedasticity. Apply log(Salary) as the outcome.
Q-Q Plot of Residuals: Points follow the diagonal well in the middle but deviate at the upper tail. With n = 80 and only modest tail deviation, the CLT provides sufficient protection for inference.
Durbin-Watson Statistic: DW = 1.93. This is within the acceptable range (1.5–2.5), so autocorrelation is not a concern in this cross-sectional dataset.
Diagnosis: Transform outcome to log(Salary) and add Experience² to address non-linearity and heteroscedasticity. Re-run and re-check diagnostics after transformation.
T-Test Assumptions
The t-test is one of the most widely used tests in statistics, and it comes in three versions: one-sample (comparing a sample mean to a known value), independent two-sample (comparing means of two groups), and paired (comparing two related measurements). Each has its own assumption set, though they share a common core.
| Assumption | One-Sample t | Two-Sample t | Paired t |
|---|---|---|---|
| Normality (or large n) | ✓ | ✓ | ✓ (differences) |
| Independence of observations | ✓ | ✓ | ✓ (between pairs) |
| Equal variances (homoscedasticity) | — | Student's t only | — |
| Continuous data | ✓ | ✓ | ✓ |
| Random sampling | ✓ | ✓ | ✓ |
Normality for T-Tests
The t-test assumes the sample was drawn from a normally distributed population. In practice, the test is remarkably robust to non-normality when n > 30, thanks to the Central Limit Theorem — the sampling distribution of the mean approaches normality regardless of the population shape. For small samples (n ≤ 15), normality matters more. Check with a Shapiro-Wilk test or Q-Q plot.
Equal Variance for the Two-Sample T-Test
Student's t-test assumes the two populations have equal variances. Levene's test checks this formally (p > 0.05 suggests equal variances are plausible). When variances are unequal, use Welch's t-test, which adjusts the degrees of freedom and is the default in some software (R's t.test, for example). The guide on the two-sample t-test covers Welch's correction in detail.
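Welch's correction lives in the degrees of freedom. This plain-Python sketch computes Welch's t and the Welch-Satterthwaite degrees of freedom; the function names are illustrative, and the p-value step is left to a statistics library (for example, SciPy's `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math


def mean_var(sample):
    """Sample mean and unbiased (n - 1) variance."""
    n = len(sample)
    m = sum(sample) / n
    return m, sum((v - m) ** 2 for v in sample) / (n - 1)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    se2_a, se2_b = va / len(a), vb / len(b)  # squared standard error of each mean
    t = (ma - mb) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))   # (-1.0, 8.0): equal variances and sizes give df = n1 + n2 - 2
print(welch_t([1, 2, 3, 4, 5], [0, 4, 8, 12, 16]))  # df falls to about 4.5 when the variances differ
```

When the variances and sample sizes are equal, Welch's degrees of freedom match Student's; as the variances diverge they shrink toward the smaller group's n - 1, which widens the critical value and protects the false-positive rate.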
Independence for T-Tests
Observations must be independent — each data point should provide unique information not duplicated by another. Mixing paired and independent designs is a common error: if the same subject is measured twice (before/after), a paired t-test is required. Using an independent t-test on paired data wastes power and can produce incorrect p-values. See the dedicated paired samples t-test page for when and how to apply it.
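To see why ignoring the pairing wastes power, the sketch below runs both analyses on the same made-up before/after scores for five subjects (hypothetical numbers, not real data). The paired version works on within-subject differences, so the large spread between subjects drops out of the standard error; the function names are illustrative.

```python
import math


def mean_sd(values):
    """Sample mean and (n - 1) standard deviation."""
    n = len(values)
    m = sum(values) / n
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


def paired_t(after, before):
    """Paired t: mean within-subject difference divided by its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    m, sd = mean_sd(diffs)
    return m / (sd / math.sqrt(len(diffs)))


def pooled_t(a, b):
    """Student's independent t, which (wrongly, here) treats the columns as separate groups."""
    (ma, sa), (mb, sb) = mean_sd(a), mean_sd(b)
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * sa ** 2 + (nb - 1) * sb ** 2) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))


# Hypothetical scores for the same five subjects, measured twice.
before = [10, 12, 14, 16, 18]
after = [11, 13, 15, 17, 19.5]
print(round(paired_t(after, before), 2))  # roughly 11
print(round(pooled_t(after, before), 2))  # roughly 0.54
```

Every subject improved by about one point, which the paired test detects easily; the independent test buries that consistent shift under the between-subject spread.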
For regression, normality should be checked on the residuals, not on Y or X individually. For t-tests, it applies to the outcome variable within each group (or to the differences, for paired tests). Checking the wrong quantity is one of the most frequent assumption-checking errors in applied research.
ANOVA Assumptions
Analysis of Variance (ANOVA) tests whether the means of three or more groups differ. It extends the t-test logic and shares similar assumptions, but the equal-variance condition is now across all groups rather than just two. The full theoretical treatment of ANOVA — including one-way and two-way designs, post-hoc tests, and effect sizes — is covered on the ANOVA guide.
Normality Within Groups
Residuals (observations minus their group mean) should be normally distributed within each group. With balanced, large groups, ANOVA is robust to moderate non-normality.
Homogeneity of Variance
The variance of the outcome should be approximately equal across all groups. This is the most critical ANOVA assumption when group sizes differ.
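Levene's test, which the worked examples in this article use to check this assumption, is simply a one-way ANOVA run on each observation's absolute deviation from its group center. Centering on the group median (the Brown-Forsythe variant, also the default `center` in SciPy's `scipy.stats.levene`) is more robust to skewed data than centering on the mean. This plain-Python sketch returns the F statistic and its degrees of freedom only, since a p-value needs an F distribution that the standard library does not provide; the function names are illustrative.

```python
import statistics


def one_way_f(groups):
    """Classic one-way ANOVA F statistic with (df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_b, df_w = k - 1, n_total - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


def levene(groups, center="median"):
    """Levene's test: ANOVA on |x - group center|. center="mean" gives the original version."""
    def group_center(g):
        return statistics.median(g) if center == "median" else statistics.fmean(g)

    deviations = [[abs(v - group_center(g)) for v in g] for g in groups]
    return one_way_f(deviations)


print(levene([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]]))  # F = 0: same spread, different location
print(levene([[4, 5, 5, 5, 6], [0, 5, 10, 15, 20]]))     # large F: very different spreads
```

Shifting a group changes its mean but not its deviations, so the first call gives F = 0; only differences in spread move the statistic.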
Independence of Observations
Each observation must come from a different, unrelated subject. Repeated measures on the same subject require Repeated Measures ANOVA or a mixed model.
When ANOVA Assumptions Are Violated
When normality is severely violated (especially with small, unequal groups), the Kruskal-Wallis test is the non-parametric alternative. It compares the groups' rank distributions without requiring normality, and it can be read as a test of equal medians when the groups' distributions have similar shapes. When homogeneity of variance fails, Welch's ANOVA (which adjusts degrees of freedom similarly to Welch's t-test) performs better than the standard F-test. A log or square-root transformation of the outcome often improves both normality and equal variance at once.
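Welch's ANOVA replaces the pooled error term with group weights wᵢ = nᵢ / sᵢ². The plain-Python sketch below follows Welch's formulas and returns F together with its two (generally non-integer) degrees of freedom; as with Levene's test, the p-value needs an F distribution from a statistics library. With exactly two groups, Welch's F equals the square of Welch's t, which makes a handy sanity check. The function name and the three small score lists are illustrative.

```python
def welch_anova(groups):
    """Welch's one-way ANOVA for unequal variances: returns (F, df1, df2)."""
    k = len(groups)
    summaries = []
    for g in groups:
        n = len(g)
        m = sum(g) / n
        var = sum((v - m) ** 2 for v in g) / (n - 1)
        summaries.append((n, m, var))
    weights = [n / var for n, _, var in summaries]  # w_i = n_i / s_i^2
    w_total = sum(weights)
    weighted_mean = sum(w * m for w, (_, m, _) in zip(weights, summaries)) / w_total
    between = sum(w * (m - weighted_mean) ** 2
                  for w, (_, m, _) in zip(weights, summaries)) / (k - 1)
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, (n, _, _) in zip(weights, summaries))
    f = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df2 = (k ** 2 - 1) / (3 * lam)
    return f, k - 1, df2


# Hypothetical exam scores for three small groups with very different spreads.
traditional = [70, 72, 75, 78, 80]
flipped = [55, 65, 75, 85, 95]
hybrid = [68, 71, 74, 77, 80]
print(welch_anova([traditional, flipped, hybrid]))
```

Groups with large variances get small weights, so a noisy group cannot dominate the comparison the way it can inside the pooled error term of the standard F-test.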
Logistic Regression Assumptions
Logistic regression predicts a binary outcome (yes/no, success/failure) and operates on fundamentally different assumptions than linear regression. It does not require normality of residuals or homoscedasticity — these assumptions simply don't apply to a binary outcome. The full framework, including interpretation of odds ratios and model fit statistics, is covered in the logistic regression guide.
| Assumption | Required? | How to Check |
|---|---|---|
| Binary outcome (ordinal outcomes need ordinal logistic regression) | Yes | Check outcome variable type |
| Independence of observations | Yes | Study design review |
| Linearity of log-odds | Yes | Box-Tidwell test, plots of log-odds vs. predictor |
| No perfect multicollinearity | Yes | VIF, correlation matrix |
| Adequate sample size | Yes | ≥10–20 events per predictor variable |
| Normality of residuals | No | Not applicable |
| Homoscedasticity | No | Not applicable |
The most practically important logistic regression assumption is the linearity of log-odds: each continuous predictor should have a linear relationship with the log-odds of the outcome (not with the probability itself). The Box-Tidwell approach tests this formally by adding each continuous (and strictly positive) predictor's interaction with its own natural log, X × ln(X), to the model; a significant interaction term signals non-linearity in the logit. For categorical predictors, no such assumption applies.
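An informal version of the "plots of log-odds vs. predictor" check from the table above can be sketched in plain Python: sort by the predictor, cut it into bins, and compute each bin's empirical log-odds, log((events + 0.5) / (non-events + 0.5)), where the 0.5 keeps bins with no events (or only events) finite. If the points fall roughly on a straight line against the bin means, linearity of the logit is plausible. This is an illustrative sketch with hypothetical names and data, not the Box-Tidwell test itself.

```python
import math


def empirical_logits(x, y, n_bins=5):
    """(mean x, empirical log-odds) per bin of a sorted predictor; y holds 0/1 outcomes."""
    pairs = sorted(zip(x, y))
    size = len(pairs) / n_bins
    out = []
    for b in range(n_bins):
        chunk = pairs[round(b * size): round((b + 1) * size)]
        events = sum(yi for _, yi in chunk)
        non_events = len(chunk) - events
        mean_x = sum(xi for xi, _ in chunk) / len(chunk)
        out.append((mean_x, math.log((events + 0.5) / (non_events + 0.5))))
    return out


# Hypothetical data: in the decade 10d..10d+9 exactly d of the ten cases are events.
x = list(range(100))
y = [1 if (xi % 10) < (xi // 10) else 0 for xi in x]
for mean_x, logit in empirical_logits(x, y):
    print(f"{mean_x:5.1f}  {logit:6.2f}")
```

Plotting these pairs and eyeballing the trend is the point of the check; the bin count is a judgment call, since too few bins hide curvature and too many make each logit noisy.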
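A widely cited guideline is at least 10 events per predictor variable (EPP), where "events" means the less common of the two outcomes. With 5 predictors and a 20% event rate, you'd need at least 50 events, which requires a minimum sample of 250. Too few events per predictor inflates coefficients and produces overfit models. In the extreme, a predictor (or combination of predictors) perfectly or almost perfectly separates the two outcomes, a problem called complete or quasi-complete separation, and the maximum likelihood estimates no longer converge to finite values.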
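The events-per-variable arithmetic from the example above, as a small sketch. It assumes the 10-EPP rule of thumb by default and treats the rarer outcome as the "event"; the function name is illustrative.

```python
import math


def min_sample_size_epp(n_predictors, event_rate, epp=10):
    """Return (events needed, minimum total n) under an events-per-predictor rule of thumb."""
    if not 0 < event_rate < 1:
        raise ValueError("event_rate must be strictly between 0 and 1")
    minority_rate = min(event_rate, 1 - event_rate)  # 'events' = the rarer outcome
    events_needed = epp * n_predictors
    # Round before ceil so float noise (1 - 0.8 == 0.19999999999999996) cannot add a subject.
    return events_needed, math.ceil(round(events_needed / minority_rate, 9))


print(min_sample_size_epp(5, 0.20))  # (50, 250), matching the example above
```

Passing `epp=20` gives the stricter end of the 10–20 range listed in the table.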
Chi-Square Test Assumptions
The chi-square test of independence examines whether two categorical variables are related. Its assumptions are simpler than those of parametric tests, but the expected frequency condition is frequently overlooked and causes serious errors in applied work. The chi-square test guide covers the full procedure including reading the chi-square table.
Categorical Data
Both variables must be nominal or ordinal categories. The chi-square test counts observations falling into cells of a contingency table. It is not appropriate for continuous measurements without binning, and binning arbitrary continuous data into categories loses information.
Independence of Observations
Each participant or unit should appear only once in the table. Repeated-measures categorical data (the same person measured at two time points) violates this assumption and requires McNemar's test instead.
Expected Frequency ≥ 5
The expected count (not observed count) in every cell of the contingency table should be at least 5. Cells with expected counts below 5 make the chi-square approximation poor. When this condition fails, use Fisher's exact test, which doesn't rely on the large-sample approximation.
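Expected counts come straight from the table margins: E_ij = (row total i × column total j) / n. A plain-Python sketch with illustrative names that computes them and applies the "every expected count is at least 5" rule. Because the rule is about expected counts, a table with a small observed count can still pass.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_expected_minimum(table, minimum=5):
    """True when every expected (not observed) count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(table) for e in row)


print(expected_counts([[4, 16], [16, 4]]))         # [[10.0, 10.0], [10.0, 10.0]]
print(meets_expected_minimum([[4, 16], [16, 4]]))  # True, despite an observed count of 4
print(meets_expected_minimum([[2, 8], [3, 7]]))    # False: expected counts of 2.5 in column 1
```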
Adequate Sample Size
As a practical guide, the total n should be at least 5 times the number of cells in the table. For a 2×3 table (6 cells), this means n ≥ 30. Very small samples make it impossible to satisfy the expected frequency condition.
How to Check Statistical Assumptions: Diagnostic Guide
A systematic diagnostic routine before finalizing any analysis saves considerable time and prevents misleading conclusions. The tools below cover visual checks (fast, intuitive) and formal statistical tests (objective, sample-size-sensitive).
Visual Diagnostic Tools
Q-Q Plot (Quantile-Quantile)
Plots observed quantiles against theoretical normal quantiles. Points following the diagonal line closely indicate normality. A curved, bow-shaped pattern suggests skewness; an S-shape, with both tails bending away from the line, indicates tails that are heavier or lighter than normal. Use it for both raw data and residuals.
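The coordinates behind a Q-Q plot are simple to compute. This plain-Python sketch standardizes and sorts the data, pairs each value with the standard normal quantile at plotting position (i - 0.5)/n (one of several conventions), and summarizes straightness with the correlation between the two coordinates. The names and demo data are illustrative, and the correlation is only a rough summary, not a formal test.

```python
import math
from statistics import NormalDist, fmean, stdev


def qq_points(data):
    """(theoretical N(0,1) quantile, standardized sorted value) pairs for a normal Q-Q plot."""
    n = len(data)
    m, s = fmean(data), stdev(data)
    ordered = sorted((v - m) / s for v in data)
    # Plotting positions (i - 0.5) / n; Blom's and other conventions shift these slightly.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_correlation(data):
    """Pearson correlation of the Q-Q coordinates: close to 1 when points hug the line."""
    pts = qq_points(data)
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Perfectly normal-shaped values versus right-skewed (lognormal-shaped) values.
normal_like = [NormalDist(10, 2).inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
skewed = [math.exp(NormalDist().inv_cdf((i - 0.5) / 50)) for i in range(1, 51)]
print(round(qq_correlation(normal_like), 4))
print(round(qq_correlation(skewed), 4))
```

Plotting the `qq_points` pairs gives the Q-Q plot itself; for the skewed values the points bend away from the line at the upper end, which is what pulls their correlation down.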
Residuals vs. Fitted Plot
For regression: plots residuals (y-axis) against fitted values (x-axis). Random scatter around zero means linearity and homoscedasticity are satisfied. A curve means non-linearity; a funnel shape means heteroscedasticity.
Scale-Location Plot
Plots the square root of |residuals| against fitted values. A flat horizontal line confirms constant variance (homoscedasticity). An upward trend confirms variance grows with the fitted value — the most common pattern of heteroscedasticity.
Histogram of Residuals
A quick check for approximate normality of residuals. The histogram should be roughly bell-shaped and symmetric. Non-specialists often find heavy tails (kurtosis) or asymmetry (skewness) easier to spot here than on a Q-Q plot.
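The two shapes a histogram reveals can also be put into numbers. The sketch below computes moment-based skewness (g1 = m3 / m2^1.5) and excess kurtosis (g2 = m4 / m2² - 3), both 0 for a normal distribution. It applies no small-sample bias correction, so statistical packages that report adjusted versions may give somewhat different values for small n; the function name is illustrative.

```python
def moment_shape(data):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


print(moment_shape([1, 2, 3, 4, 5]))   # skewness 0; negative excess kurtosis (flat, light tails)
print(moment_shape([1, 1, 1, 2, 10]))  # positive skewness: a long right tail
```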
Formal Statistical Tests for Assumptions
| Assumption | Test | Statistic | Decision Rule | Best For |
|---|---|---|---|---|
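| Normality | Shapiro-Wilk | W statistic | p > 0.05 = normality plausible | Small to moderate n (generally the most powerful) |
| Normality | Kolmogorov-Smirnov | D statistic | p > 0.05 = normality plausible | Fully specified distributions; use the Lilliefors correction when mean and SD are estimated from the data |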
| Normality | Anderson-Darling | A² statistic | p > 0.05 = normality plausible | Sensitive to tail departures |
| Equal variance | Levene's test | F statistic | p > 0.05 = equal variances plausible | Non-normal data (robust) |
| Equal variance | Bartlett's test | χ² statistic | p > 0.05 = equal variances plausible | Normal data (more powerful) |
| Autocorrelation | Durbin-Watson | DW (0–4) | ≈ 2 = no autocorrelation | Time-series regression residuals |
| Heteroscedasticity | Breusch-Pagan | LM statistic | p > 0.05 = constant variance plausible | Linear regression residuals |
| Multicollinearity | Variance Inflation Factor | VIF | VIF < 5 acceptable, < 10 tolerable | Multiple regression predictors |
With large samples, formal tests like Shapiro-Wilk have very high power and will flag trivially small departures from normality as "significant" — departures that have no meaningful impact on your analysis. With small samples, the same tests have very low power and may miss genuine violations. Always pair formal tests with visual inspection. The goal is practical, not statistical, significance of the violation.
What to Do When Assumptions Are Violated
A violated assumption is not the end of the analysis — it's a diagnostic signal that guides your next step. The appropriate response depends on which assumption is violated, the degree of the violation, and your sample size.
| Violated Assumption | Moderate Violation Fix | Severe Violation Fix |
|---|---|---|
| Non-normality of residuals | Log or Box-Cox transformation of Y; rely on CLT if n > 100 | Non-parametric test (Mann-Whitney, Kruskal-Wallis); GLM with appropriate family |
| Heteroscedasticity | Log-transform Y; HC robust standard errors | Weighted least squares (WLS); generalized least squares (GLS) |
| Autocorrelation (time-series) | Add lagged predictor; Cochrane-Orcutt procedure | ARIMA model; GLS with AR(1) error structure |
| Multicollinearity | Remove one correlated predictor; center variables (helps when collinearity comes from polynomial or interaction terms) | Ridge regression; PCA to create uncorrelated components |
| Non-linearity | Add polynomial term (X²); transform X | Non-parametric regression; generalized additive model (GAM) |
| Unequal variances (ANOVA) | Welch's ANOVA (adjusted df) | Kruskal-Wallis non-parametric test; log-transform Y |
| Non-independence | Cluster-robust standard errors | Mixed-effects model; GEE for longitudinal data |
Worked Examples: Assumption Checks in Practice
Checking Normality Before a Two-Sample T-Test
Scenario: A nutritionist measures daily calorie intake in 25 people on Diet A and 28 on Diet B. She wants to test whether mean intake differs between groups using a two-sample t-test.
Check normality per group: Shapiro-Wilk on Diet A: W = 0.954, p = 0.31. Diet B: W = 0.947, p = 0.16. Both p > 0.05 → normality is plausible. Q-Q plots confirm approximate normality with slight right skewness in both groups.
Check equal variances: Levene's test: F = 2.14, p = 0.15. Since p > 0.05, equal variances are plausible → Student's t-test is appropriate. If p were < 0.05, switch to Welch's t-test.
Check independence: Participants were recruited independently with no repeated measurements and no family clusters in the sample. Independence holds.
Decision: All three assumptions satisfied. Proceed with the independent samples t-test. See the two-sample t-test guide for the calculation steps.
If normality had failed, the Mann-Whitney U test would have been the non-parametric alternative.
Diagnosing Assumption Violation: Unequal Variance in a Three-Group Design
Scenario: A researcher tests whether three teaching methods (Traditional, Flipped, Hybrid) produce different exam scores. Group sizes are n₁ = 18, n₂ = 22, n₃ = 14 — unequal, which raises concern about variance violations.
Normality check: Shapiro-Wilk per group: all p > 0.08. Q-Q plots look acceptable. Normality is not a concern here.
Homogeneity of variance: Levene's test: F = 4.82, p = 0.012. The standard ANOVA assumption of equal variances is violated. Group standard deviations are 8.2, 14.7, and 9.1 — the Flipped classroom group is substantially more variable.
Response: With unequal group sizes and a Levene's test that flags the assumption, use Welch's one-way ANOVA. The Welch F-statistic adjusts degrees of freedom to account for variance heterogeneity. Alternatively, apply a natural log transformation to scores and re-check — if it restores homoscedasticity, standard ANOVA is then appropriate.
Decision: Run Welch's ANOVA instead of standard one-way ANOVA. For pairwise follow-up tests, use Games-Howell rather than Tukey's HSD (which also requires equal variances).
Full Assumptions Reference Table by Test
| Statistical Test | Normality? | Equal Variance? | Independence? | Other Key Assumption | Non-Parametric Alternative |
|---|---|---|---|---|---|
| One-sample t-test | Yes (or n > 30) | N/A | Yes | Continuous data | Wilcoxon signed-rank |
| Independent two-sample t-test | Yes (or n > 30) | Yes (Student's) / No (Welch's) | Yes | Two independent groups | Mann-Whitney U |
| Paired t-test | Differences normal | N/A | Yes (between pairs) | Matched/paired observations | Wilcoxon signed-rank |
| One-way ANOVA | Yes (within groups) | Yes (check with Levene's) | Yes | Three or more groups | Kruskal-Wallis |
| Simple linear regression | Residuals normal | Constant (homoscedasticity) | Yes (errors) | Linearity, no outliers | Spearman / quantile regression |
| Multiple linear regression | Residuals normal | Constant | Yes (errors) | No multicollinearity | Ridge / LASSO for collinearity |
| Logistic regression | No | No | Yes | Linear log-odds, no separation | Exact logistic regression |
| Chi-square test | No | No | Yes | Expected freq ≥ 5 | Fisher's exact test |
| Pearson correlation | Yes (bivariate) | N/A | Yes | Linear relationship | Spearman rank correlation |
| Z-test (one-sample) | Yes or n > 30 | N/A | Yes | Population σ known | Sign test (large n) |
Where Statistical Assumptions Matter Most
Clinical Trials
Normality and independence assumptions are scrutinized in RCTs. Violation of independence (patients clustering within hospitals) is handled with mixed-effects models. Drug approval decisions depend on correctly computed p-values, so assumption checks are a routine expectation in regulatory submissions.
Financial Modeling
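Financial returns are notorious for violating normality: they have fat tails, and their volatility clusters over time (large moves tend to follow large moves, so squared returns are autocorrelated). Models that assume normal, constant-variance errors on return data therefore underestimate tail risk. GARCH models, robust estimation, and copula-based approaches handle these violations in practice.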
Social Science Research
Survey data often violates independence due to clustering (students within schools, employees within firms). Mixed-effects models or cluster-robust standard errors are standard remedies in published research.
Machine Learning
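Linear and logistic models in ML rest on the same assumptions as their classical counterparts. A wrong functional form degrades out-of-sample performance, while non-independence or heteroscedasticity makes any reported standard errors and intervals unreliable. Residual diagnostic plots remain essential for linear and logistic regression deployed in production systems.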
Agricultural / Experimental Design
Field experiments often have spatial correlation violating independence. Block designs, randomization, and mixed models for spatial autocorrelation are standard in agronomic research.
Engineering & Quality Control
Process data collected over time is almost always autocorrelated. Statistical process control (SPC) charts and time-series models account for this. Applying a t-test to non-independent process data produces inflated false-positive rates for out-of-control signals.