What Are Normality Tests? (Definition)
Every common parametric test — the one-sample t-test, two-sample t-test, paired t-test, ANOVA, and linear regression — carries a normality assumption. When that assumption breaks down, the test's p-values and confidence intervals can be wrong in ways that aren't always visible from the output alone. Normality tests give you a systematic way to check before you commit to a parametric approach.
Statistical normality, specifically, refers to data following the Gaussian distribution defined by Carl Friedrich Gauss: a symmetric, bell-shaped curve where roughly 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three. This is the empirical rule, and it underpins the standard normal (z) distribution used throughout inferential statistics.
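The empirical rule can be checked numerically from the standard normal CDF. The sketch below assumes SciPy is available; the helper name `coverage_within` is illustrative and not part of any library.

```python
from scipy.stats import norm


def coverage_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return norm.cdf(k) - norm.cdf(-k)


# Coverage for one, two, and three standard deviations
coverage = {k: coverage_within(k) for k in (1, 2, 3)}
for k, prob in coverage.items():
    print(f"within {k} SD: {prob:.4f}")
```

The printed values round to the familiar 68%, 95%, and 99.7% figures.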
- H₀ (null hypothesis): The data comes from a normally distributed population
- H₁ (alternative hypothesis): The data does not follow a normal distribution
- p > 0.05: Fail to reject H₀ — no significant departure from normality detected
- p ≤ 0.05: Reject H₀ — data deviates significantly from a normal distribution
- Best practice: Combine a formal test with a Q-Q plot — never rely on either alone
- Large samples: Formal tests may flag trivially small deviations; weight visual diagnostics more heavily
Complete Normality Tests Comparison Table
Each row is a different normality test. "Optimal N" is the sample size range where that test performs best. When in doubt about which test to run, Shapiro-Wilk is the default choice for most research datasets (N < 2000).
| Test | Null Hypothesis (H₀) | Test Statistic | Optimal Sample Size | Key Strengths | Main Limitations | Software Support |
|---|---|---|---|---|---|---|
| Shapiro-Wilk | Data is normally distributed | W | 3 ≤ N ≤ 2000 | Highest statistical power across most distribution types | Oversensitive at large samples (N > 5000) | SPSS, R, Python, Stata, JASP, Jamovi |
| Anderson-Darling | Data matches a specified normal profile | A² | All sizes (N ≥ 5) | Excellent at detecting tail-region departures | Critical values vary by parameter estimation method | R, Python, Minitab, Stata |
| Kolmogorov-Smirnov (Lilliefors) | Sample matches a theoretical normal CDF | D | Large samples (N > 2000) | Simple concept; widely taught | Very low power for small samples; requires Lilliefors correction when parameters are estimated | SPSS, R, Python, SAS |
| Jarque-Bera | Skewness and kurtosis match a normal profile | JB | Large samples (N > 300) | Computes instantly; ideal for econometrics and time series | Near-zero power with small datasets | R, Python, EViews, Stata |
| D'Agostino-Pearson | Skewness and kurtosis are consistent with normality | K² | Medium to large (N ≥ 20) | Combines skewness and kurtosis transforms; descriptive output | Ambiguous results when skew and kurtosis trends cancel | Python (SciPy), GraphPad Prism |
The five tests above share one conceptual goal, measuring how far your observed data strays from a theoretical Gaussian distribution, but they measure that distance in different ways. Shapiro-Wilk uses regression on order statistics; Kolmogorov-Smirnov compares cumulative distribution functions; Jarque-Bera focuses entirely on skewness and excess kurtosis. The method matters because each test is sensitive to different types of departure from normality, which is why the sample size guidance in the table above is worth taking seriously.
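To see that the tests measure distance differently, it can help to run several of them on the same sample. The sketch below is illustrative only: the heavy-tailed sample is simulated, the seed is arbitrary, and only three of the five tests are included because they return a p-value directly from SciPy.

```python
import numpy as np
from scipy import stats

# Simulated heavy-tailed data (Student t with 3 degrees of freedom); illustrative only
rng = np.random.default_rng(0)
sample = rng.standard_t(df=3, size=200)

w_stat, w_p = stats.shapiro(sample)          # regression on order statistics
k2_stat, k2_p = stats.normaltest(sample)     # D'Agostino-Pearson K²
jb_stat, jb_p = stats.jarque_bera(sample)    # skewness and excess kurtosis only

results = {
    "Shapiro-Wilk": (w_stat, w_p),
    "D'Agostino-Pearson": (k2_stat, k2_p),
    "Jarque-Bera": (jb_stat, jb_p),
}
for name, (statistic, p_value) in results.items():
    print(f"{name}: statistic = {statistic:.4f}, p = {p_value:.4g}")
```

The statistics are on different scales and are not comparable with each other; only the p-values share a common interpretation.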
How to Test Data for Normality: The 7-Step Workflow
Research practice combines visual inspection with a formal test — neither method alone gives a complete picture. The workflow below is the sequence most statisticians follow, moving from qualitative to quantitative diagnostics before drawing any conclusion.
Initial Visual Inspection of Raw Data
Before running any numbers, examine the raw data for obvious problems: extreme range spreads, data entry truncation errors, or impossible values. A dataset with severe data quality issues should be cleaned before normality testing. For grouped data such as ANOVA designs, inspect each group separately.
Histogram Assessment
Plot raw data frequencies across calculated bins. Look for the classic symmetric bell shape. Identify problems like bimodality (two peaks — which usually signals two distinct subgroups in the data), heavy right or left skew, or extreme ceiling/floor effects where values cluster at the maximum or minimum of the measurement scale.
Quantile-Quantile (Q-Q) Plot Analysis
Map empirical data quantiles against expected standard normal quantiles. In a clean dataset, the points cluster tightly along a 45-degree reference line. An S-shaped curve signals kurtosis issues (heavy or light tails). Points curving upward at both ends indicate right skew; downward curves at both ends indicate left skew. The Q-Q plot is more informative than a histogram for detecting tail behavior.
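The coordinates of a Q-Q plot are simple to compute by hand. This sketch assumes the (i − 0.5)/n plotting position, which is one common convention among several, and the function name `qq_points` is hypothetical. The correlation between the two axes gives a rough numeric summary of how straight the plot is.

```python
import numpy as np
from scipy import stats


def qq_points(values):
    """Return (theoretical normal quantiles, ordered observations) for a Q-Q plot."""
    observed = np.sort(np.asarray(values, dtype=float))
    n = observed.size
    # Plotting positions (i - 0.5) / n for i = 1..n; an assumed convention
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = stats.norm.ppf(probs)
    return theoretical, observed


# Simulated normal data, for illustration only
rng = np.random.default_rng(1)
sample = rng.normal(loc=50, scale=10, size=200)

theo, obs = qq_points(sample)
r = np.corrcoef(theo, obs)[0, 1]
print(f"Q-Q correlation: {r:.4f}")
```

Plotting `theo` on the x-axis against `obs` on the y-axis reproduces the chart described above; a correlation close to 1 corresponds to points hugging the reference line.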
Quantitative Skewness Evaluation
Measure the distribution's asymmetry. Perfect symmetry produces a skewness value of 0. A positive skew means a long right tail; negative skew means a long left tail. Most applied statisticians treat a skewness value within [−1.0, +1.0] as acceptable for parametric tests, though some researchers use the stricter [−0.5, +0.5] threshold for sensitive analyses. See the variance and distribution shape guide for the full formula.
Quantitative Kurtosis Evaluation
Examine tail weight relative to a normal curve. A normal distribution carries an excess kurtosis of 0 (or an absolute kurtosis of 3.0). High positive excess kurtosis (leptokurtic) points to heavy tails and a sharp peak — common in financial return data. Negative excess kurtosis (platykurtic) indicates thin tails and a flatter distribution — common when data has hard boundaries. Values of excess kurtosis between −2 and +2 are generally acceptable for parametric use.
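Skewness and excess kurtosis can both be computed from central moments. The sketch below uses the population (biased) moment definitions and compares them with SciPy's defaults, which use the same convention. The function names and the simulated exponential sample are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def moment_skewness(values):
    """Third central moment divided by the second central moment to the power 3/2."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5


def moment_excess_kurtosis(values):
    """Fourth central moment over squared second central moment, minus 3."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2 - 3.0


# Right-skewed simulated data, for illustration only
rng = np.random.default_rng(2)
sample = rng.exponential(scale=1.0, size=500)

skew_manual = moment_skewness(sample)
kurt_manual = moment_excess_kurtosis(sample)
print(f"skewness: {skew_manual:.4f} (SciPy: {stats.skew(sample):.4f})")
print(f"excess kurtosis: {kurt_manual:.4f} (SciPy: {stats.kurtosis(sample):.4f})")
```

Some packages apply small-sample bias corrections, so their output can differ slightly from these moment-based values.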
Run the Formal Normality Test
Select and execute an objective goodness-of-fit test. For most datasets (N < 2000), use Shapiro-Wilk. For larger samples or tail-sensitive analyses, use Anderson-Darling. Generate the test statistic and p-value, then interpret against your chosen significance level (typically α = 0.05). Use the interactive calculator in Section 7 below to run a Shapiro-Wilk approximation on your own data.
Interpret Results and Choose Your Next Step
Evaluate your diagnostic findings as a whole, not in isolation. If all indicators — histogram, Q-Q plot, skewness, kurtosis, and formal test — point toward normality, proceed with your planned parametric test. If some indicators fail, consider whether the sample size is large enough for the Central Limit Theorem to compensate (see Section 6). If the violations are severe, pivot to transformation or nonparametric alternatives (Section 9).
Which Normality Test Should You Use?
Test selection depends primarily on your sample size. The decision framework used across most applied statistics fields pairs a primary test with a backup visual check: Shapiro-Wilk for samples up to N = 2000, Anderson-Darling for larger samples or tail-sensitive analyses, and a Q-Q plot alongside the formal test in every case.
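The sample-size guidance used throughout this guide (Shapiro-Wilk for 3 ≤ N ≤ 2000, Anderson-Darling beyond that) can be written as a small helper. This is a minimal sketch of that rule of thumb, not a library function; the name `suggest_normality_test` is hypothetical.

```python
def suggest_normality_test(n: int) -> str:
    """Suggest a primary normality test from the sample size alone."""
    if n < 3:
        raise ValueError("At least 3 observations are needed for a normality test")
    if n <= 2000:
        return "Shapiro-Wilk"
    return "Anderson-Darling"


for size in (28, 240, 5000):
    print(size, "->", suggest_normality_test(size))
```

Whatever the function returns, the Q-Q plot remains the companion check.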
Special Cases: Regression Residuals and ANOVA
Two situations require extra care with normality testing beyond the standard workflow.
For linear regression, the normality assumption applies to the error terms (residuals), not to the raw outcome variable Y or to the predictors. Testing the raw Y variable for normality is a common mistake. Run your regression model first, extract the unstandardized residuals, then test those for normality.
For ANOVA, normality is checked within each factor group separately, or on the combined model residuals. The overall distribution of the outcome variable across all groups is rarely normally distributed in ANOVA designs — the assumption concerns the within-group distribution, not the marginal distribution.
Do not test the dependent variable Y for normality. Test the residuals from your fitted model. The regression normality assumption concerns the error terms ε, not the outcome itself. Many published analyses get this wrong.
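The difference between testing Y and testing residuals is easy to demonstrate on simulated data. In the sketch below, the predictor is uniform and the errors are normal, so Y itself is far from bell-shaped even though the model assumption holds. All data, coefficients, and the seed are invented for illustration.

```python
import numpy as np
from scipy import stats

# Simulated data: uniform predictor, normally distributed errors
rng = np.random.default_rng(3)
n = 300
x = rng.uniform(0, 100, size=n)
y = 5.0 + 3.0 * x + rng.normal(0, 5, size=n)

# Fit OLS with an intercept and extract the unstandardized residuals
design = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ beta

# Testing raw Y (the common mistake) versus testing the residuals
_, p_y = stats.shapiro(y)
_, p_resid = stats.shapiro(residuals)
print(f"Shapiro-Wilk on raw Y:     p = {p_y:.4g}")
print(f"Shapiro-Wilk on residuals: p = {p_resid:.4g}")
```

Raw Y inherits the flat shape of the predictor and is rejected as non-normal, which says nothing about whether the regression assumption is met.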
Worked Examples with APA Reporting
Both examples below follow the full 7-step diagnostic process. Reporting formats match the APA 7th edition statistics reporting guidelines.
Example 1 — Shapiro-Wilk Test (Clinical Trial Data)
A researcher measures the change in systolic blood pressure (mmHg) across 28 participants after a new antihypertensive drug. Before running a paired t-test, they must verify the normality of the blood pressure change scores.
W = (Σ aᵢ x(ᵢ))² / Σ(xᵢ − x̄)²
aᵢ = ordered weight coefficients
x(ᵢ) = ordered sample values
W → 1 indicates normality
State hypotheses: H₀: Blood pressure changes are normally distributed | H₁: Blood pressure changes are not normally distributed
Significance level: α = 0.05 (standard in clinical research)
Choose test: Shapiro-Wilk is appropriate for N = 28 (falls within the 3 ≤ N ≤ 2000 range where it performs best)
Visual check: Q-Q plot shows points closely tracking the reference diagonal with slight scatter at the tails — consistent with approximate normality
Skewness = 0.31, Kurtosis = 2.74 (excess kurtosis = −0.26): Both values fall within acceptable ranges (skewness within ±1, excess kurtosis near 0)
Formal test output: W = 0.968, p = 0.542
Decision: p = 0.542 > α = 0.05 → Fail to reject H₀. The data shows no statistically significant departure from normality. Proceed with the paired t-test.
✅ APA Reporting: "A Shapiro-Wilk test was conducted to verify the normality assumption for systolic blood pressure changes (N = 28). The distribution did not deviate significantly from normal, W(28) = 0.97, p = .542. A parametric paired-samples t-test was therefore conducted."
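The APA-style string in the report above follows a fixed pattern: the statistic to two decimals, the p-value to three decimals without a leading zero, and "p < .001" for very small values. The helper below is a hypothetical sketch of that formatting, not an official APA tool.

```python
def format_p(p: float) -> str:
    """Format a p-value APA-style: three decimals, no leading zero."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_shapiro(w: float, p: float, n: int) -> str:
    """Build the statistic portion of an APA-style Shapiro-Wilk report."""
    return f"W({n}) = {w:.2f}, {format_p(p)}"


# Values taken from Example 1
report = apa_shapiro(0.968, 0.542, 28)
print(report)
```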
Example 2 — Anderson-Darling on OLS Regression Residuals
An economist fits an ordinary least squares (OLS) model predicting housing prices from square footage (N = 240). Before interpreting confidence intervals and significance tests, they check the residuals for normality.
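A² = −N − (1/N) × Σ (2i−1) × [ln F(xᵢ) + ln(1 − F(xₙ₊₁₋ᵢ))]
F(·) = standard normal CDF
xᵢ = ordered residuals (standardized by their mean and standard deviation before applying F)
Large A² rejects H₀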
What to test: The unstandardized model residuals (ε̂), not the raw house prices or square footage values
State hypotheses: H₀: Regression residuals are normally distributed | H₁: Regression residuals are not normally distributed
Q-Q plot: Points show an upward curve in the upper tail, suggesting positive skew in the residuals — consistent with the skewness value of 0.88
Formal test output: Anderson-Darling A² = 1.42, p = 0.0008
Decision: p = 0.0008 ≤ α = 0.05 → Reject H₀. Residuals deviate significantly from normality. Standard error estimates and confidence intervals from OLS may be unreliable.
Remediation: Consider a log transformation of the outcome variable (log-linear regression), or use bootstrap resampling to construct robust confidence intervals
⚠️ APA Reporting: "Inspection of the model residuals revealed a significant departure from normality (Anderson-Darling A² = 1.42, p < .001). A log transformation was applied to the outcome variable, after which residuals showed no significant non-normality (A² = 0.38, p = .41). All regression results reported are from the log-linear model."
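The A² statistic reported in Example 2 can be computed directly from ordered, standardized values. The sketch below assumes the mean and standard deviation are estimated from the sample (with the n − 1 denominator), and it compares the result with `scipy.stats.anderson`. The residuals here are simulated stand-ins, not the housing data from the example.

```python
import numpy as np
from scipy import stats


def anderson_darling_normal(values):
    """A² for normality, with mean and SD estimated from the sample."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    z = (x - x.mean()) / x.std(ddof=1)       # standardize the ordered values
    i = np.arange(1, n + 1)
    log_cdf = stats.norm.logcdf(z)            # ln F(z_i)
    log_sf = stats.norm.logsf(z)              # ln(1 - F(z_i))
    # Reversing log_sf pairs index i with index n + 1 - i
    return -n - np.sum((2 * i - 1) * (log_cdf + log_sf[::-1])) / n


# Simulated right-skewed "residuals", for illustration only
rng = np.random.default_rng(6)
fake_residuals = rng.gamma(shape=4.0, scale=1.0, size=240)
fake_residuals -= fake_residuals.mean()

a2_manual = anderson_darling_normal(fake_residuals)
a2_scipy = stats.anderson(fake_residuals, dist="norm").statistic
print(f"A² (manual) = {a2_manual:.4f}, A² (SciPy) = {a2_scipy:.4f}")
```

Software packages differ in whether they apply a small-sample adjustment to A² before computing a p-value, so reported values may not match exactly across tools.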
Normality Tests vs. the Central Limit Theorem
An important practical consideration: the Central Limit Theorem (CLT) states that as sample sizes grow large, the sampling distribution of the mean approaches normality regardless of the underlying population's distribution shape. This is covered thoroughly in the Central Limit Theorem guide on Statistics Fundamentals.
When Does Normality Testing Actually Matter?
Large samples have a practical consequence that trips up many analysts: when N is large (say, N = 500 or more), a Shapiro-Wilk test will often reject H₀ even if the departure from normality is trivially small and harmless for the parametric test you want to run. In those situations, the Q-Q plot matters more than the p-value from the formal test. A Q-Q plot where all points fall reasonably close to the diagonal tells you the distributional shape is close enough, even if the Shapiro-Wilk p-value came back at 0.03.
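A small simulation makes the sample-size effect concrete. The sketch below draws repeatedly from the same mildly heavy-tailed distribution (Student t with 8 degrees of freedom, chosen arbitrarily) and counts how often Shapiro-Wilk rejects at two sample sizes. The function name, repetition count, and seed are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def rejection_rate(n, reps=100, alpha=0.05, seed=0):
    """Share of simulated samples of size n for which Shapiro-Wilk rejects H0."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        sample = rng.standard_t(df=8, size=n)   # same distribution at every n
        if stats.shapiro(sample).pvalue < alpha:
            rejections += 1
    return rejections / reps


rate_small = rejection_rate(20)
rate_large = rejection_rate(2000)
print(f"N = 20:   rejected in {rate_small:.0%} of samples")
print(f"N = 2000: rejected in {rate_large:.0%} of samples")
```

The underlying distribution never changes between the two runs; only the test's ability to detect the departure does.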
Interactive Shapiro-Wilk Approximation Calculator
Enter your data values below (comma-separated or one per line) to compute a Shapiro-Wilk W statistic approximation and skewness/kurtosis metrics. For datasets with N > 50, the full Shapiro-Wilk algorithm should be run in R or Python using the software tutorials below.
Step-by-Step Software Tutorials
Normality Tests in SPSS
SPSS runs both Shapiro-Wilk and Kolmogorov-Smirnov in the same dialog, along with a Q-Q plot. The path below works in SPSS versions 25 through 29.
Navigate to the Explore Dialog
Go to Analyze → Descriptive Statistics → Explore. This is the SPSS procedure that reports the Shapiro-Wilk test alongside the Lilliefors-corrected Kolmogorov-Smirnov test. The standard Frequencies and Descriptives procedures do not include normality tests.
Move Your Variable and Select Plots
Move your continuous variable into the Dependent List. If you want to test normality within groups (e.g., for ANOVA), move your grouping variable into the Factor List. Click the Plots button on the right side of the dialog.
Enable Normality Tests and Q-Q Plots
Check "Normality plots with tests" — this activates both the formal tests and the accompanying Q-Q plot. Also check "Histogram" under Descriptive. Click Continue, then OK to run.
Interpret the Output Table
SPSS generates a "Tests of Normality" table with two column groups: Kolmogorov-Smirnov (with Lilliefors correction) and Shapiro-Wilk. Read the Sig. column in each group for the p-value. For N < 50, rely on the Shapiro-Wilk result. For larger samples, inspect the Q-Q plot alongside the significance values.
Normality Tests in R
```r
# Load your data vector (replace with your actual values)
data_vector <- c(12.4, 14.2, 11.9, 15.3, 13.8, 14.1, 12.8, 13.5, 16.0, 11.2)

# 1. Shapiro-Wilk test (best for N < 2000)
sw_result <- shapiro.test(data_vector)
print(sw_result)
# Output: W = 0.xxx, p-value = 0.xxx

# 2. Anderson-Darling test (nortest package required)
# install.packages("nortest")
library(nortest)
ad.test(data_vector)

# 3. Q-Q plot with reference line
qqnorm(data_vector, main = "Q-Q Plot: Check for Normality")
qqline(data_vector, col = "red", lwd = 2)

# 4. Skewness and kurtosis (e1071 package)
library(e1071)
cat("Skewness:", skewness(data_vector), "\n")
cat("Kurtosis (excess):", kurtosis(data_vector), "\n")
```
Normality Tests in Python
```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Sample data (replace with your own array)
np.random.seed(42)
sample_data = stats.norm.rvs(loc=50, scale=10, size=50)

# 1. Shapiro-Wilk Test
sw_stat, sw_p = stats.shapiro(sample_data)
print(f"Shapiro-Wilk: W = {sw_stat:.4f}, p = {sw_p:.4f}")

# 2. Anderson-Darling Test
ad_result = stats.anderson(sample_data, dist='norm')
print(f"Anderson-Darling: A² = {ad_result.statistic:.4f}")

# 3. D'Agostino-Pearson Test (requires N >= 20)
dp_stat, dp_p = stats.normaltest(sample_data)
print(f"D'Agostino-Pearson: K² = {dp_stat:.4f}, p = {dp_p:.4f}")

# 4. Q-Q Plot
fig, ax = plt.subplots(figsize=(6, 5))
stats.probplot(sample_data, dist="norm", plot=ax)
ax.set_title("Normal Q-Q Plot")
plt.tight_layout()
plt.show()

# 5. Skewness and excess kurtosis
print(f"Skewness: {stats.skew(sample_data):.4f}")
print(f"Excess kurtosis: {stats.kurtosis(sample_data):.4f}")
```
Normality Tests in Excel
Excel does not include a native normality test in the standard Analysis ToolPak. The workaround below constructs a manual Q-Q plot using built-in functions, which is the most reliable approach available without add-ins.
| Step | Excel Formula | What It Does |
|---|---|---|
| 1. Compute sample stats | =AVERAGE(A2:A51) and =STDEV.S(A2:A51) | Gets mean and standard deviation for standardization |
| 2. Sort data ascending | Data → Sort (smallest to largest) | Q-Q plots require ordered data |
| 3. Compute empirical percentiles | =(RANK.EQ(A2,$A$2:$A$51,1)-0.5)/COUNT($A$2:$A$51) | Assigns each point its expected cumulative probability |
| 4. Compute theoretical quantiles | =NORM.S.INV(C2) where C2 holds the percentile | Converts percentiles to expected z-scores under normality |
| 5. Create Q-Q scatter plot | Insert → Chart → Scatter (X: theoretical z, Y: actual values) | Diagonal line = normal; curved line = non-normal |
| 6. Compute skewness | =SKEW(A2:A51) | Values outside ±1 suggest meaningful skew |
| 7. Compute kurtosis | =KURT(A2:A51) | Excel returns excess kurtosis; values outside ±2 are notable |
What to Do When Data Fails the Normality Test
A significant normality test result is not the end of the analysis — it is the start of a decision. Four main strategies exist, and the right choice depends on the nature of the non-normality, the sample size, and what analysis you need to run.
Data Transformations
Apply a mathematical transformation to make the distribution more symmetric. Common choices: log(Y) for right-skewed data, √Y for count data, 1/Y for extreme right skew. Box-Cox optimization selects the best power transformation automatically. Interpret results on the transformed scale or back-transform for reporting.
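The effect of a transformation can be checked by comparing skewness before and after. The sketch below applies a log transform and SciPy's Box-Cox routine to simulated right-skewed data; Box-Cox requires strictly positive values. The data and seed are invented for illustration.

```python
import numpy as np
from scipy import stats

# Simulated right-skewed, strictly positive data (lognormal), illustrative only
rng = np.random.default_rng(4)
raw = rng.lognormal(mean=3.0, sigma=0.8, size=500)

logged = np.log(raw)                          # fixed log transform
boxcox_values, lam = stats.boxcox(raw)        # lambda chosen by maximum likelihood

skew_raw = stats.skew(raw)
skew_log = stats.skew(logged)
skew_bc = stats.skew(boxcox_values)
print(f"skewness raw: {skew_raw:.3f}")
print(f"skewness after log: {skew_log:.3f}")
print(f"skewness after Box-Cox (lambda = {lam:.3f}): {skew_bc:.3f}")
```

A fitted lambda near 0 indicates that Box-Cox is effectively choosing the log transform.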
Nonparametric Tests
Replace the parametric test with a nonparametric equivalent that does not assume normality. Replace a two-sample t-test with the Mann-Whitney U test. Replace a paired t-test with the Wilcoxon signed-rank test. Replace ANOVA with Kruskal-Wallis.
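Each of these substitutions is a single function call in SciPy. The sketch below runs all three on simulated skewed data; the group names, sizes, and effect sizes are made up for illustration.

```python
import numpy as np
from scipy import stats

# Simulated skewed data, for illustration only
rng = np.random.default_rng(5)
group_a = rng.exponential(1.0, size=40)
group_b = rng.exponential(1.5, size=40)
group_c = rng.exponential(2.0, size=40)
before = rng.exponential(1.0, size=40)
after = before + rng.normal(0.3, 0.5, size=40)

# Two independent groups: alternative to the two-sample t-test
mw = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
# Paired measurements: alternative to the paired t-test
wx = stats.wilcoxon(before, after)
# Three or more independent groups: alternative to one-way ANOVA
kw = stats.kruskal(group_a, group_b, group_c)

print(f"Mann-Whitney U = {mw.statistic:.1f}, p = {mw.pvalue:.4f}")
print(f"Wilcoxon signed-rank = {wx.statistic:.1f}, p = {wx.pvalue:.4f}")
print(f"Kruskal-Wallis H = {kw.statistic:.3f}, p = {kw.pvalue:.4f}")
```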
Robust Estimation
Use estimation methods that are less sensitive to distribution shape. Trimmed mean procedures remove the most extreme values before computing the mean and standard error. Winsorizing replaces extreme values with the nearest retained observation (the next-largest value in the lower tail, the next-smallest in the upper tail) rather than removing them. Both approaches reduce the influence of outliers that drive non-normality.
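A tiny example shows how trimming and winsorizing differ. The ten values below are made up, with one obvious outlier; a 10% cut on each side affects exactly one value per tail.

```python
import numpy as np
from scipy import stats
from scipy.stats.mstats import winsorize

# Made-up data with a single extreme value
data = np.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 500], dtype=float)

raw_mean = data.mean()

# Trimming drops the lowest and highest 10% before averaging
trimmed_mean = stats.trim_mean(data, proportiontocut=0.1)

# Winsorizing replaces the lowest and highest 10% with the nearest kept value
winsorized = winsorize(data, limits=(0.1, 0.1))
winsorized_mean = float(winsorized.mean())

print(f"raw mean: {raw_mean}")
print(f"10% trimmed mean: {trimmed_mean}")
print(f"10% winsorized mean: {winsorized_mean}")
```

Here the outlier 500 is dropped by trimming and replaced by 18 under winsorizing, so both robust means land far below the raw mean.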
Bootstrap Resampling
Use bootstrap resampling to construct empirical confidence intervals without assuming any specific distribution. The bootstrap repeatedly resamples from the observed data with replacement, building an empirical sampling distribution of your test statistic. This is the most flexible approach for regression and complex models.
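A percentile bootstrap interval for a mean takes only a few lines. The sketch below is one simple variant (the percentile method); other bootstrap intervals exist and may behave better in small samples. The function name, number of resamples, and simulated data are assumptions.

```python
import numpy as np


def bootstrap_ci_mean(values, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    # Each row of idx is one resample of the data, drawn with replacement
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot_means = x[idx].mean(axis=1)
    lower_pct = 100 * (1 - level) / 2
    upper_pct = 100 * (1 + level) / 2
    lower, upper = np.percentile(boot_means, [lower_pct, upper_pct])
    return lower, upper


# Simulated skewed sample, for illustration only
rng = np.random.default_rng(7)
sample = rng.exponential(scale=2.0, size=60)

ci_low, ci_high = bootstrap_ci_mean(sample)
print(f"sample mean = {sample.mean():.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```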
Head-to-Head Test Comparisons
Shapiro-Wilk vs. Kolmogorov-Smirnov
These two tests are frequently listed together in SPSS output under "Tests of Normality," which leads many researchers to treat them as interchangeable. They are not. Shapiro-Wilk evaluates normality by comparing two estimates of the sample's variance: one built from the ordered sample values, weighted by the order statistics expected under a normal distribution, and the usual sum of squared deviations. This gives it substantially higher statistical power, meaning it is more likely to detect real non-normality when it exists.
The Kolmogorov-Smirnov test (with Lilliefors correction, which SPSS applies automatically) instead measures the maximum absolute distance between the sample's empirical cumulative distribution function and a theoretical normal CDF. This is a simpler criterion and genuinely weaker for detecting subtle departures from normality, particularly in small samples. A published comparison by Razali and Wah (2011) found Shapiro-Wilk outperformed Kolmogorov-Smirnov across all sample sizes and distribution types tested. Use Kolmogorov-Smirnov only when sample sizes are very large (N > 2000) or when software constraints leave no alternative.
Anderson-Darling vs. Jarque-Bera
Anderson-Darling and Jarque-Bera approach the normality question from different angles. Anderson-Darling modifies the Kolmogorov-Smirnov approach by applying quadratic weights to the distribution tails — squared deviations at the extremes count more. This makes it particularly well-suited to analyses where tail behavior matters, such as financial risk modeling, where extreme events have outsized consequences.
Jarque-Bera takes a completely different approach: it tests whether the sample's skewness and excess kurtosis match the values expected under normality (both equal to zero). It does not examine the full distribution shape directly, only two summary moments. This makes it fast and analytically convenient, but it will miss non-normality patterns that do not manifest in skewness or kurtosis — for example, a bimodal distribution with symmetric, moderate-kurtosis shape can fool it entirely. Jarque-Bera is most appropriate for large econometric time series datasets where computational speed matters and skewness/kurtosis departures are the primary concern.
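Because Jarque-Bera depends only on skewness S and excess kurtosis K, it is short to compute by hand: JB = (N/6) × (S² + K²/4), referred to a chi-square distribution with 2 degrees of freedom. The sketch below implements that formula with population moments and compares it with `scipy.stats.jarque_bera`. The function name and simulated data are illustrative.

```python
import numpy as np
from scipy import stats


def jarque_bera_manual(values):
    """Jarque-Bera statistic and asymptotic p-value from sample moments."""
    x = np.asarray(values, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d**2)
    skewness = np.mean(d**3) / m2**1.5
    excess_kurtosis = np.mean(d**4) / m2**2 - 3.0
    jb = n / 6.0 * (skewness**2 + excess_kurtosis**2 / 4.0)
    p_value = stats.chi2.sf(jb, df=2)         # asymptotic chi-square(2) reference
    return jb, p_value


# Simulated data, for illustration only
rng = np.random.default_rng(8)
sample = rng.normal(size=400)

jb_manual, p_manual = jarque_bera_manual(sample)
jb_scipy, p_scipy = stats.jarque_bera(sample)
print(f"JB (manual) = {jb_manual:.4f}, p = {p_manual:.4f}")
print(f"JB (SciPy)  = {jb_scipy:.4f}, p = {p_scipy:.4f}")
```

The chi-square reference is asymptotic, which is why the test is recommended only for large samples.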
Visual Diagnostics vs. Formal Statistical Tests
This is the most practically important comparison for applied researchers. Formal tests (Shapiro-Wilk, Anderson-Darling, and the others) produce an objective p-value that simplifies reporting and removes subjectivity from the decision. However, they have well-documented sample-size problems: underpowered for small samples (may miss real non-normality) and overpowered for large samples (will flag trivially small deviations as significant). This means the p-value alone can mislead in both directions.
Q-Q plots and histograms provide context about how and where the distribution deviates from normal, which helps you judge whether the deviation will actually distort your analysis. A researcher seeing a Q-Q plot where all points fall within ±0.3 of the reference line can reasonably proceed with a parametric test even if Shapiro-Wilk returned p = 0.04 — the formal test's significance does not translate directly to practical impact. The best practice is to report both and let the reader see the full picture.
Multivariate Normality Tests
When working with multi-dimensional datasets in MANOVA, structural equation modeling (SEM), or discriminant analysis, univariate normality testing is not sufficient. Even if every individual variable looks normally distributed on its own, their joint distribution can still violate multivariate normality assumptions.
Mardia's Test
Evaluates multivariate skewness and kurtosis separately. Requires large sample sizes (typically N > 200 per variable) to maintain reliable power. Available in R through the MVN package.
Henze-Zirkler Test
Measures distance between the empirical characteristic function and the theoretical multivariate normal. Stable performance across a range of sample sizes. Preferred for medium-sized multivariate datasets.
Royston's Test
Combines individual Shapiro-Wilk transforms into a single multivariate score. Works well for smaller multi-dimensional datasets (N < 200 per variable). Available in R through the MVN and mvnormtest packages.
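Of the three tests, Mardia's is the most direct to sketch because it uses only Mahalanobis-type products. The code below follows the standard textbook form: multivariate skewness b₁ is scaled to a chi-square statistic with p(p+1)(p+2)/6 degrees of freedom, and multivariate kurtosis b₂ is standardized against its expected value p(p+2). It is a minimal sketch without small-sample corrections, and the function name and simulated data are assumptions; for real analyses a maintained package is the safer choice.

```python
import numpy as np
from scipy import stats


def mardia(data):
    """Mardia's multivariate skewness and kurtosis tests (asymptotic versions)."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / n                    # covariance with divisor n
    d = centered @ np.linalg.inv(cov) @ centered.T     # n x n matrix of products d_ij

    b1 = np.sum(d**3) / n**2                           # multivariate skewness
    b2 = np.mean(np.diag(d) ** 2)                      # multivariate kurtosis

    skew_stat = n * b1 / 6.0
    skew_df = p * (p + 1) * (p + 2) / 6.0
    skew_p = stats.chi2.sf(skew_stat, skew_df)

    kurt_z = (b2 - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    kurt_p = 2.0 * stats.norm.sf(abs(kurt_z))
    return {"b1": b1, "b2": b2, "skew_p": skew_p, "kurt_p": kurt_p}


# Simulated trivariate normal data, for illustration only
rng = np.random.default_rng(9)
mv_sample = rng.normal(size=(300, 3))

mardia_result = mardia(mv_sample)
for key, value in mardia_result.items():
    print(f"{key}: {value:.4f}")
```

For normal data with p = 3 variables, b₂ should sit near p(p+2) = 15.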
Common Misconceptions About Normality Testing
| Misconception | Incorrect Interpretation | Correct Interpretation |
|---|---|---|
| Non-significant result proves normality | p > 0.05 means the data IS normal | p > 0.05 means there is not enough evidence to reject normality — the data is consistent with normal, which is different from proving it |
| Test raw Y in regression | Check if the outcome variable is normally distributed | Check if the model residuals are normally distributed — the assumption applies to errors, not outcomes |
| Formal test is always definitive | If Shapiro-Wilk says significant, stop the parametric analysis | Always pair the formal test with a Q-Q plot — a significant p-value with a nearly straight Q-Q may still support parametric use |
| KS test is as good as Shapiro-Wilk | Both tests in the SPSS output are equally reliable | Shapiro-Wilk has substantially higher power for most sample sizes; the KS test should be a secondary reference at most |
| Large N makes normality irrelevant | With 500+ observations, I do not need to check normality | Large N makes formal tests hypersensitive and makes coefficient tests more robust through the CLT, but regression residuals still need a visual check: heavy tails and outliers can distort estimates, and prediction intervals depend on residual normality at all sample sizes |
Normality Testing: Frequently Asked Questions
FAQ
Does a non-significant normality test (p > 0.05) prove my data is perfectly normal?
No. A non-significant p-value means there is insufficient evidence to reject the null hypothesis of normality — it does not confirm that the population is exactly normal. The distinction matters particularly for small samples where the test has low power and may miss genuine non-normality. Always supplement the p-value with a visual check.
Why does the Kolmogorov-Smirnov test flag almost everything as non-normal in large datasets?
As sample sizes increase, all formal statistical tests gain power, and Kolmogorov-Smirnov is no exception. With N = 1000, even a deviation from normality too small to matter practically can produce a p-value below 0.05. This is a feature of hypothesis testing logic, not a flaw in the data. For large samples, the Q-Q plot tells you more about whether the deviation matters than the formal test p-value does.
Can I run a normality test on Likert-scale or binary data?
Normality tests apply to continuous numeric data only. Binary data (0/1) or ordinal Likert responses (1–5) violate the continuous distribution assumption underlying Shapiro-Wilk and all similar tests. For these data types, use appropriate nonparametric or categorical methods rather than checking normality first.
How many data points do I need for Shapiro-Wilk to be reliable?
The Shapiro-Wilk test is defined for sample sizes from N = 3 to N = 2000, though its power is genuinely limited below N = 10 — the test will rarely reject H₀ even for clearly non-normal data with very small samples. For N between 10 and 30, use Shapiro-Wilk but weight the Q-Q plot heavily in your decision. For N > 2000, switch to Anderson-Darling.
Is normality required for simple linear regression?
The normality assumption in simple linear regression applies to the residuals (error terms), not to the predictor X or the outcome Y individually. OLS coefficient estimates are unbiased whether or not residuals are normal. Normality becomes important for the validity of t-tests on coefficients, confidence intervals, and prediction intervals. In large samples, the CLT makes the coefficient t-tests and confidence intervals approximately valid even without perfect residual normality; prediction intervals for individual observations still depend on the shape of the residual distribution.
Formula Glossary
| Term | Definition | Formula / Key Value | Interpretation Guide |
|---|---|---|---|
| Shapiro-Wilk (W) | Goodness-of-fit test using regression of ordered statistics against expected normal order statistics | W = (Σ aᵢ x(ᵢ))² / Σ(xᵢ − x̄)² | W ranges from 0 to 1. Values close to 1 support normality. Small p-values indicate significant departure. |
| Anderson-Darling (A²) | Weighted modification of KS test emphasizing distribution tails | A² = −N − (1/N) × Σ (2i−1) × [ln F(xᵢ) + ln(1 − F(xₙ₊₁₋ᵢ))] | Larger A² values indicate greater deviation from normality. Compare to critical values for significance. |
| Skewness (S) | Measure of distribution asymmetry around the mean | S = [Σ(xᵢ−x̄)³/n] / [Σ(xᵢ−x̄)²/n]^(3/2) | S = 0: perfectly symmetric. S > 0: right tail. S < 0: left tail. S outside ±1: meaningful skew. |
| Excess Kurtosis (K) | Measure of tail weight relative to a normal distribution | K = [Σ(xᵢ−x̄)⁴/n] / [Σ(xᵢ−x̄)²/n]² − 3 | K = 0: normal tails. K > 0: heavy tails (leptokurtic). K < 0: light tails (platykurtic). |
| Jarque-Bera (JB) | Asymptotic test combining skewness and kurtosis departures | JB = (N/6) × [S² + K²/4] | JB ~ χ²(2) under H₀. Large JB values reject normality. Best for N > 300. |
| Q-Q Plot | Graphical comparison of empirical vs. theoretical quantiles | x: theoretical normal quantiles, y: observed data quantiles | Points on diagonal: normal. S-curve: kurtosis issue. Curved ends: skew. Scattered points: non-normal. |