Hypothesis Testing · Test Selection Decision Tool · 22 min read · May 11, 2026
By: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

Statistical Test Selector: Which Statistical Test Should I Use?

You have data. You have a research question. Now comes the question every student and researcher faces: which statistical test do I run? The wrong choice can invalidate an entire study. This page solves that problem with an interactive decision engine, a logic-based flowchart, and a complete comparison table covering 13 tests — so you can find the right test in under 30 seconds.

The decision comes down to four questions about your data. Answer them in order, and the correct test becomes obvious. This is the same framework taught in graduate-level research methods courses and referenced in peer-reviewed methodology guides from institutions like Harvard's Department of Statistics and the NIST/SEMATECH Engineering Statistics Handbook.

What You'll Find on This Page
  • ✓ Interactive test selector — answer 4 questions, get your test instantly
  • ✓ Full decision flowchart (logic tree) for all major tests
  • ✓ Comparison table: 13 tests with data type, assumptions, and examples
  • ✓ "If X → Use Y" quick-reference table (18 if→then rules)
  • ✓ 4 worked examples from real-world research scenarios
  • ✓ Parametric vs. non-parametric comparison with switching guide
  • ✓ Assumptions checklist: normality, variance equality, sample size

What Is a Statistical Test Selector?

Definition — Statistical Test Selector
A statistical test selector is a decision framework — typically a flowchart, table, or interactive tool — that guides researchers, students, and analysts to the correct hypothesis test based on four characteristics of their data: data type, number of groups, sample relationship, and distributional assumptions. It removes guesswork from test selection by turning a complex methodological decision into a structured, step-by-step process.

Choosing the wrong test is one of the most common methodological errors in published research. A 2010 review in the BMJ found that statistical errors — many related to inappropriate test selection — appeared in the majority of papers examined across several medical journals. The solution is not to memorize every test, but to follow a systematic decision process.

13 · Tests Covered
4 · Decision Questions
<30s · Time to Find Your Test
18 · If→Then Decision Rules

The 4 Questions That Determine Your Statistical Test

📋
How to Choose a Statistical Test (4 Steps)

Step 1: Identify your data type — categorical or numerical. Step 2: Count your groups — one, two, or three or more. Step 3: Determine if samples are paired (same subjects) or independent (different subjects). Step 4: Check normality — use the Shapiro-Wilk test or a Q-Q plot. Normal data → parametric test; non-normal → non-parametric alternative.

Every statistical test selection ultimately traces back to the same four questions. Work through them in order — the test becomes obvious at the end of the path.

Question 1

What type of data do you have?

Categorical: groups, labels, counts (gender, pass/fail, voting preference).
Numerical: measured quantities (height, score, income, reaction time).

Question 2

How many groups are you comparing?

One group vs. a known value, two groups against each other, or three or more groups. This determines the test family: one-sample, two-sample, or multi-group tests.

Question 3

Are samples paired or independent?

Paired: same subjects measured at two time points (pre/post). Independent: different subjects in each group (male vs. female).

Question 4

Is your data normally distributed?

Check using the Shapiro-Wilk test, Q-Q plot, or histogram. If yes → parametric test. If no → non-parametric alternative. For n ≥ 30, the Central Limit Theorem often allows parametric tests regardless.
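The normality check in Question 4 takes only a few lines of Python with scipy. This is a minimal sketch on simulated data; the sample and the decision rule mirror the steps above.

```python
# Checking normality before choosing a test -- a sketch using scipy.stats.
# The scores below are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
scores = rng.normal(loc=75, scale=10, size=25)  # hypothetical exam scores, n < 30

# Shapiro-Wilk: p > 0.05 -> no evidence against normality
stat, p = stats.shapiro(scores)

# CLT fallback: for n >= 30, parametric tests are often acceptable anyway
parametric_ok = (p > 0.05) or (len(scores) >= 30)
print(f"Shapiro-Wilk W={stat:.3f}, p={p:.3f} -> "
      f"{'parametric test' if parametric_ok else 'non-parametric alternative'}")
```

A Q-Q plot (e.g. `scipy.stats.probplot`) gives the same information visually and is less sensitive to the sharp p < 0.05 cutoff at large sample sizes.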

Interactive Statistical Test Selector

Answer the four questions below and the engine will identify your test, explain why it's correct, and flag the appropriate alternative.

🧠 Statistical Test Decision Engine

Statistical Test Decision Flowchart

The flowchart below maps the full decision logic. Follow your branch from left to right — each fork is a yes/no question about your data.

Text-Based Logic Tree (Accessible Version)

START: What type of data do you have?
│
├── CATEGORICAL (frequencies, counts, proportions)
│   └── How many categorical variables?
│       ├── One variable (vs. expected distribution) → Chi-Square Goodness of Fit
│       ├── Two variables (association?) → Chi-Square Test of Independence
│       └── Two proportions (large sample) → Z-Test for Proportions
│
└── NUMERICAL (continuous, ordinal)
    │
    ├── Goal: COMPARE GROUPS
    │   ├── 1 group vs. known value
    │   │   ├── n ≥ 30 or σ known → One-Sample Z-Test
    │   │   └── n < 30, σ unknown → One-Sample T-Test
    │   │
    │   ├── 2 groups
    │   │   ├── INDEPENDENT samples
    │   │   │   ├── Normally distributed → Independent Samples T-Test
    │   │   │   └── NOT normally distributed → Mann-Whitney U Test
    │   │   └── PAIRED samples
    │   │       ├── Normally distributed → Paired Samples T-Test
    │   │       └── NOT normally distributed → Wilcoxon Signed-Rank Test
    │   │
    │   └── 3+ groups
    │       ├── Normally distributed
    │       │   ├── One factor → One-Way ANOVA (post-hoc: Tukey HSD)
    │       │   └── Two factors → Two-Way ANOVA
    │       └── NOT normally distributed → Kruskal-Wallis Test
    │
    ├── Goal: TEST A RELATIONSHIP
    │   ├── Two numerical variables, normal → Pearson Correlation
    │   └── Two variables, non-normal or ordinal → Spearman Correlation
    │
    └── Goal: PREDICT AN OUTCOME
        ├── Continuous outcome → Linear Regression
        └── Binary outcome (yes/no) → Logistic Regression
Methodological basis: This decision framework follows the test-selection logic described in OpenIntro Statistics (Diez, Çetinkaya-Rundel & Barr, 4th ed.) and the NIST/SEMATECH Engineering Statistics Handbook — two open-access, peer-reviewed statistical references widely used in academic and government settings.
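The group-comparison branch of the logic tree can be sketched as a plain Python function. The function name and argument structure are illustrative, not any particular library's API.

```python
# A minimal sketch of the decision tree's group-comparison logic.
def select_test(data_type, n_groups=2, paired=False, normal=True):
    """Return the recommended test given answers to the four questions."""
    if data_type == "categorical":
        return "Chi-Square Test of Independence"
    if n_groups == 1:
        # use a One-Sample Z-Test instead if sigma is known and n >= 30
        return "One-Sample T-Test"
    if n_groups == 2:
        if paired:
            return "Paired Samples T-Test" if normal else "Wilcoxon Signed-Rank Test"
        return "Independent Samples T-Test" if normal else "Mann-Whitney U Test"
    # 3 or more groups
    return "One-Way ANOVA" if normal else "Kruskal-Wallis Test"

print(select_test("numerical", n_groups=3, normal=False))  # Kruskal-Wallis Test
```

The relationship and prediction branches (correlation, regression) would extend the same structure with a `goal` argument.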

Which Statistical Test Should I Use? (Quick-Reference Table)

This table maps your research situation directly to the recommended test. It is designed to answer "which test do I run?" in under 15 seconds. Bookmark this section.

If your situation is… → Use this test
Comparing means of 2 independent groups (normal data) → Independent Samples T-Test
Comparing same group before and after treatment (normal data) → Paired Samples T-Test
Comparing means across 3 or more groups (normal data) → One-Way ANOVA
Testing effect of 2 factors on a numerical outcome → Two-Way ANOVA
Testing association between 2 categorical variables → Chi-Square Test of Independence
Testing if a categorical variable matches a known distribution → Chi-Square Goodness of Fit
Comparing a sample mean to a known population mean (n ≥ 30) → One-Sample Z-Test
Comparing a sample mean to a known population mean (n < 30) → One-Sample T-Test
Measuring linear relationship between 2 numerical variables (normal) → Pearson Correlation
Measuring relationship between 2 variables (non-normal or ordinal) → Spearman Correlation
Predicting a continuous numerical outcome from predictor(s) → Linear Regression
Predicting a binary (yes/no) outcome from predictor(s) → Logistic Regression
2 independent groups — data NOT normally distributed → Mann-Whitney U Test
Same group at 2 time points — data NOT normally distributed → Wilcoxon Signed-Rank Test
3 or more groups — data NOT normally distributed → Kruskal-Wallis Test
Testing whether data is normally distributed → Shapiro-Wilk Test
Testing equality of variances between groups → Levene's Test
2×2 contingency table with small expected frequencies (<5) → Fisher's Exact Test

Statistical Test Comparison Table

The table below provides a complete reference for the 13 most commonly used statistical tests. Use the columns to match your test to its assumptions and see a realistic example for each.

Test Name | When to Use | Data Type | Key Assumptions | Example Use Case
Independent T-Test | Compare means of 2 independent groups | Numerical (continuous) | Normality; independence; equal or unequal variance (Welch's) | Exam scores: male vs. female students
Paired T-Test | Compare means of the same group at 2 time points | Numerical (continuous) | Normality of differences; paired observations | Blood pressure before vs. after medication
One-Sample T-Test | Compare a sample mean to a known population value | Numerical (continuous) | Normality; n < 30 or σ unknown | Is our factory average (498g) different from 500g?
Z-Test | Compare sample mean to population (large sample) | Numerical | n ≥ 30; population SD (σ) known | National average vs. sample of 200 students
One-Way ANOVA | Compare means across 3 or more independent groups | Numerical (continuous) | Normality; homogeneity of variance (Levene's test) | Test scores across 3 teaching methods
Two-Way ANOVA | Test effect of 2 independent variables on an outcome | Numerical (continuous) | Normality; independence; homogeneity of variance | Effect of diet AND exercise on weight loss
Chi-Square Test | Test association between 2 categorical variables | Categorical (nominal) | Expected frequency ≥ 5 in each cell; independence | Gender vs. voting preference (yes/no)
Pearson Correlation | Measure linear relationship between 2 numerical variables | Numerical (continuous) | Normality; linear relationship; no extreme outliers | Study hours vs. GPA across 60 students
Spearman Correlation | Measure monotonic relationship (non-normal or ordinal) | Ordinal or non-normal numerical | Monotonic relationship (not necessarily linear) | Class rank vs. salary rank
Linear Regression | Predict a continuous outcome from one or more predictors | Numerical (continuous) | Linearity; normality of residuals; no multicollinearity | Predict salary from years of experience
Logistic Regression | Predict a binary outcome from one or more predictors | Binary outcome | No multicollinearity; large sample; independence | Predict pass/fail from study hours + prior GPA
Mann-Whitney U | Non-parametric alternative to independent t-test | Ordinal or non-normal numerical | Independence; similar distribution shapes | Pain scores: treatment A vs. treatment B
Wilcoxon Signed-Rank | Non-parametric alternative to paired t-test | Ordinal or non-normal numerical | Paired observations; symmetry of differences | Patient anxiety scores before/after therapy
Kruskal-Wallis | Non-parametric alternative to one-way ANOVA | Ordinal or non-normal numerical | Independence; similar distribution shapes | Satisfaction scores across 4 departments
The parametric–non-parametric mapping in this table reflects test-selection guidance from Altman & Bland (2009, BMJ) and Howell's Statistical Methods for Psychology.

Parametric vs. Non-Parametric Tests

Parametric tests are more statistically powerful — they extract more information from your data — but only when their assumptions hold. Non-parametric tests sacrifice some power in exchange for fewer assumptions. The decision between them is not a matter of preference; it follows from your data.

When to Use Non-Parametric Tests

⚡ Switch to Non-Parametric When Any of These Are True
  • Your data fails the normality test (Shapiro-Wilk p < 0.05) and n < 30
  • Your data is ordinal (Likert scale, rankings, ordered categories)
  • You have significant outliers that cannot be justified for removal
  • Sample size is very small (n < 15 per group) — normality cannot be confirmed
  • Your measurement scale has a bounded or discrete range that cannot be normally distributed

Parametric ↔ Non-Parametric Equivalents

Research Situation | Parametric Test (normal data) | Non-Parametric Alternative
2 independent groups | Independent T-Test | Mann-Whitney U
Same group at 2 time points | Paired T-Test | Wilcoxon Signed-Rank
3+ independent groups | One-Way ANOVA | Kruskal-Wallis
Relationship between 2 variables | Pearson Correlation | Spearman Correlation
Repeated measures across 3+ time points | Repeated Measures ANOVA | Friedman Test
[Figure: Parametric vs. non-parametric statistical tests comparison chart]
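To see one parametric/non-parametric pairing in practice, the sketch below runs an independent t-test and its Mann-Whitney U counterpart on the same simulated skewed data. The data is illustrative only.

```python
# Parametric test vs. its non-parametric counterpart on the same samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.exponential(scale=2.0, size=20)  # skewed -> normality doubtful
group_b = rng.exponential(scale=3.0, size=20)

# Parametric: assumes normality, uses the raw values
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Non-parametric: rank-based, no distributional assumption
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)

print(f"t-test p={t_p:.3f} | Mann-Whitney p={u_p:.3f}")
```

With strongly skewed data and small n, the rank-based test is the defensible choice; the two p-values will often disagree near the 0.05 threshold.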

Statistical Test Assumptions Checklist

Every statistical test rests on a set of assumptions. Violating these assumptions does not automatically invalidate results, but it does increase the risk of misleading conclusions. Check each assumption before running your test.

🔔

Normality

How to check: Run the Shapiro-Wilk test in SPSS, R, or Python. Inspect a Q-Q plot — if points follow the diagonal, the data is approximately normal. A histogram should be roughly bell-shaped.
Rule: Shapiro-Wilk p > 0.05 → assume normality.
Affects: t-test, ANOVA, Pearson correlation, linear regression
⚖️

Homogeneity of Variance

How to check: Run Levene's Test of Equality of Variances. If p > 0.05, assume equal variances. If variances are unequal, use Welch's t-test (for 2 groups) or Welch's ANOVA (for 3+ groups).
Affects: Independent t-test, one-way ANOVA, two-way ANOVA
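A minimal sketch of this check with scipy, using simulated groups with deliberately unequal spread:

```python
# Levene's test for equal variances, then Student's vs. Welch's t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(50, 5, size=30)    # group 1: small spread
g2 = rng.normal(52, 15, size=30)   # group 2: deliberately 3x the SD

lev_stat, lev_p = stats.levene(g1, g2)
equal_var = bool(lev_p > 0.05)     # p <= 0.05 -> variances differ

# Welch's t-test (equal_var=False) is the safe choice when Levene rejects
t_stat, t_p = stats.ttest_ind(g1, g2, equal_var=equal_var)
print(f"Levene p={lev_p:.4f} -> equal_var={equal_var}; t-test p={t_p:.4f}")
```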
🔗

Independence of Observations

How to check: Review your study design. Each observation must come from a separate, unrelated subject. Data from the same person at different time points violates independence — use paired or repeated-measures tests instead.
Affects: All statistical tests
📦

Minimum Expected Frequency

How to check: Inspect your contingency table. Every cell must have an expected frequency of at least 5. If any cell falls below this threshold, use Fisher's Exact Test as the alternative.
Affects: Chi-square test of independence and goodness of fit
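A sketch of this fallback logic with scipy; the counts are hypothetical and chosen so that one expected frequency falls below 5:

```python
# Check expected frequencies; fall back to Fisher's exact test if any < 5.
import numpy as np
from scipy import stats

table = np.array([[3, 9],
                  [7, 4]])  # 2x2 contingency table of observed counts

chi2, p, dof, expected = stats.chi2_contingency(table)
if (expected < 5).any():
    # at least one expected count < 5 -> chi-square unreliable, use Fisher
    odds_ratio, p = stats.fisher_exact(table)
    test_used = "Fisher's exact"
else:
    test_used = "chi-square"
print(f"{test_used}: p = {p:.3f}")
```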
📐

Linearity

How to check: Create a scatterplot of your two variables. The relationship should follow a roughly straight-line pattern. Also inspect a residuals vs. fitted values plot — random scatter indicates linearity.
Affects: Pearson correlation, linear regression
📏

Adequate Sample Size

How to check: Conduct a power analysis before data collection (use G*Power, free software). For the Z-test, n ≥ 30 is required. For chi-square, n ≥ 50 is recommended. For regression, a general rule is n ≥ 10–20 observations per predictor variable.
Affects: All tests (small n reduces statistical power)
🚫

No Multicollinearity

How to check: Compute the Variance Inflation Factor (VIF) for each predictor. VIF < 5 is acceptable; VIF > 10 indicates severe multicollinearity that must be addressed before interpreting results.
Affects: Multiple linear regression, logistic regression
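VIF can be computed directly from least-squares fits: regress each predictor on the others and take VIF = 1 / (1 − R²). The `vif` helper below is an illustrative sketch in plain numpy, not a library function.

```python
# Variance Inflation Factor from scratch: VIF_j = 1 / (1 - R^2_j), where
# R^2_j comes from regressing predictor j on all the other predictors.
import numpy as np

def vif(X):
    """Return the VIF of each column of the predictor matrix X."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # add an intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(7)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)
print(vif(np.column_stack([x1, x2, x3])))   # x1 and x2 show inflated VIF
```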
Assumption-checking guidelines here follow recommendations from Harvard University's Department of Statistics course materials and IBM SPSS Statistics documentation, as well as Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.), a standard graduate-level statistics textbook.

Worked Examples: Real-World Test Selection

The four examples below demonstrate the full decision process — from research question to test selection to interpretation — using realistic scenarios from four different fields.

Worked Example 1 — Psychology Research

A researcher tests whether 8 weeks of Cognitive Behavioral Therapy (CBT) reduces anxiety scores. Anxiety is measured on a validated numerical scale for 30 patients before and after treatment.

Step 1 · Data type: Numerical (anxiety score on a continuous 0–100 scale). Not categorical.

Step 2 · Groups: Two measurements — before treatment and after treatment.

Step 3 · Paired or independent? Paired — the same 30 patients are measured twice. Each before-score has a corresponding after-score from the same person.

Step 4 · Normality: Run Shapiro-Wilk on the differences (after − before). Suppose p = 0.12 — differences are approximately normally distributed.

✅ Correct test: Paired Samples T-Test. If p < 0.05, anxiety scores differ significantly before and after CBT. If normality fails, use the Wilcoxon Signed-Rank Test instead.
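A sketch of this analysis in Python with scipy, using simulated scores in place of real patient data:

```python
# Paired-samples t-test: same 30 patients measured before and after CBT.
# Scores are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
before = rng.normal(62, 8, size=30)           # hypothetical anxiety scores
after = before - rng.normal(6, 4, size=30)    # treatment lowers scores on average

diff = after - before
_, norm_p = stats.shapiro(diff)               # normality check on the differences
t_stat, p = stats.ttest_rel(after, before)
print(f"paired t = {t_stat:.2f}, p = {p:.4f}")
```

Note the normality check is run on the differences, not on the raw before/after columns, exactly as in Step 4 above.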

Worked Example 2 — Business A/B Testing

A marketing team tests two versions of a landing page (A vs. B). They record whether each of 400 visitors converted (yes/no). Did conversion rate differ between versions?

Step 1 · Data type: Categorical — each visitor either converted (1) or did not (0). The outcome is binary.

Step 2 · Variables: Two categorical variables — landing page version (A/B) and conversion (yes/no). This forms a 2×2 contingency table.

Step 3 · Check expected frequencies: With 400 visitors, all expected cell counts are well above 5. Fisher's Exact Test is not needed.

✅ Correct test: Chi-Square Test of Independence. If p < 0.05, landing page version and conversion rate are significantly associated — version B performs differently from version A.
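A sketch of this test with scipy; the conversion counts are hypothetical:

```python
# Chi-square test of independence on a 2x2 A/B conversion table.
import numpy as np
from scipy import stats

# rows = landing page version, columns = (converted, did not convert)
table = np.array([[30, 170],    # version A: 30 of 200 converted
                  [55, 145]])   # version B: 55 of 200 converted

chi2, p, dof, expected = stats.chi2_contingency(table)
assert (expected >= 5).all()    # assumption holds, Fisher's test not needed
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```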

Worked Example 3 — Medical Research

A clinical trial compares pain relief scores (0–10 scale) across three drug conditions: Drug A, Drug B, and Placebo. The pain scores are skewed and fail the normality test.

Step 1 · Data type: Numerical (pain rating), but effectively ordinal and clearly non-normal.

Step 2 · Groups: Three independent groups (Drug A, Drug B, Placebo).

Step 3 · Normality: Shapiro-Wilk p = 0.01 — data is significantly non-normal. ANOVA's normality assumption is violated.

✅ Correct test: Kruskal-Wallis Test (non-parametric alternative to one-way ANOVA). If significant, follow up with pairwise Mann-Whitney U tests with Bonferroni correction to identify which drug pairs differ.
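A sketch of this analysis with scipy, including the Bonferroni-corrected follow-up; the scores are simulated:

```python
# Kruskal-Wallis across 3 groups, then pairwise Mann-Whitney U follow-ups
# with a Bonferroni correction. Data is simulated skewed pain-relief scores.
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
drug_a  = rng.exponential(2.0, size=25) + 3
drug_b  = rng.exponential(2.0, size=25) + 1
placebo = rng.exponential(2.0, size=25)

h_stat, p = stats.kruskal(drug_a, drug_b, placebo)
if p < 0.05:
    groups = {"A": drug_a, "B": drug_b, "placebo": placebo}
    pairs = list(combinations(groups, 2))
    alpha = 0.05 / len(pairs)              # Bonferroni-adjusted threshold
    for g1, g2 in pairs:
        _, pp = stats.mannwhitneyu(groups[g1], groups[g2])
        print(f"{g1} vs {g2}: p = {pp:.4f} ({'sig' if pp < alpha else 'ns'})")
```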

Worked Example 4 — Economics / Data Science

A data scientist wants to predict house prices using square footage, number of bedrooms, and neighborhood quality score. Price is a continuous numerical variable.

Step 1 · Goal: Prediction — not group comparison or relationship measurement. There is a clear outcome variable (house price) and multiple predictors.

Step 2 · Outcome type: Continuous numerical (price in dollars). If the outcome were binary (sold/not sold), logistic regression would apply.

Step 3 · Check assumptions: Verify linearity (scatterplots), normality of residuals (histogram of residuals), and VIF < 5 for each predictor to confirm no multicollinearity.

✅ Correct test: Multiple Linear Regression. The model produces coefficients showing how much each predictor contributes to price, an R² value for total variance explained, and p-values for each predictor's significance. See the simple linear regression guide to start.
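A sketch of this model using plain numpy least squares; the data and coefficients are simulated, not real housing figures:

```python
# Multiple linear regression via ordinary least squares (numpy only).
import numpy as np

rng = np.random.default_rng(9)
n = 200
sqft = rng.uniform(800, 3000, size=n)
beds = rng.integers(1, 6, size=n).astype(float)
quality = rng.uniform(1, 10, size=n)

# Simulated prices with known coefficients plus noise
price = 50_000 + 120 * sqft + 8_000 * beds + 15_000 * quality \
        + rng.normal(0, 20_000, size=n)

X = np.column_stack([np.ones(n), sqft, beds, quality])  # intercept + predictors
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
resid = price - X @ beta
r2 = 1 - resid.var() / price.var()
print(f"coefficients = {np.round(beta, 1)}, R^2 = {r2:.3f}")
```

The fitted `beta` recovers the simulated coefficients up to noise, and `r2` reports the share of price variance explained, the quantities named in the example above.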

Real-World Use Cases by Field

🔬

Academic Research

A psychology study comparing three therapy approaches on depression scores measured with the PHQ-9 scale across independent groups.

→ One-Way ANOVA
📈

Business Analytics

Testing whether an email subject line change affected open rate (opened/not opened) across two randomly assigned user groups.

→ Chi-Square Test of Independence
🏥

Clinical Trials

Comparing survival rates (binary: alive/deceased at 5 years) between a treatment group and control group, adjusting for patient age.

→ Logistic Regression
📋

Survey Research

Analyzing whether job satisfaction (1–5 Likert scale) differs between employees in four departments. Data is ordinal and non-normal.

→ Kruskal-Wallis Test
🧪

Quality Control

Testing whether a production line's average bottle weight (498g measured) differs significantly from the target (500g) in a large batch.

→ One-Sample Z-Test
🎓

Education Research

Testing whether there is a relationship between hours studied per week and final exam score across 80 students in a statistics course.

→ Pearson Correlation

Beginner Guide: How to Choose a Test (Plain Language)

If you're new to statistics, the number of available tests can feel overwhelming. Here is a plain-language summary with no jargon.

🟢
The Simple Version: 3 Starting Points

(A) Your data is in categories (like yes/no, male/female, color choices) → start with the chi-square test.
(B) You're comparing numbers between groups (like test scores, weights, reaction times) → look at a t-test or ANOVA.
(C) You want to see if two number-based things are related (like height and weight) → use correlation or regression.

From there, one question refines your choice: is your data "normal"? Normal data produces a bell-shaped curve when you graph it. Most test scores, heights, and naturally measured things are close to normal. If your data is very skewed — or if you're using a rating scale — switch to the non-parametric versions: Mann-Whitney U (instead of t-test) or Kruskal-Wallis (instead of ANOVA).

Still unsure? Use the interactive selector above — it walks you through the same logic in a drop-down format and tells you exactly which test to run.

Statistical Testing Glossary

The following definitions are precise, self-contained, and written for quick reference.

Term | Definition
t-test | A parametric hypothesis test that compares means between one or two groups. Requires normally distributed numerical data. Three variants: one-sample, independent-samples, and paired-samples t-test.
ANOVA | Analysis of Variance. A parametric test that determines whether means differ significantly across three or more independent groups. Assumes normality and homogeneity of variance.
Chi-square test | A non-parametric test for categorical data. Tests whether observed frequencies in a contingency table differ from expected frequencies (goodness of fit) or whether two categorical variables are associated (test of independence).
Z-test | Tests whether a sample mean differs from a known population mean. Applied when sample size is large (n ≥ 30) and the population standard deviation (σ) is known.
Mann-Whitney U | A non-parametric test for comparing distributions between two independent groups when data is ordinal or not normally distributed. The non-parametric alternative to the independent t-test.
p-value | The probability of observing results as extreme as the data, assuming the null hypothesis is true. A p-value below the significance threshold (typically α = 0.05) indicates statistical significance.
null hypothesis (H₀) | The default assumption that there is no effect, difference, or relationship between variables. Statistical tests attempt to reject or fail to reject H₀ based on the computed p-value.
independent variable | The variable manipulated or controlled by the researcher to observe its effect on the dependent variable. Also called a predictor or explanatory variable.
dependent variable | The outcome variable measured in a study. Its value is hypothesized to depend on the independent variable. Also called the response or criterion variable.
significance level (α) | The pre-specified probability threshold below which a p-value is considered statistically significant. Conventionally set at α = 0.05 (5%), meaning a 5% false-positive risk is accepted.
parametric test | A hypothesis test that assumes the data follows a specific distribution — typically normal. More statistically powerful than non-parametric tests when assumptions are met. Examples: t-test, ANOVA, Pearson correlation.
non-parametric test | A hypothesis test with no assumption about the shape of the data distribution. Used when normality cannot be assumed, when data is ordinal, or when sample sizes are very small. Examples: Mann-Whitney U, Kruskal-Wallis, Spearman correlation.
effect size | A standardized measure of the magnitude of a difference or relationship, independent of sample size. Common measures: Cohen's d (t-test), η² (ANOVA), Cramér's V (chi-square), r (correlation). Complements the p-value by indicating practical significance.
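The effect-size entry notes that p-values should be paired with a magnitude measure. A minimal sketch computing Cohen's d for two independent samples (the helper function and the simulated data are illustrative):

```python
# Cohen's d with a pooled standard deviation -- a standardized effect size.
import numpy as np

def cohens_d(x, y):
    """Cohen's d for two independent samples."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) \
                 / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

rng = np.random.default_rng(11)
d = cohens_d(rng.normal(105, 15, 50), rng.normal(100, 15, 50))
# true standardized difference is 5/15 ~ 0.33; the estimate varies with sampling
print(round(float(d), 2))
```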

Frequently Asked Questions

How do I choose which statistical test to use?

Answer four questions in order: (1) Is your data categorical or numerical? (2) How many groups are you comparing? (3) Are the samples paired or independent? (4) Is the data normally distributed? Categorical data with two variables → chi-square. Numerical data, two independent normal groups → independent t-test. Three or more groups, normal data → ANOVA. Non-normal data → non-parametric equivalent. Use the interactive selector on this page to get your answer in under 30 seconds.
When should I use a t-test vs. ANOVA?

The rule is simple: use a t-test for exactly two groups; use ANOVA for three or more groups. Running multiple t-tests instead of ANOVA inflates the familywise Type I error rate. For example, three pairwise t-tests at α = 0.05 each gives a cumulative false-positive probability of approximately 14% — much higher than the intended 5%. ANOVA controls this inflation. See the ANOVA guide for the full procedure.
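The roughly 14% figure follows from the complement rule, assuming the three comparisons are independent:

```python
# Familywise error rate for k independent comparisons at level alpha:
# P(at least one false positive) = 1 - (1 - alpha)^k
alpha, k = 0.05, 3
familywise = 1 - (1 - alpha) ** k
print(f"{familywise:.4f}")  # 0.1426 -> ~14%, not the intended 5%
```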
When should I use a chi-square test?

Use the chi-square test when your data is categorical. Chi-square goodness of fit tests whether one categorical variable matches an expected distribution (e.g., "are the observed frequencies of four blood types consistent with population proportions?"). Chi-square test of independence tests whether two categorical variables are associated (e.g., "is smoking status related to cancer diagnosis?"). The key assumption is that every cell in the contingency table must have an expected frequency of at least 5. If this fails, use Fisher's Exact Test.
What is the difference between parametric and non-parametric tests?

Parametric tests (t-test, ANOVA, Pearson correlation) assume the data follows a normal distribution and operate on the actual data values. They are more statistically powerful when assumptions hold. Non-parametric tests (Mann-Whitney U, Kruskal-Wallis, Spearman correlation) make no distributional assumptions and work on ranks rather than raw values. Use non-parametric tests when normality cannot be assumed, data is ordinal, or sample sizes are very small.
How do I check whether my data is normally distributed?

Use three complementary methods: (1) Shapiro-Wilk test — a statistical test for normality; p > 0.05 suggests normal distribution. (2) Q-Q plot — if data points fall approximately on the diagonal reference line, the data is normal. (3) Histogram — look for a roughly symmetric, bell-shaped distribution. For large samples (n ≥ 30), the Central Limit Theorem implies that sample means are approximately normally distributed regardless of the original data shape, making parametric tests generally robust.
When should I use correlation vs. regression?

Use correlation when you want to measure the strength and direction of a relationship between two variables, without implying causation or making predictions. The output is a correlation coefficient (r), ranging from −1 to +1. Use regression when you want to predict one variable from another, quantify how much the outcome changes per unit of the predictor, or control for multiple variables simultaneously. Regression produces an equation you can use for prediction; correlation does not.
How do I analyze Likert scale data?

Likert scale data (e.g., 1 = Strongly Disagree to 5 = Strongly Agree) is ordinal — it has order but the intervals between values may not be equal. The conservative, methodologically correct approach is to treat Likert items as ordinal and use non-parametric tests: Mann-Whitney U (2 groups), Kruskal-Wallis (3+ groups), or Spearman correlation (relationship). However, when Likert scales are summed into composite scores across multiple items, many researchers treat them as interval/numerical and apply parametric tests — a debated but common practice when n is large.
What is the Mann-Whitney U test used for?

The Mann-Whitney U test (also called the Wilcoxon rank-sum test) compares the distributions of two independent groups when the data is ordinal or not normally distributed. It tests whether one group tends to have higher or lower values than the other, based on ranks. It is the non-parametric alternative to the independent samples t-test. A common application is comparing pain scores, satisfaction ratings, or reaction times between two treatment groups when normality cannot be assumed.
Can I use a t-test with a small sample size?

Yes, but with an important condition. The t-test is specifically designed for small samples where the population standard deviation is unknown — that is why it uses the t-distribution rather than the normal distribution. However, for very small samples (n < 15 per group), you cannot reliably confirm whether the normality assumption holds. If you suspect non-normality, use the Mann-Whitney U test (for 2 groups) or Wilcoxon signed-rank test (for paired data) as a safer alternative.
What does a p-value of 0.03 mean?

A p-value of 0.03 means there is a 3% probability of observing results as extreme as yours, if the null hypothesis were true. Since 0.03 < 0.05 (the conventional significance threshold), you reject the null hypothesis and conclude that the effect or difference is statistically significant. Note: statistical significance does not automatically mean practical significance. Always report an effect size (Cohen's d, η², r) alongside the p-value to convey the magnitude of the finding. See hypothesis testing fundamentals for more.