Hypothesis Testing · Test Selection Decision Tool · 22 min read · May 11, 2026
By: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

Statistical Test Selector: Which Statistical Test Should I Use?

You have data. You have a research question. Now comes the question every student and researcher faces: which statistical test do I run? The wrong choice can invalidate an entire study. This page solves that problem with an interactive decision engine, a logic-based flowchart, and a complete comparison table covering 13 tests — so you can find the right test in under 30 seconds.

The decision comes down to four questions about your data. Answer them in order, and the correct test becomes obvious. This is the same framework taught in graduate-level research methods courses and referenced in peer-reviewed methodology guides from institutions like Harvard's Department of Statistics and the NIST/SEMATECH Engineering Statistics Handbook.

What You'll Find on This Page
  • ✓ Interactive test selector — answer 4 questions, get your test instantly
  • ✓ Full decision flowchart (logic tree) for all major tests
  • ✓ Comparison table: 13 tests with data type, assumptions, and examples
  • ✓ "If X → Use Y" quick-reference table (18 if→then rules)
  • ✓ 4 worked examples from real-world research scenarios
  • ✓ Parametric vs. non-parametric comparison with switching guide
  • ✓ Assumptions checklist: normality, variance equality, sample size

What Is a Statistical Test Selector?

Definition — Statistical Test Selector
A statistical test selector is a decision framework — typically a flowchart, table, or interactive tool — that guides researchers, students, and analysts to the correct hypothesis test based on four characteristics of their data: data type, number of groups, sample relationship, and distributional assumptions. It removes guesswork from test selection by turning a complex methodological decision into a structured, step-by-step process.

Choosing the wrong test is one of the most common methodological errors in published research. A 2010 review in the BMJ found that statistical errors — many related to inappropriate test selection — appeared in the majority of papers examined across several medical journals. The solution is not to memorize every test, but to follow a systematic decision process.

13 · Tests Covered
4 · Decision Questions
<30s · Time to Find Your Test
18 · If→Then Decision Rules

The 4 Questions That Determine Your Statistical Test

📋
How to Choose a Statistical Test (4 Steps)

Step 1: Identify your data type — categorical or numerical. Step 2: Count your groups — one, two, or three or more. Step 3: Determine if samples are paired (same subjects) or independent (different subjects). Step 4: Check normality — use the Shapiro-Wilk test or a Q-Q plot. Normal data → parametric test; non-normal → non-parametric alternative.

Every statistical test selection ultimately traces back to the same four questions. Work through them in order — the test becomes obvious at the end of the path.

Question 1

What type of data do you have?

Categorical: groups, labels, counts (gender, pass/fail, voting preference).
Numerical: measured quantities (height, score, income, reaction time).

Question 2

How many groups are you comparing?

One group vs. a known value, two groups against each other, or three or more groups. This determines the test family: one-sample, two-sample, or multi-group tests.

Question 3

Are samples paired or independent?

Paired: same subjects measured at two time points (pre/post). Independent: different subjects in each group (male vs. female).

Question 4

Is your data normally distributed?

Check using the Shapiro-Wilk test, Q-Q plot, or histogram. If yes → parametric test. If no → non-parametric alternative. For n ≥ 30, the Central Limit Theorem often allows parametric tests regardless.
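The normality check in Question 4 takes only a few lines of Python with scipy. This is a minimal sketch on simulated data; the sample and the decision rule mirror the steps above.

```python
# Checking normality before choosing a test -- a sketch using scipy.stats.
# The scores below are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
scores = rng.normal(loc=75, scale=10, size=25)  # hypothetical exam scores, n < 30

# Shapiro-Wilk: p > 0.05 -> no evidence against normality
stat, p = stats.shapiro(scores)

# CLT fallback: for n >= 30, parametric tests are often acceptable anyway
parametric_ok = (p > 0.05) or (len(scores) >= 30)
print(f"Shapiro-Wilk W={stat:.3f}, p={p:.3f} -> "
      f"{'parametric test' if parametric_ok else 'non-parametric alternative'}")
```

A Q-Q plot (e.g. `scipy.stats.probplot`) gives the same information visually and is less sensitive to the sharp p < 0.05 cutoff at large sample sizes.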

Interactive Statistical Test Selector

Answer the four questions below and the engine will identify your test, explain why it's correct, and flag the appropriate alternative.

🧠 Statistical Test Decision Engine

Statistical Test Decision Flowchart

The flowchart below maps the full decision logic. Follow your branch from left to right — each fork is a yes/no question about your data.

Text-Based Logic Tree (Accessible Version)

START: What type of data do you have?
│
├── CATEGORICAL (frequencies, counts, proportions)
│   └── How many categorical variables?
│       ├── One variable (vs. expected distribution) → Chi-Square Goodness of Fit
│       ├── Two variables (association?) → Chi-Square Test of Independence
│       └── Two proportions (large sample) → Z-Test for Proportions
│
└── NUMERICAL (continuous, ordinal)
    │
    ├── Goal: COMPARE GROUPS
    │   ├── 1 group vs. known value
    │   │   ├── n ≥ 30 or σ known → One-Sample Z-Test
    │   │   └── n < 30, σ unknown → One-Sample T-Test
    │   │
    │   ├── 2 groups
    │   │   ├── INDEPENDENT samples
    │   │   │   ├── Normally distributed → Independent Samples T-Test
    │   │   │   └── NOT normally distributed → Mann-Whitney U Test
    │   │   └── PAIRED samples
    │   │       ├── Normally distributed → Paired Samples T-Test
    │   │       └── NOT normally distributed → Wilcoxon Signed-Rank Test
    │   │
    │   └── 3+ groups
    │       ├── Normally distributed
    │       │   ├── One factor → One-Way ANOVA (post-hoc: Tukey HSD)
    │       │   └── Two factors → Two-Way ANOVA
    │       └── NOT normally distributed → Kruskal-Wallis Test
    │
    ├── Goal: TEST A RELATIONSHIP
    │   ├── Two numerical variables, normal → Pearson Correlation
    │   └── Two variables, non-normal or ordinal → Spearman Correlation
    │
    └── Goal: PREDICT AN OUTCOME
        ├── Continuous outcome → Linear Regression
        └── Binary outcome (yes/no) → Logistic Regression
Methodological basis: This decision framework follows the test-selection logic described in OpenIntro Statistics (Diez, Çetinkaya-Rundel & Barr, 4th ed.) and the NIST/SEMATECH Engineering Statistics Handbook — two open-access, peer-reviewed statistical references widely used in academic and government settings.
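The group-comparison branch of the logic tree can be sketched as a plain Python function. The function name and argument structure are illustrative, not any particular library's API.

```python
# A minimal sketch of the decision tree's group-comparison logic.
def select_test(data_type, n_groups=2, paired=False, normal=True):
    """Return the recommended test given answers to the four questions."""
    if data_type == "categorical":
        return "Chi-Square Test of Independence"
    if n_groups == 1:
        # use a One-Sample Z-Test instead if sigma is known and n >= 30
        return "One-Sample T-Test"
    if n_groups == 2:
        if paired:
            return "Paired Samples T-Test" if normal else "Wilcoxon Signed-Rank Test"
        return "Independent Samples T-Test" if normal else "Mann-Whitney U Test"
    # 3 or more groups
    return "One-Way ANOVA" if normal else "Kruskal-Wallis Test"

print(select_test("numerical", n_groups=3, normal=False))  # Kruskal-Wallis Test
```

The relationship and prediction branches (correlation, regression) would extend the same structure with a `goal` argument.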

Which Statistical Test Should I Use? (Quick-Reference Table)

This table maps your research situation directly to the recommended test. It is designed to answer "which test do I run?" in under 15 seconds. Bookmark this section.

If your situation is… → Use this test
Comparing means of 2 independent groups (normal data) → Independent Samples T-Test
Comparing same group before and after treatment (normal data) → Paired Samples T-Test
Comparing means across 3 or more groups (normal data) → One-Way ANOVA
Testing effect of 2 factors on a numerical outcome → Two-Way ANOVA
Testing association between 2 categorical variables → Chi-Square Test of Independence
Testing if a categorical variable matches a known distribution → Chi-Square Goodness of Fit
Comparing a sample mean to a known population mean (n ≥ 30) → One-Sample Z-Test
Comparing a sample mean to a known population mean (n < 30) → One-Sample T-Test
Measuring linear relationship between 2 numerical variables (normal) → Pearson Correlation
Measuring relationship between 2 variables (non-normal or ordinal) → Spearman Correlation
Predicting a continuous numerical outcome from predictor(s) → Linear Regression
Predicting a binary (yes/no) outcome from predictor(s) → Logistic Regression
2 independent groups — data NOT normally distributed → Mann-Whitney U Test
Same group at 2 time points — data NOT normally distributed → Wilcoxon Signed-Rank Test
3 or more groups — data NOT normally distributed → Kruskal-Wallis Test
Testing whether data is normally distributed → Shapiro-Wilk Test
Testing equality of variances between groups → Levene's Test
2×2 contingency table with small expected frequencies (<5) → Fisher's Exact Test

Statistical Test Comparison Table

The table below provides a complete reference for the 13 most commonly used statistical tests. Use the columns to match your test to its assumptions and see a realistic example for each.

Test Name | When to Use | Data Type | Key Assumptions | Example Use Case
Independent T-Test | Compare means of 2 independent groups | Numerical (continuous) | Normality; independence; equal or unequal variance (Welch's) | Exam scores: male vs. female students
Paired T-Test | Compare means of the same group at 2 time points | Numerical (continuous) | Normality of differences; paired observations | Blood pressure before vs. after medication
One-Sample T-Test | Compare a sample mean to a known population value | Numerical (continuous) | Normality; n < 30 or σ unknown | Is our factory average (498g) different from 500g?
Z-Test | Compare sample mean to population (large sample) | Numerical | n ≥ 30; population SD (σ) known | National average vs. sample of 200 students
One-Way ANOVA | Compare means across 3 or more independent groups | Numerical (continuous) | Normality; homogeneity of variance (Levene's test) | Test scores across 3 teaching methods
Two-Way ANOVA | Test effect of 2 independent variables on an outcome | Numerical (continuous) | Normality; independence; homogeneity of variance | Effect of diet AND exercise on weight loss
Chi-Square Test | Test association between 2 categorical variables | Categorical (nominal) | Expected frequency ≥ 5 in each cell; independence | Gender vs. voting preference (yes/no)
Pearson Correlation | Measure linear relationship between 2 numerical variables | Numerical (continuous) | Normality; linear relationship; no extreme outliers | Study hours vs. GPA across 60 students
Spearman Correlation | Measure monotonic relationship (non-normal or ordinal) | Ordinal or non-normal numerical | Monotonic relationship (not necessarily linear) | Class rank vs. salary rank
Linear Regression | Predict a continuous outcome from one or more predictors | Numerical (continuous) | Linearity; normality of residuals; no multicollinearity | Predict salary from years of experience
Logistic Regression | Predict a binary outcome from one or more predictors | Binary outcome | No multicollinearity; large sample; independence | Predict pass/fail from study hours + prior GPA
Mann-Whitney U | Non-parametric alternative to independent t-test | Ordinal or non-normal numerical | Independence; similar distribution shapes | Pain scores: treatment A vs. treatment B
Wilcoxon Signed-Rank | Non-parametric alternative to paired t-test | Ordinal or non-normal numerical | Paired observations; symmetry of differences | Patient anxiety scores before/after therapy
Kruskal-Wallis | Non-parametric alternative to one-way ANOVA | Ordinal or non-normal numerical | Independence; similar distribution shapes | Satisfaction scores across 4 departments
The parametric–non-parametric mapping in this table reflects test-selection guidance from Altman & Bland (2009, BMJ) and Howell's Statistical Methods for Psychology.

Parametric vs. Non-Parametric Tests

Parametric tests are more statistically powerful — they extract more information from your data — but only when their assumptions hold. Non-parametric tests sacrifice some power in exchange for fewer assumptions. The decision between them is not a matter of preference; it follows from your data.

When to Use Non-Parametric Tests

⚡ Switch to Non-Parametric When Any of These Are True
  • Your data fails the normality test (Shapiro-Wilk p < 0.05) and n < 30
  • Your data is ordinal (Likert scale, rankings, ordered categories)
  • You have significant outliers that cannot be justified for removal
  • Sample size is very small (n < 15 per group) — normality cannot be confirmed
  • Your measurement scale has a bounded or discrete range that cannot be normally distributed

Parametric ↔ Non-Parametric Equivalents

Research Situation | Parametric Test (normal data) | Non-Parametric Alternative
2 independent groups | Independent T-Test | Mann-Whitney U
Same group at 2 time points | Paired T-Test | Wilcoxon Signed-Rank
3+ independent groups | One-Way ANOVA | Kruskal-Wallis
Relationship between 2 variables | Pearson Correlation | Spearman Correlation
Repeated measures across 3+ time points | Repeated Measures ANOVA | Friedman Test
[Figure: Parametric vs. non-parametric statistical tests comparison chart]
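To see one parametric/non-parametric pairing in practice, the sketch below runs an independent t-test and its Mann-Whitney U counterpart on the same simulated skewed data. The data is illustrative only.

```python
# Parametric test vs. its non-parametric counterpart on the same samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.exponential(scale=2.0, size=20)  # skewed -> normality doubtful
group_b = rng.exponential(scale=3.0, size=20)

# Parametric: assumes normality, uses the raw values
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Non-parametric: rank-based, no distributional assumption
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)

print(f"t-test p={t_p:.3f} | Mann-Whitney p={u_p:.3f}")
```

With strongly skewed data and small n, the rank-based test is the defensible choice; the two p-values will often disagree near the 0.05 threshold.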

Statistical Test Assumptions Checklist

Every statistical test rests on a set of assumptions. Violating these assumptions does not automatically invalidate results, but it does increase the risk of misleading conclusions. Check each assumption before running your test.

🔔

Normality

How to check: Run the Shapiro-Wilk test in SPSS, R, or Python. Inspect a Q-Q plot — if points follow the diagonal, the data is approximately normal. A histogram should be roughly bell-shaped.
Rule: Shapiro-Wilk p > 0.05 → assume normality.
Affects: t-test, ANOVA, Pearson correlation, linear regression
⚖️

Homogeneity of Variance

How to check: Run Levene's Test of Equality of Variances. If p > 0.05, assume equal variances. If variances are unequal, use Welch's t-test (for 2 groups) or Welch's ANOVA (for 3+ groups).
Affects: Independent t-test, one-way ANOVA, two-way ANOVA
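A minimal sketch of this check with scipy, using simulated groups with deliberately unequal spread:

```python
# Levene's test for equal variances, then Student's vs. Welch's t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(50, 5, size=30)    # group 1: small spread
g2 = rng.normal(52, 15, size=30)   # group 2: deliberately 3x the SD

lev_stat, lev_p = stats.levene(g1, g2)
equal_var = bool(lev_p > 0.05)     # p <= 0.05 -> variances differ

# Welch's t-test (equal_var=False) is the safe choice when Levene rejects
t_stat, t_p = stats.ttest_ind(g1, g2, equal_var=equal_var)
print(f"Levene p={lev_p:.4f} -> equal_var={equal_var}; t-test p={t_p:.4f}")
```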
🔗

Independence of Observations

How to check: Review your study design. Each observation must come from a separate, unrelated subject. Data from the same person at different time points violates independence — use paired or repeated-measures tests instead.
Affects: All statistical tests
📦

Minimum Expected Frequency

How to check: Inspect your contingency table. Every cell must have an expected frequency of at least 5. If any cell falls below this threshold, use Fisher's Exact Test as the alternative.
Affects: Chi-square test of independence and goodness of fit
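A sketch of this fallback logic with scipy; the counts are hypothetical and chosen so that one expected frequency falls below 5:

```python
# Check expected frequencies; fall back to Fisher's exact test if any < 5.
import numpy as np
from scipy import stats

table = np.array([[3, 9],
                  [7, 4]])  # 2x2 contingency table of observed counts

chi2, p, dof, expected = stats.chi2_contingency(table)
if (expected < 5).any():
    # at least one expected count < 5 -> chi-square unreliable, use Fisher
    odds_ratio, p = stats.fisher_exact(table)
    test_used = "Fisher's exact"
else:
    test_used = "chi-square"
print(f"{test_used}: p = {p:.3f}")
```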
📐

Linearity

How to check: Create a scatterplot of your two variables. The relationship should follow a roughly straight-line pattern. Also inspect a residuals vs. fitted values plot — random scatter indicates linearity.
Affects: Pearson correlation, linear regression
📏

Adequate Sample Size

How to check: Conduct a power analysis before data collection (use G*Power, free software). For the Z-test, n ≥ 30 is required. For chi-square, n ≥ 50 is recommended. For regression, a general rule is n ≥ 10–20 observations per predictor variable.
Affects: All tests (small n reduces statistical power)
🚫

No Multicollinearity

How to check: Compute the Variance Inflation Factor (VIF) for each predictor. VIF < 5 is acceptable; VIF > 10 indicates severe multicollinearity that must be addressed before interpreting results.
Affects: Multiple linear regression, logistic regression
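VIF can be computed directly from least-squares fits: regress each predictor on the others and take VIF = 1 / (1 − R²). The `vif` helper below is an illustrative sketch in plain numpy, not a library function.

```python
# Variance Inflation Factor from scratch: VIF_j = 1 / (1 - R^2_j), where
# R^2_j comes from regressing predictor j on all the other predictors.
import numpy as np

def vif(X):
    """Return the VIF of each column of the predictor matrix X."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # add an intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(7)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)
print(vif(np.column_stack([x1, x2, x3])))   # x1 and x2 show inflated VIF
```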
Assumption-checking guidelines here follow recommendations from Harvard University's Department of Statistics course materials and IBM SPSS Statistics documentation, as well as Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.), a standard graduate-level statistics textbook.

Worked Examples: Real-World Test Selection

The four examples below demonstrate the full decision process — from research question to test selection to interpretation — using realistic scenarios from four different fields.

Worked Example 1 — Psychology Research

A researcher tests whether 8 weeks of Cognitive Behavioral Therapy (CBT) reduces anxiety scores. Anxiety is measured on a validated numerical scale for 30 patients before and after treatment.

Step 1 · Data type: Numerical (anxiety score on a continuous 0–100 scale). Not categorical.

Step 2 · Groups: Two measurements — before treatment and after treatment.

Step 3 · Paired or independent? Paired — the same 30 patients are measured twice. Each before-score has a corresponding after-score from the same person.

Step 4 · Normality: Run Shapiro-Wilk on the differences (after − before). Suppose p = 0.12 — differences are approximately normally distributed.

✅ Correct test: Paired Samples T-Test. If p < 0.05, anxiety scores differ significantly before and after CBT. If normality fails, use the Wilcoxon Signed-Rank Test instead.
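A sketch of this analysis in Python with scipy, using simulated scores in place of real patient data:

```python
# Paired-samples t-test: same 30 patients measured before and after CBT.
# Scores are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
before = rng.normal(62, 8, size=30)           # hypothetical anxiety scores
after = before - rng.normal(6, 4, size=30)    # treatment lowers scores on average

diff = after - before
_, norm_p = stats.shapiro(diff)               # normality check on the differences
t_stat, p = stats.ttest_rel(after, before)
print(f"paired t = {t_stat:.2f}, p = {p:.4f}")
```

Note the normality check is run on the differences, not on the raw before/after columns, exactly as in Step 4 above.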

Worked Example 2 — Business A/B Testing

A marketing team tests two versions of a landing page (A vs. B). They record whether each of 400 visitors converted (yes/no). Did conversion rate differ between versions?

Step 1 · Data type: Categorical — each visitor either converted (1) or did not (0). The outcome is binary.

Step 2 · Variables: Two categorical variables — landing page version (A/B) and conversion (yes/no). This forms a 2×2 contingency table.

Step 3 · Check expected frequencies: With 400 visitors, all expected cell counts are well above 5. Fisher's Exact Test is not needed.

✅ Correct test: Chi-Square Test of Independence. If p < 0.05, landing page version and conversion rate are significantly associated — version B performs differently from version A.
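A sketch of this test with scipy; the conversion counts are hypothetical:

```python
# Chi-square test of independence on a 2x2 A/B conversion table.
import numpy as np
from scipy import stats

# rows = landing page version, columns = (converted, did not convert)
table = np.array([[30, 170],    # version A: 30 of 200 converted
                  [55, 145]])   # version B: 55 of 200 converted

chi2, p, dof, expected = stats.chi2_contingency(table)
assert (expected >= 5).all()    # assumption holds, Fisher's test not needed
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```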

Worked Example 3 — Medical Research

A clinical trial compares pain relief scores (0–10 scale) across three drug conditions: Drug A, Drug B, and Placebo. The pain scores are skewed and fail the normality test.

Step 1 · Data type: Numerical (pain rating), but effectively ordinal and clearly non-normal.

Step 2 · Groups: Three independent groups (Drug A, Drug B, Placebo).

Step 3 · Normality: Shapiro-Wilk p = 0.01 — data is significantly non-normal. ANOVA's normality assumption is violated.

✅ Correct test: Kruskal-Wallis Test (non-parametric alternative to one-way ANOVA). If significant, follow up with pairwise Mann-Whitney U tests with Bonferroni correction to identify which drug pairs differ.
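A sketch of this analysis with scipy, including the Bonferroni-corrected follow-up; the scores are simulated:

```python
# Kruskal-Wallis across 3 groups, then pairwise Mann-Whitney U follow-ups
# with a Bonferroni correction. Data is simulated skewed pain-relief scores.
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
drug_a  = rng.exponential(2.0, size=25) + 3
drug_b  = rng.exponential(2.0, size=25) + 1
placebo = rng.exponential(2.0, size=25)

h_stat, p = stats.kruskal(drug_a, drug_b, placebo)
if p < 0.05:
    groups = {"A": drug_a, "B": drug_b, "placebo": placebo}
    pairs = list(combinations(groups, 2))
    alpha = 0.05 / len(pairs)              # Bonferroni-adjusted threshold
    for g1, g2 in pairs:
        _, pp = stats.mannwhitneyu(groups[g1], groups[g2])
        print(f"{g1} vs {g2}: p = {pp:.4f} ({'sig' if pp < alpha else 'ns'})")
```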

Worked Example 4 — Economics / Data Science

A data scientist wants to predict house prices using square footage, number of bedrooms, and neighborhood quality score. Price is a continuous numerical variable.

Step 1 · Goal: Prediction — not group comparison or relationship measurement. There is a clear outcome variable (house price) and multiple predictors.

Step 2 · Outcome type: Continuous numerical (price in dollars). If the outcome were binary (sold/not sold), logistic regression would apply.

Step 3 · Check assumptions: Verify linearity (scatterplots), normality of residuals (histogram of residuals), and VIF < 5 for each predictor to confirm no multicollinearity.

✅ Correct test: Multiple Linear Regression. The model produces coefficients showing how much each predictor contributes to price, an R² value for total variance explained, and p-values for each predictor's significance. See the simple linear regression guide to start.
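A sketch of this model using plain numpy least squares; the data and coefficients are simulated, not real housing figures:

```python
# Multiple linear regression via ordinary least squares (numpy only).
import numpy as np

rng = np.random.default_rng(9)
n = 200
sqft = rng.uniform(800, 3000, size=n)
beds = rng.integers(1, 6, size=n).astype(float)
quality = rng.uniform(1, 10, size=n)

# Simulated prices with known coefficients plus noise
price = 50_000 + 120 * sqft + 8_000 * beds + 15_000 * quality \
        + rng.normal(0, 20_000, size=n)

X = np.column_stack([np.ones(n), sqft, beds, quality])  # intercept + predictors
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
resid = price - X @ beta
r2 = 1 - resid.var() / price.var()
print(f"coefficients = {np.round(beta, 1)}, R^2 = {r2:.3f}")
```

The fitted `beta` recovers the simulated coefficients up to noise, and `r2` reports the share of price variance explained, the quantities named in the example above.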

Real-World Use Cases by Field

🔬

Academic Research

A psychology study comparing three therapy approaches on depression scores measured with the PHQ-9 scale across independent groups.

→ One-Way ANOVA
📈

Business Analytics

Testing whether an email subject line change affected open rate (opened/not opened) across two randomly assigned user groups.

→ Chi-Square Test of Independence
🏥

Clinical Trials

Comparing survival rates (binary: alive/deceased at 5 years) between a treatment group and control group, adjusting for patient age.

→ Logistic Regression
📋

Survey Research

Analyzing whether job satisfaction (1–5 Likert scale) differs between employees in four departments. Data is ordinal and non-normal.

→ Kruskal-Wallis Test
🧪

Quality Control

Testing whether a production line's average bottle weight (498g measured) differs significantly from the target (500g) in a large batch.

→ One-Sample Z-Test
🎓

Education Research

Testing whether there is a relationship between hours studied per week and final exam score across 80 students in a statistics course.

→ Pearson Correlation

Beginner Guide: How to Choose a Test (Plain Language)

If you're new to statistics, the number of available tests can feel overwhelming. Here is a plain-language summary with no jargon.

🟢
The Simple Version: 3 Starting Points

(A) Your data is in categories (like yes/no, male/female, color choices) → start with the chi-square test.
(B) You're comparing numbers between groups (like test scores, weights, reaction times) → look at a t-test or ANOVA.
(C) You want to see if two number-based things are related (like height and weight) → use correlation or regression.

From there, one question refines your choice: is your data "normal"? Normal data produces a bell-shaped curve when you graph it. Most test scores, heights, and naturally measured things are close to normal. If your data is very skewed — or if you're using a rating scale — switch to the non-parametric versions: Mann-Whitney U (instead of t-test) or Kruskal-Wallis (instead of ANOVA).

Still unsure? Use the interactive selector above — it walks you through the same logic in a drop-down format and tells you exactly which test to run.

Statistical Testing Glossary

The following definitions are precise, self-contained, and written for quick reference.

Term | Definition
t-test | A parametric hypothesis test that compares means between one or two groups. Requires normally distributed numerical data. Three variants: one-sample, independent-samples, and paired-samples t-test.
ANOVA | Analysis of Variance. A parametric test that determines whether means differ significantly across three or more independent groups. Assumes normality and homogeneity of variance.
Chi-square test | A non-parametric test for categorical data. Tests whether observed frequencies in a contingency table differ from expected frequencies (goodness of fit) or whether two categorical variables are associated (test of independence).
Z-test | Tests whether a sample mean differs from a known population mean. Applied when sample size is large (n ≥ 30) and the population standard deviation (σ) is known.
Mann-Whitney U | A non-parametric test for comparing distributions between two independent groups when data is ordinal or not normally distributed. The non-parametric alternative to the independent t-test.
p-value | The probability of observing results as extreme as the data, assuming the null hypothesis is true. A p-value below the significance threshold (typically α = 0.05) indicates statistical significance.
null hypothesis (H₀) | The default assumption that there is no effect, difference, or relationship between variables. Statistical tests attempt to reject or fail to reject H₀ based on the computed p-value.
independent variable | The variable manipulated or controlled by the researcher to observe its effect on the dependent variable. Also called a predictor or explanatory variable.
dependent variable | The outcome variable measured in a study. Its value is hypothesized to depend on the independent variable. Also called the response or criterion variable.
significance level (α) | The pre-specified probability threshold below which a p-value is considered statistically significant. Conventionally set at α = 0.05 (5%), meaning a 5% false-positive risk is accepted.
parametric test | A hypothesis test that assumes the data follows a specific distribution — typically normal. More statistically powerful than non-parametric tests when assumptions are met. Examples: t-test, ANOVA, Pearson correlation.
non-parametric test | A hypothesis test with no assumption about the shape of the data distribution. Used when normality cannot be assumed, when data is ordinal, or when sample sizes are very small. Examples: Mann-Whitney U, Kruskal-Wallis, Spearman correlation.
effect size | A standardized measure of the magnitude of a difference or relationship, independent of sample size. Common measures: Cohen's d (t-test), η² (ANOVA), Cramér's V (chi-square), r (correlation). Complements the p-value by indicating practical significance.
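The effect-size entry notes that p-values should be paired with a magnitude measure. A minimal sketch computing Cohen's d for two independent samples (the helper function and the simulated data are illustrative):

```python
# Cohen's d with a pooled standard deviation -- a standardized effect size.
import numpy as np

def cohens_d(x, y):
    """Cohen's d for two independent samples."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) \
                 / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

rng = np.random.default_rng(11)
d = cohens_d(rng.normal(105, 15, 50), rng.normal(100, 15, 50))
# true standardized difference is 5/15 ~ 0.33; the estimate varies with sampling
print(round(float(d), 2))
```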

Frequently Asked Questions

How do I choose which statistical test to use?

Answer four questions in order: (1) Is your data categorical or numerical? (2) How many groups are you comparing? (3) Are the samples paired or independent? (4) Is the data normally distributed? Categorical data with two variables → chi-square. Numerical data, two independent normal groups → independent t-test. Three or more groups, normal data → ANOVA. Non-normal data → non-parametric equivalent. Use the interactive selector on this page to get your answer in under 30 seconds.
When should I use a t-test vs. ANOVA?

The rule is simple: use a t-test for exactly two groups; use ANOVA for three or more groups. Running multiple t-tests instead of ANOVA inflates the familywise Type I error rate. For example, three pairwise t-tests at α = 0.05 each gives a cumulative false-positive probability of approximately 14% — much higher than the intended 5%. ANOVA controls this inflation. See the ANOVA guide for the full procedure.
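The roughly 14% figure follows from the complement rule, assuming the three comparisons are independent:

```python
# Familywise error rate for k independent comparisons at level alpha:
# P(at least one false positive) = 1 - (1 - alpha)^k
alpha, k = 0.05, 3
familywise = 1 - (1 - alpha) ** k
print(f"{familywise:.4f}")  # 0.1426 -> ~14%, not the intended 5%
```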
When should I use a chi-square test?

Use the chi-square test when your data is categorical. Chi-square goodness of fit tests whether one categorical variable matches an expected distribution (e.g., "are the observed frequencies of four blood types consistent with population proportions?"). Chi-square test of independence tests whether two categorical variables are associated (e.g., "is smoking status related to cancer diagnosis?"). The key assumption is that every cell in the contingency table must have an expected frequency of at least 5. If this fails, use Fisher's Exact Test.
What is the difference between parametric and non-parametric tests?

Parametric tests (t-test, ANOVA, Pearson correlation) assume the data follows a normal distribution and operate on the actual data values. They are more statistically powerful when assumptions hold. Non-parametric tests (Mann-Whitney U, Kruskal-Wallis, Spearman correlation) make no distributional assumptions and work on ranks rather than raw values. Use non-parametric tests when normality cannot be assumed, data is ordinal, or sample sizes are very small.
How do I check whether my data is normally distributed?

Use three complementary methods: (1) Shapiro-Wilk test — a statistical test for normality; p > 0.05 suggests normal distribution. (2) Q-Q plot — if data points fall approximately on the diagonal reference line, the data is normal. (3) Histogram — look for a roughly symmetric, bell-shaped distribution. For large samples (n ≥ 30), the Central Limit Theorem implies that sample means are approximately normally distributed regardless of the original data shape, making parametric tests generally robust.
When should I use correlation vs. regression?

Use correlation when you want to measure the strength and direction of a relationship between two variables, without implying causation or making predictions. The output is a correlation coefficient (r), ranging from −1 to +1. Use regression when you want to predict one variable from another, quantify how much the outcome changes per unit of the predictor, or control for multiple variables simultaneously. Regression produces an equation you can use for prediction; correlation does not.
How do I analyze Likert scale data?

Likert scale data (e.g., 1 = Strongly Disagree to 5 = Strongly Agree) is ordinal — it has order but the intervals between values may not be equal. The conservative, methodologically correct approach is to treat Likert items as ordinal and use non-parametric tests: Mann-Whitney U (2 groups), Kruskal-Wallis (3+ groups), or Spearman correlation (relationship). However, when Likert scales are summed into composite scores across multiple items, many researchers treat them as interval/numerical and apply parametric tests — a debated but common practice when n is large.
What is the Mann-Whitney U test used for?

The Mann-Whitney U test (also called the Wilcoxon rank-sum test) compares the distributions of two independent groups when the data is ordinal or not normally distributed. It tests whether one group tends to have higher or lower values than the other, based on ranks. It is the non-parametric alternative to the independent samples t-test. A common application is comparing pain scores, satisfaction ratings, or reaction times between two treatment groups when normality cannot be assumed.
Can I use a t-test with a small sample size?

Yes, but with an important condition. The t-test is specifically designed for small samples where the population standard deviation is unknown — that is why it uses the t-distribution rather than the normal distribution. However, for very small samples (n < 15 per group), you cannot reliably confirm whether the normality assumption holds. If you suspect non-normality, use the Mann-Whitney U test (for 2 groups) or Wilcoxon signed-rank test (for paired data) as a safer alternative.
What does a p-value of 0.03 mean?

A p-value of 0.03 means there is a 3% probability of observing results as extreme as yours, if the null hypothesis were true. Since 0.03 < 0.05 (the conventional significance threshold), you reject the null hypothesis and conclude that the effect or difference is statistically significant. Note: statistical significance does not automatically mean practical significance. Always report an effect size (Cohen's d, η², r) alongside the p-value to convey the magnitude of the finding. See hypothesis testing fundamentals for more.