What Is a Decision Rule? Definition and Core Purpose
The decision rule answers one specific question: given this test statistic (or this p-value), what do I conclude? It converts a continuous numerical result into a binary decision: reject or do not reject H₀. By committing to the rule before seeing data, a researcher eliminates the temptation to move goalposts — a form of bias called p-hacking or data dredging that inflates false positive rates.
The rule works in two equivalent formulations. The p-value approach operates in probability space: compare the computed p-value to the pre-set significance level α. The critical value approach operates in data space: compare the test statistic to a cutoff on the sampling distribution. Both frameworks test the same underlying null hypothesis and always reach the same decision — the choice between them is one of convenience, not logic.
The framework traces back to Ronald Fisher's significance testing in the 1920s and the Neyman–Pearson decision-theoretic extension of the 1930s. Today it underpins every form of inferential statistics — from clinical trials to A/B tests — taught and practised through resources like Statistics Fundamentals.
- P-value method: Reject H₀ if p ≤ α; fail to reject H₀ if p > α
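- Critical value method: Reject H₀ if |test statistic| ≥ critical value (two-tailed); for a one-tailed test, compare the signed statistic to the single cutoff in the direction of H₁
- α = 0.05 means you accept a 5% chance of a false rejection (Type I error) when H₀ is actually true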
- Two-tailed test at α = 0.05: critical values are z = ±1.96
- One-tailed test at α = 0.05: critical value is z = 1.645 (right) or −1.645 (left)
- Fail to reject ≠ accept: insufficient evidence against H₀ is not proof that H₀ is true
The Two Methods for Applying a Decision Rule
Every decision rule has two equivalent formulations. Understanding both is worth the time — different textbooks, software packages, and fields default to different presentations, and being fluent in both prevents confusion when switching contexts.
The P-Value Method (Probability Space)
The p-value is the probability of observing a test statistic at least as extreme as the one calculated, under the assumption that H₀ is true. It is a tail probability — a small p-value means the observed data is unlikely if the null hypothesis were correct.
Reject H₀ if p ≤ α; fail to reject H₀ if p > α, where:
p = computed p-value from the test
α = pre-set significance level (e.g., 0.05)
The logic is straightforward: if the probability of obtaining data at least as extreme as yours by random chance alone (assuming H₀) is no larger than your tolerance for false alarms (α), the data is "too surprising" to be consistent with H₀ and you reject it. The p-value itself carries no information about effect size or practical importance: a p of 0.001 in a study of ten million people may correspond to a trivially small difference.
The p-value is not the probability that H₀ is true. It is the probability of your observed data (or more extreme data) given that H₀ is true. These are fundamentally different quantities. See the p-values guide for a full treatment.
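As a minimal sketch of the p-value method for a z-test, the snippet below turns a z statistic into a tail probability with Python's standard-library `statistics.NormalDist` and compares it to α. The function names, the `tail` labels, and the effect size used in the usage lines are illustrative assumptions, not part of the original text.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def p_value_from_z(z: float, tail: str = "two") -> float:
    """Tail probability of a z statistic under H0.

    tail: "two", "right", or "left" (labels are an assumption of this sketch).
    """
    if tail == "two":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    if tail == "left":
        return STD_NORMAL.cdf(z)
    raise ValueError(f"unknown tail: {tail!r}")


def reject_h0(p: float, alpha: float = 0.05) -> bool:
    """P-value decision rule: reject H0 exactly when p <= alpha."""
    return p <= alpha


# Hypothetical illustration: a tiny standardized effect (0.001 SD) measured
# on ten million observations still yields a small p-value.
tiny_effect = 0.001
n = 10_000_000
z = tiny_effect * sqrt(n)          # z = d * sqrt(n) for a one-sample z-test
p = p_value_from_z(z, tail="two")
print(f"z = {z:.3f}, p = {p:.5f}, reject H0: {reject_h0(p)}")
```

The usage lines illustrate the point made above: the rule rejects H₀ even though the assumed effect is a thousandth of a standard deviation, so the p-value alone says nothing about practical importance.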
The Critical Value Method (Data Space)
The critical value is the threshold on the sampling distribution that corresponds to α. Test statistics beyond this threshold — in the tail(s) of the distribution — constitute the rejection region. Because critical values are derived from the same distribution as p-values, the two methods always agree on the final decision.
Reject H₀ if |Zcalc| ≥ Zcrit (two-tailed test), where:
Zcalc = calculated test statistic
Zcrit = critical value from distribution table
|·| = absolute value (for two-tailed tests)
Critical values come from the sampling distribution relevant to the test: the standard normal distribution (z-tests), Student's t-distribution (t-tests), the chi-square distribution, or the F-distribution. They depend on α, on whether the test is one- or two-tailed, and, for every distribution except the standard normal, on degrees of freedom.
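For the standard normal case, the table lookup can be replaced by an inverse-CDF call. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library; the helper name `z_critical` and its `tail` argument are assumptions made for illustration.

```python
from statistics import NormalDist


def z_critical(alpha: float, tail: str = "two") -> float:
    """Positive z cutoff for a given alpha.

    Two-tailed tests split alpha across both tails, so the cutoff sits at
    the 1 - alpha/2 quantile; one-tailed tests use the 1 - alpha quantile.
    """
    upper_tail_area = alpha / 2 if tail == "two" else alpha
    return NormalDist().inv_cdf(1 - upper_tail_area)


for alpha in (0.10, 0.05, 0.01, 0.001):
    two = z_critical(alpha, "two")
    one = z_critical(alpha, "one")
    print(f"alpha={alpha:<5}  two-tailed ±{two:.3f}  one-tailed {one:.3f}")
```

For a left-tailed test the cutoff is simply the negative of the one-tailed value.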
P-Value Method
- Works in probability space (0 to 1)
- Exact probability computed from test statistic
- Default output of most statistical software
- Allows comparison across different tests
- Requires no distribution table lookup
Critical Value Method
- Works in data space (z-scores, t-values)
- Geometric: maps directly to rejection region
- Useful for visualizing the decision boundary
- Standard in textbooks and hand calculation
- Requires table of critical values (e.g., z-table)
Rejection Regions: One-Tailed and Two-Tailed Tests
The rejection region is the set of test statistic values for which the decision rule produces "reject H₀." Its shape depends on whether the alternative hypothesis is directional (one-tailed) or non-directional (two-tailed). Getting this right before computing anything is part of setting the decision rule correctly.
Rejection Region Configurations at α = 0.05
The table below lists the rejection conditions for each tail configuration at common significance levels.
| Test Direction | Alternative Hypothesis | Reject H₀ if (z-test) | α = 0.05 boundary | α = 0.01 boundary |
|---|---|---|---|---|
| Two-tailed | μ ≠ μ₀ | \|z\| ≥ zα/2 | \|z\| ≥ 1.96 | \|z\| ≥ 2.576 |
| Right-tailed | μ > μ₀ | z ≥ zα | z ≥ 1.645 | z ≥ 2.326 |
| Left-tailed | μ < μ₀ | z ≤ −zα | z ≤ −1.645 | z ≤ −2.326 |
For t-tests, replace the z critical values with t* values from the t-distribution table, using degrees of freedom df = n − 1 for a one-sample test. The decision boundary moves with df: with very small samples, t* is considerably larger than the corresponding z value, reflecting the extra uncertainty that comes from estimating the population standard deviation from the sample.
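The claim that the rejection-region rule and the p-value rule agree can be checked numerically for z-tests. The sketch below sweeps a grid of z values and counts disagreements between the two rules; the function names and the grid are assumptions chosen for illustration, and the check covers only the standard normal case.

```python
from statistics import NormalDist

Z = NormalDist()


def in_rejection_region(z: float, alpha: float, tail: str) -> bool:
    """Critical value rule for a z-test; tail is "two", "right", or "left"."""
    if tail == "two":
        return abs(z) >= Z.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return z >= Z.inv_cdf(1 - alpha)
    if tail == "left":
        return z <= Z.inv_cdf(alpha)
    raise ValueError(f"unknown tail: {tail!r}")


def p_value(z: float, tail: str) -> float:
    """Matching tail probability under H0."""
    if tail == "two":
        return 2 * Z.cdf(-abs(z))
    if tail == "right":
        return Z.cdf(-z)
    if tail == "left":
        return Z.cdf(z)
    raise ValueError(f"unknown tail: {tail!r}")


# Sweep a grid of statistics and collect any case where the rules disagree.
grid = [i / 10 for i in range(-40, 41)]     # z from -4.0 to 4.0
disagreements = [
    (z, alpha, tail)
    for tail in ("two", "right", "left")
    for alpha in (0.10, 0.05, 0.01)
    for z in grid
    if in_rejection_region(z, alpha, tail) != (p_value(z, tail) <= alpha)
]
print(f"{len(disagreements)} disagreements")
```

The grid deliberately avoids values that sit exactly on a critical value, where floating-point rounding could make the two comparisons differ.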
How to State the Decision Rule (Step-by-Step)
A well-stated decision rule comes before any calculation. Here is the procedure for writing one, in order.
State H₀ and H₁ First
The decision rule depends on H₁ — specifically, whether it is directional. H₀: μ = μ₀ vs. H₁: μ ≠ μ₀ requires a two-tailed rule; H₀: μ = μ₀ vs. H₁: μ > μ₀ requires a right-tailed rule. Write out both hypotheses before doing anything else. See the null and alternative hypothesis guide.
Choose the Significance Level α
α sets the false positive rate you can tolerate. Common choices: α = 0.05 (most fields), α = 0.01 (medical and safety research), α = 0.10 (exploratory work). Choosing a smaller α tightens the rejection region and lowers Type I error at the cost of increased Type II error (missing a real effect). This connection is covered in significance level and Type I and Type II errors.
Identify the Appropriate Test Statistic
The right test depends on what is known and how the data is structured. Use a z-statistic when population σ is known; use a t-statistic when σ is estimated from the sample. The test statistic formula determines which sampling distribution the critical value comes from — and therefore where the rejection region boundary falls. See the statistical test selector.
State the Decision Rule Explicitly
Write it out in full before collecting data. For the p-value method: "Reject H₀ if p ≤ 0.05." For the critical value method: "Reject H₀ if |z| ≥ 1.96." Both are complete, unambiguous decision rules. A rule stated after seeing the data is not a legitimate decision rule — it is rationalization.
Apply the Rule and State the Conclusion
Compute the test statistic, then either compute the p-value or compare the statistic to the critical value. Apply the rule mechanically. Then translate the binary output into a plain-language conclusion: "At the 5% significance level, there is sufficient evidence to conclude that μ differs from μ₀" (or the reverse). Never write "we prove H₁" or "we accept H₀."
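One way to make the "rule first, data second" ordering concrete is to freeze the rule in an object before any data are passed in. The sketch below does this for a one-sample z-test; the class name `ZDecisionRule`, its fields, and the numbers in the usage lines are hypothetical and only illustrate the five steps above.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass(frozen=True)
class ZDecisionRule:
    """A z-test decision rule fixed before any data are seen."""
    mu0: float        # hypothesized mean under H0
    sigma: float      # known population standard deviation
    alpha: float      # significance level
    tail: str         # "two", "right", or "left" (direction of H1)

    def critical_value(self) -> float:
        area = self.alpha / 2 if self.tail == "two" else self.alpha
        return NormalDist().inv_cdf(1 - area)

    def statement(self) -> str:
        c = self.critical_value()
        template = {"two": "|z| >= {:.3f}", "right": "z >= {:.3f}", "left": "z <= -{:.3f}"}
        return "Reject H0 if " + template[self.tail].format(c)

    def apply(self, sample_mean: float, n: int) -> str:
        z = (sample_mean - self.mu0) / (self.sigma / sqrt(n))
        c = self.critical_value()
        reject = {"two": abs(z) >= c, "right": z >= c, "left": z <= -c}[self.tail]
        verdict = "reject H0" if reject else "fail to reject H0"
        return f"z = {z:.2f}: {verdict} at alpha = {self.alpha}"


# Steps 1-4: hypotheses, alpha, statistic, and rule are fixed first.
rule = ZDecisionRule(mu0=100.0, sigma=15.0, alpha=0.05, tail="right")
print(rule.statement())
# Step 5: only now are the (hypothetical) data plugged in.
print(rule.apply(sample_mean=104.0, n=36))
```

Because the dataclass is frozen, α and the tail direction cannot be changed after the rule is created, which mirrors the requirement that the rule not be adjusted once results are known.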
How α Shapes the Decision Rule
The significance level α is the primary lever controlling the decision rule's sensitivity. Lowering α moves the critical value further into the tail, making the rejection region smaller and harder to enter. This reduces false positives (Type I errors) but increases false negatives (Type II errors, β).
| Significance Level (α) | Type I Error Rate | Two-Tailed zcrit | Right-Tailed zcrit | Practical Use |
|---|---|---|---|---|
| 0.10 | 10% | ±1.645 | 1.282 | Exploratory research |
| 0.05 | 5% | ±1.960 | 1.645 | Default in most fields |
| 0.01 | 1% | ±2.576 | 2.326 | Clinical trials, safety |
| 0.001 | 0.1% | ±3.291 | 3.090 | Highly stringent testing (the physics 5σ convention is far stricter, p ≈ 2.87 × 10⁻⁷) |
The choice of α carries consequences that extend beyond the individual test. In large-scale multiple testing scenarios (genomics studies examining thousands of genetic markers, for example), the expected number of false positives at α = 0.05 across 10,000 tests is 500 if every null hypothesis is true. Corrections tighten the per-test decision rule: the Bonferroni adjustment controls the family-wise error rate, while procedures such as Benjamini–Hochberg control the false discovery rate (FDR).
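The arithmetic above, together with two common corrections, can be sketched in a few lines. The Benjamini-Hochberg step-up procedure is used here as one standard FDR method, chosen for illustration; the function names and the eight p-values are hypothetical.

```python
def expected_false_positives(m: int, alpha: float) -> float:
    """Expected false rejections when all m null hypotheses are true."""
    return m * alpha


def bonferroni_alpha(m: int, alpha: float) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / m


def benjamini_hochberg(p_values: list[float], q: float) -> list[bool]:
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q.

    Returns one reject/keep flag per input p-value, in the original order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose sorted p-value is <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


print(expected_false_positives(10_000, 0.05))
print(bonferroni_alpha(10_000, 0.05))

# Hypothetical p-values from eight tests.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
bonf = [p <= bonferroni_alpha(len(p_vals), 0.05) for p in p_vals]
bh = benjamini_hochberg(p_vals, q=0.05)
print(sum(bonf), "rejections with Bonferroni;", sum(bh), "with Benjamini-Hochberg")
```

On these made-up p-values the FDR procedure rejects more hypotheses than Bonferroni, which reflects the general trade-off: FDR control is less conservative because it tolerates a bounded proportion of false discoveries rather than guarding against any at all.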
Statistical power (1 − β) is the probability of correctly rejecting a false H₀. For a fixed sample size and effect size, reducing α lowers power. To maintain both low Type I and low Type II error simultaneously, a researcher must increase the sample size. The Cohen's d and effect size guide explains how to calculate required sample size for a target power.
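The trade-off between α, power, and sample size can be made concrete for a two-tailed one-sample z-test. The sketch below assumes a known σ and a hypothetical true shift `delta` from μ₀; the function name and the scenario values are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()


def power_two_tailed_z(delta: float, sigma: float, n: int, alpha: float) -> float:
    """Power of a two-tailed one-sample z-test when the true mean is mu0 + delta."""
    shift = delta * sqrt(n) / sigma            # mean of z under the alternative
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # Probability that z lands in either tail of the rejection region.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


# Hypothetical scenario: true shift of 2.5 units, sigma = 8.
for n, alpha in [(64, 0.05), (64, 0.01), (128, 0.01)]:
    pw = power_two_tailed_z(delta=2.5, sigma=8.0, n=n, alpha=alpha)
    print(f"n={n:<4} alpha={alpha:<5} power={pw:.3f}")
```

Under these assumptions, tightening α from 0.05 to 0.01 at n = 64 reduces power, and doubling the sample size at α = 0.01 more than restores it.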
Worked Examples: Applying the Decision Rule
Each example below states the decision rule explicitly before computing anything, then applies it mechanically. Critical values use the standard normal and t-distributions as documented by the NIST Engineering Statistics Handbook.
Example 1 — One-Sample Z-Test (Corporate Operations)
Problem: A logistics company claims mean parcel delivery time is 48 hours. An operations analyst samples 64 deliveries and records x̄ = 50.5 hours. Known population SD σ = 8 hours. At α = 0.05, does the data contradict the company's claim?
x̄ = 50.5 hrs
μ₀ = 48 hrs
σ = 8 hrs
n = 64
Hypotheses: H₀: μ = 48 hours | H₁: μ ≠ 48 hours (two-tailed — testing for any departure from the claimed value)
Significance level: α = 0.05 (two-tailed)
Decision rule (stated before calculation):
P-value method: Reject H₀ if p ≤ 0.05
Critical value method: Reject H₀ if |z| ≥ 1.96
Test statistic:
SE = σ/√n = 8/√64 = 8/8 = 1.00
z = (50.5 − 48) / 1.00 = 2.50
Apply the decision rule:
Critical value method: |z| = 2.50 > 1.96 → test statistic is in the rejection region
P-value method: p = 2 × P(Z > 2.50) = 2 × 0.0062 = 0.0124 < 0.05
✅ Decision: Reject H₀. At α = 0.05, there is sufficient evidence to conclude that mean delivery time differs from 48 hours. Both methods agree: the test statistic (z = 2.50) exceeds the critical value (1.96), and p = 0.0124 < 0.05.
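The arithmetic in Example 1 can be reproduced with the Python standard library. This is only a check of the numbers above, with variable names chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Values from Example 1.
x_bar, mu0, sigma, n, alpha = 50.5, 48.0, 8.0, 64, 0.05

se = sigma / sqrt(n)                              # standard error
z = (x_bar - mu0) / se                            # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # two-tailed cutoff
p = 2 * (1 - NormalDist().cdf(abs(z)))            # two-tailed p-value

reject_by_critical_value = abs(z) >= z_crit
reject_by_p_value = p <= alpha

print(f"SE = {se:.2f}, z = {z:.2f}, z_crit = {z_crit:.3f}, p = {p:.4f}")
print("critical value method rejects:", reject_by_critical_value)
print("p-value method rejects:", reject_by_p_value)
```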
Example 2 — One-Sample T-Test (Unknown Population SD)
Problem: A manufacturer claims its batteries last 300 hours. A quality engineer tests a sample of 16 batteries and finds x̄ = 291 hours with s = 20 hours. At α = 0.05, is there evidence that batteries are lasting less than claimed?
x̄ = 291 hrs
μ₀ = 300 hrs
s = 20 hrs
n = 16, df = 15
Hypotheses: H₀: μ = 300 hours | H₁: μ < 300 hours (left-tailed — testing specifically for shorter life)
Significance level: α = 0.05 (left-tailed)
Decision rule (stated before calculation):
P-value method: Reject H₀ if p ≤ 0.05
Critical value method: Reject H₀ if t ≤ −t*(df=15, α=0.05) = −1.753 (from t-distribution table)
Test statistic:
SE = s/√n = 20/√16 = 20/4 = 5.00
t = (291 − 300) / 5.00 = −9/5 = −1.80
Apply the decision rule:
Critical value method: t = −1.80 < −1.753 → test statistic is in the left rejection region
P-value: p ≈ 0.046 < 0.05
✅ Decision: Reject H₀. At α = 0.05, there is sufficient evidence that the batteries' mean life is less than 300 hours. The test statistic (t = −1.80) falls just past the critical boundary (−1.753), and p ≈ 0.046. See the full one-sample t-test guide for more examples.
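Example 2 can be checked without a statistics package by integrating the t density numerically. The sketch below uses Simpson's rule, which is an implementation choice made here for self-containment rather than the method a statistics library would use; the helper names are assumptions, and the critical value is the table value quoted in the example.

```python
from math import gamma, pi, sqrt


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t), using Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area


# Values from Example 2.
x_bar, mu0, s, n, alpha = 291.0, 300.0, 20.0, 16, 0.05
df = n - 1
t_stat = (x_bar - mu0) / (s / sqrt(n))
p_left = t_cdf(t_stat, df)             # left-tailed p-value: P(T <= t_stat)
t_crit = -1.753                        # table value quoted in the example

print(f"t = {t_stat:.2f}, p = {p_left:.4f}")
print("critical value method rejects:", t_stat <= t_crit)
print("p-value method rejects:", p_left <= alpha)
```

Both rules land on the same side of the boundary, though only narrowly, which matches the "just past the critical boundary" remark above.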
Example 3 — Two-Sample T-Test (A/B Testing)
Problem: A product team runs an A/B test on a checkout flow. Version A (n₁ = 100) averages $52 per order (s₁ = $12); Version B (n₂ = 100) averages $56 per order (s₂ = $14). At α = 0.05, did Version B generate significantly higher revenue per order?
x̄₁ = 52, x̄₂ = 56
s₁ = 12, s₂ = 14
n₁ = n₂ = 100
Hypotheses: H₀: μ_A = μ_B (no difference) | H₁: μ_B > μ_A (right-tailed — testing that B exceeds A)
Decision rule (before computing):
P-value method: Reject H₀ if p ≤ 0.05
Critical value method: Reject H₀ if t ≥ 1.645 (large-sample approximation; use the two-sample t-test guide for exact df via Welch–Satterthwaite)
Test statistic:
SE = √(144/100 + 196/100) = √(1.44 + 1.96) = √3.40 ≈ 1.844
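t = (x̄_B − x̄_A) / SE = (56 − 52) / 1.844 = 4 / 1.844 ≈ 2.17
Apply the decision rule:
Critical value method: t = 2.17 > 1.645 → test statistic is in the right rejection region. The difference is taken as B − A so that a positive t matches the direction of H₁: μ_B > μ_A. Computing A − B instead gives t ≈ −2.17, which would wrongly appear to fall outside a right-tailed rejection region.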
P-value ≈ 0.015 < 0.05.
✅ Decision: Reject H₀. Version B produced statistically significantly higher average revenue per order at α = 0.05 (t = 2.17 > 1.645, p ≈ 0.015). Sign convention matters: always frame the test statistic to match H₁'s direction.
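The numbers in Example 3 can be reproduced as follows. The sketch uses the unpooled standard error and the same large-sample normal approximation as the example, and also computes the Welch-Satterthwaite degrees of freedom mentioned in the decision rule; variable names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Values from Example 3 (A is the control, B is the variant).
mean_a, s_a, n_a = 52.0, 12.0, 100
mean_b, s_b, n_b = 56.0, 14.0, 100
alpha = 0.05

var_a, var_b = s_a**2 / n_a, s_b**2 / n_b     # per-group variance of the mean
se = sqrt(var_a + var_b)                      # unpooled (Welch) standard error

# H1 is mu_B > mu_A, so the difference is taken as B - A.
t_stat = (mean_b - mean_a) / se

# Welch-Satterthwaite approximate degrees of freedom.
df = (var_a + var_b) ** 2 / (var_a**2 / (n_a - 1) + var_b**2 / (n_b - 1))

# Large-sample approximation: treat t as standard normal, as in the example.
p_right = 1 - NormalDist().cdf(t_stat)
z_crit = NormalDist().inv_cdf(1 - alpha)

print(f"SE = {se:.3f}, t = {t_stat:.2f}, df ~ {df:.1f}, p ~ {p_right:.3f}")
print("reject H0:", t_stat >= z_crit and p_right <= alpha)
```

With roughly 193 degrees of freedom the t-distribution is close to the standard normal, which is why the 1.645 cutoff is a reasonable approximation here.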
Fail to Reject Is Not Accept: A Critical Distinction
The most common error in reporting statistical conclusions is writing "we accept H₀" when the decision rule does not lead to rejection. This is wrong, and the distinction matters practically.
| Situation | Incorrect Phrasing | Correct Phrasing |
|---|---|---|
| p = 0.23, α = 0.05 | We accept H₀: the mean equals 50. | We fail to reject H₀. There is insufficient evidence at α = 0.05 to conclude that the mean differs from 50. |
| p = 0.08, α = 0.05 | The drug has no effect (H₀ is true). | The data did not produce a statistically significant result at α = 0.05. This does not rule out a real effect. |
| p = 0.049, α = 0.05 | We barely proved H₁ is true. | We reject H₀ at α = 0.05. The result is statistically significant; H₁ is supported by the data, not proven. |
Failing to reject H₀ means your sample did not provide enough evidence against the null claim at the chosen significance level. Four things can produce this outcome: the null hypothesis is genuinely correct; the sample size was too small to detect the effect; the true effect size is smaller than the study was powered to detect; or the test was misspecified. A non-significant result warrants investigation, not the conclusion that "nothing is there."
A single test cannot prove a null hypothesis. Absence of evidence (p > α) is not evidence of absence. The correct framing is always: "There is insufficient statistical evidence to reject H₀ at the α = 0.05 significance level." For more on what p-values do and do not say, see the p-values explainer.
Decision Rule Reference Tables
Primary Decision Rule Framework
| Method | Condition for Rejection (Reject H₀) | Condition for Retention (Fail to Reject H₀) |
|---|---|---|
| P-Value Approach | p ≤ α | p > α |
| Critical Value Approach | Test statistic falls in rejection region | Test statistic falls in retention region |
Tail Configurations and Boundary Conditions (z-test)
| Test Type | Alternative Hypothesis (H₁) | Rejection Region Condition | α = 0.05 boundary | α = 0.01 boundary |
|---|---|---|---|---|
| Left-Tailed | μ < μ₀ | z ≤ −Zα | z ≤ −1.645 | z ≤ −2.326 |
| Right-Tailed | μ > μ₀ | z ≥ Zα | z ≥ 1.645 | z ≥ 2.326 |
| Two-Tailed | μ ≠ μ₀ | \|z\| ≥ Zα/2 | \|z\| ≥ 1.960 | \|z\| ≥ 2.576 |
Common Critical Values for the T-Distribution
| Degrees of Freedom (df) | α = 0.10 (two-tailed) | α = 0.05 (two-tailed) | α = 0.01 (two-tailed) | α = 0.05 (one-tailed) |
|---|---|---|---|---|
| 5 | 2.015 | 2.571 | 4.032 | 2.015 |
| 10 | 1.812 | 2.228 | 3.169 | 1.812 |
| 15 | 1.753 | 2.131 | 2.947 | 1.753 |
| 20 | 1.725 | 2.086 | 2.845 | 1.725 |
| 30 | 1.697 | 2.042 | 2.750 | 1.697 |
| 60 | 1.671 | 2.000 | 2.660 | 1.671 |
| ∞ (z) | 1.645 | 1.960 | 2.576 | 1.645 |
Full tables: t-distribution table | z-table | chi-square table
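For degrees of freedom not listed in the table, a critical value can be approximated numerically. The sketch below finds t* by bisection on the upper-tail area, with the area computed by Simpson's rule; both are implementation choices made here to stay within the Python standard library, and the function names are assumptions.

```python
from math import gamma, pi, sqrt


def t_upper_tail(t: float, df: int, steps: int = 1000) -> float:
    """P(T > t) for t >= 0, via Simpson's rule on the t density over [0, t]."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(x: float) -> float:
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps                       # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


def t_critical(df: int, alpha: float, two_tailed: bool = True) -> float:
    """Positive critical value t*, found by bisection on the upper-tail area."""
    target = alpha / 2 if two_tailed else alpha
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid          # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


for df in (5, 10, 15, 20, 30, 60):
    row = [t_critical(df, a) for a in (0.10, 0.05, 0.01)]
    print(df, " ".join(f"{v:.3f}" for v in row))
```

The loop prints two-tailed values in the same column order as the first three columns of the table, so the output can be compared against it row by row.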
Key Entities and Formulas
| Entity | Notation | Definition / Formula |
|---|---|---|
| Null Hypothesis | H₀ | The default claim being tested; assumes no effect or no difference (e.g., μ = μ₀) |
| Alternative Hypothesis | H₁ (or Hₐ) | The research claim; states an effect exists (e.g., μ ≠ μ₀, μ > μ₀, or μ < μ₀) |
| Significance Level | α | Pre-set Type I error rate; probability of rejecting H₀ when it is true |
| P-Value | p | P(data as extreme or more extreme, given H₀ true); small values indicate evidence against H₀ |
| Z Statistic | z = (x̄ − μ₀)/(σ/√n) | Standardized distance from sample mean to hypothesized mean; used when σ is known |
| T Statistic | t = (x̄ − μ₀)/(s/√n) | Standardized distance using sample SD; follows t-distribution with df = n − 1 |
| Critical Value | Zcrit or t* | The threshold test statistic value at which the decision switches from "fail to reject" to "reject" |
| Rejection Region | RR | The set of test statistic values for which H₀ is rejected; located in the tail(s) of the distribution |
| Type I Error | α (false positive) | Rejecting H₀ when it is actually true; its probability is α by construction when the test's assumptions hold |
| Type II Error | β (false negative) | Failing to reject H₀ when it is false; its complement (1 − β) is statistical power |
Frequently Asked Questions
Q: What is the decision rule in a hypothesis test?
A decision rule is a predefined, explicit criterion for choosing between two outcomes — reject H₀ or fail to reject H₀ — based on sample data. It is stated before data collection and takes one of two forms: "reject if p ≤ α" (p-value method) or "reject if the test statistic falls in the rejection region" (critical value method). Both are mathematically equivalent.
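Q: How do you make a decision in hypothesis testing?
State H₀ and H₁, choose α, identify the test statistic, and write out the decision rule before collecting data. Then compute the statistic and apply the rule mechanically: reject H₀ if p ≤ α (equivalently, if the test statistic falls in the rejection region); otherwise fail to reject H₀.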
Q: Can I state the decision rule after seeing the data?
No. A decision rule stated after observing the data is not a legitimate decision rule — it is rationalization. The entire purpose of stating the rule in advance is to control the Type I error rate. When α is chosen after seeing whether p < 0.05 or p < 0.01 produced a nicer conclusion, the actual false positive rate is no longer controlled at the claimed level. This practice is known as p-hacking. The American Statistical Association's 2016 statement on p-values explicitly addresses this problem, available at The American Statistician.
Q: What is the difference between the p-value method and the critical value method?
They operate in different spaces but always reach the same conclusion. The p-value method computes the exact tail probability and compares it to α, so it works in probability space (0 to 1). The critical value method maps the test statistic to a boundary on the sampling distribution and checks whether the statistic crosses that boundary, so it works in data space (z-scores, t-values). Most statistical software reports the test statistic and the p-value by default; critical values usually have to be looked up or computed separately. The p-value method is more common in practice because it gives an exact measure of evidence; the critical value method is more common in textbook hand-calculation exercises because it connects visually to the rejection region diagram.
Q: How does the decision rule apply in A/B testing?
An A/B test is a two-sample hypothesis test. The decision rule is typically: "reject H₀ (no difference between variants) if p ≤ 0.05." In practice, product teams pre-register the required sample size using power analysis, run the test until that size is reached, then apply the decision rule mechanically. Early stopping — checking the result before the planned end and stopping if p < 0.05 — inflates the false positive rate and violates the spirit of the decision rule. See the hypothesis testing examples page for a fully worked A/B testing example.
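The effect of early stopping can be illustrated with a small simulation. The sketch below is a simplified stand-in for an A/B test: it simulates a one-sample z-test on standard normal data where H₀ is true, and compares a single look at the planned sample size with repeated looks that stop at the first significant result. The function name, look schedule, and simulation sizes are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)   # two-tailed cutoff at alpha = 0.05


def false_positive_rate(n_sims: int, n_total: int, look_every: int,
                        peek: bool, seed: int = 0) -> float:
    """Share of simulated null experiments that end in 'reject H0'.

    Each experiment draws N(0, 1) observations, so H0 (mean = 0) is true
    and every rejection is a false positive.
    """
    rejections = 0
    for sim in range(n_sims):
        # One generator per experiment, so both modes see identical data.
        rng = random.Random(seed + sim)
        running_sum = 0.0
        for n in range(1, n_total + 1):
            running_sum += rng.gauss(0.0, 1.0)
            is_look = peek and n % look_every == 0
            if is_look or n == n_total:
                z = running_sum / sqrt(n)      # z-test with known sigma = 1
                if abs(z) >= Z_CRIT:
                    rejections += 1
                    break                       # stop at the first rejection
    return rejections / n_sims


fixed = false_positive_rate(2000, 500, 50, peek=False)
peeking = false_positive_rate(2000, 500, 50, peek=True)
print(f"fixed horizon: {fixed:.3f}   peeking every 50 obs: {peeking:.3f}")
```

With a fixed horizon the rejection rate should sit near the nominal 5%, while the peeking variant typically lands well above it, because each interim look is another chance for noise to cross the cutoff.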
The Decision Rule in Practice
The decision rule shows up under different names and in different software outputs across every field that uses inferential statistics, but the underlying logic is always the same.
Clinical Research
Trial protocols registered with ClinicalTrials.gov must pre-specify α and the primary endpoint decision rule. Regulators treat post-hoc α adjustment as a form of protocol deviation.
A/B Testing
Growth teams at technology companies use a two-sample decision rule to determine whether a product variant is a significant improvement. The rule gates shipping decisions. Bayesian alternatives also exist.
Quality Control
Control charts in manufacturing implement a continuous decision rule: flag a process as out-of-control when the test statistic (sample mean or range) breaches a ±3σ control limit — effectively α ≈ 0.0027.
Physics
Particle physics uses a 5-sigma (5σ) standard for discovery claims: a one-tailed z-test decision rule with p < 2.87 × 10⁻⁷. The 2012 Higgs boson announcement met this threshold.
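The tail areas behind the 3σ and 5σ figures quoted above follow directly from the standard normal distribution. The sketch below computes them with `math.erfc`; the helper name `upper_tail` is an assumption. Note that 2.87 × 10⁻⁷ is the area in a single tail beyond 5σ.

```python
from math import erfc, sqrt


def upper_tail(k_sigma: float) -> float:
    """P(Z > k) for a standard normal Z, using erfc for accuracy far in the tail."""
    return 0.5 * erfc(k_sigma / sqrt(2))


three_sigma_two_sided = 2 * upper_tail(3.0)   # control-chart false-alarm rate
five_sigma_one_sided = upper_tail(5.0)        # particle-physics discovery threshold

print(f"3 sigma, two-sided: {three_sigma_two_sided:.4f}")
print(f"5 sigma, one-sided: {five_sigma_one_sided:.3g}")
print(f"5 sigma, two-sided: {2 * five_sigma_one_sided:.3g}")
```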
Finance & Economics
Regression coefficients in econometric models are tested with a t-test decision rule. A coefficient is reported as "statistically significant" when |t| ≥ t*(df, α), which is about 1.96 (often rounded to 2.0) for large samples at a two-tailed α = 0.05.
Genomics
Genome-wide association studies test hundreds of thousands of SNPs simultaneously. The per-test decision rule uses α = 5 × 10⁻⁸ (Bonferroni-adjusted) rather than 0.05, to control the family-wise error rate.
Related Topics
The decision rule sits at the centre of hypothesis testing. The concepts below build directly on or connect directly to understanding it correctly.
Hypothesis Testing — full topic hub | P-Values | Significance Level | Null & Alternative Hypothesis | Type I & Type II Errors | Cohen's d & Effect Size | Degrees of Freedom | Hypothesis Testing Examples | Z-Score | One-Sample T-Test | Two-Sample T-Test | Confidence Intervals