Null and Alternative Hypothesis: Core Definitions
Every statistical test in science, medicine, business analytics, and social research is built on the same two-hypothesis structure. The null and alternative hypotheses together form a mutually exclusive and exhaustive pair: exactly one of them describes the true state of the population, and the data helps you determine which one the evidence favors.
- Null hypothesis (H₀): No effect, no difference. Always contains an equality sign (=, ≤, ≥)
- Alternative hypothesis (H₁): Effect exists. Contains ≠, >, or <
- You test H₀: The goal is to find evidence strong enough to reject it
- Rejection rule: Reject H₀ when p-value ≤ α (significance level)
- "Fail to reject": Not the same as proving H₀ true — the data was simply inconclusive
- Common α values: 0.05 (most fields), 0.01 (medicine), 0.10 (exploratory research)
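The rejection rule in the list above can be sketched as a small helper. The function name `decide` and its return strings are illustrative choices for this guide, not part of any library.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rejection rule: reject H0 when p-value <= alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    # "Fail to reject" is deliberately not worded as "accept H0".
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(decide(0.032))              # reject H0 at the default alpha = 0.05
print(decide(0.032, alpha=0.01))  # fail to reject H0 at the stricter alpha
```

The same p-value leads to different decisions under different α values, which is why α should be fixed before looking at the data.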
The Court Trial Analogy: How to Think About Hypothesis Testing
The easiest way to understand the null vs alternative hypothesis structure is through a court trial. The logic is identical.
| Court Trial | Hypothesis Testing | What It Means |
|---|---|---|
| Defendant is innocent until proven guilty | H₀ is assumed true until evidence says otherwise | Default assumption requires no evidence to accept |
| Prosecution presents evidence | Researcher collects data and computes test statistic | Evidence evaluated against the default assumption |
| "Beyond reasonable doubt" | p-value ≤ α (significance level) | Evidence must cross a defined threshold |
| Verdict: Guilty | Reject H₀, support H₁ | Evidence is strong enough to overturn the default |
| Verdict: Not Guilty | Fail to reject H₀ | Insufficient evidence — not proof of innocence |
Notice that "not guilty" is not the same as "innocent." The jury does not conclude the defendant is innocent; they conclude the evidence did not meet the required standard. Hypothesis testing works exactly the same way. Failing to reject H₀ never means you have proven H₀ is true. It means your data, at the chosen significance level, was not strong enough to reject it.
Many students write "we accept H₀" when a test is not significant. This is statistically incorrect. The correct language is "we fail to reject H₀." You can only reject or fail to reject a null hypothesis — you never accept or prove it. This distinction matters in published research and on statistics exams.
How to Write Null and Alternative Hypotheses
Good hypotheses share four properties: they are clear, testable, mutually exclusive, and stated in terms of population parameters (not sample statistics). The parameter is the true value in the whole population; the statistic is what you calculate from your sample.
Null always uses equality
The null hypothesis always contains =, ≤, or ≥. It represents the status quo. Write the value you are comparing against as μ₀ or p₀.
Alternative defines the test direction
Use ≠ for a two-tailed test (either direction). Use > or < for a directional one-tailed test when theory predicts a specific direction.
Test population parameters
Hypotheses are about population parameters (μ, p), not sample statistics (x̄, p̂). You collect sample data to make inferences about the population.
H₀ and H₁ cover all cases
Together, H₀ and H₁ must account for every possible value of the parameter. No value should fall into neither hypothesis.
Writing Hypotheses: Six Real Scenarios
| Research Scenario | H₀ | H₁ | Test Type |
|---|---|---|---|
| Does a new drug lower mean blood pressure below 120 mmHg? | H₀: μ ≥ 120 | H₁: μ < 120 | One-tailed (left) |
| Has a website's conversion rate changed from the historical 3.5%? | H₀: p = 0.035 | H₁: p ≠ 0.035 | Two-tailed |
| Does a training program improve test scores above the national mean of 75? | H₀: μ ≤ 75 | H₁: μ > 75 | One-tailed (right) |
| Do two manufacturing machines produce parts with equal mean diameters? | H₀: μ₁ = μ₂ | H₁: μ₁ ≠ μ₂ | Two-tailed |
| Is the defect rate in a production line above the acceptable 2%? | H₀: p ≤ 0.02 | H₁: p > 0.02 | One-tailed (right) |
| Is mean customer satisfaction different between two service designs? | H₀: μ₁ − μ₂ = 0 | H₁: μ₁ − μ₂ ≠ 0 | Two-tailed |
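The "Test Type" column determines which tail area counts as the p-value. A minimal sketch for a z statistic, using only Python's standard library; the labels "two-sided", "greater", and "less" are naming assumptions made here to mirror the signs ≠, >, and < in H₁.

```python
from statistics import NormalDist


def p_value_from_z(z: float, alternative: str) -> float:
    """Convert a z statistic into a p-value for the chosen alternative."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":   # H1 uses ≠: both tails count
        return 2 * (1 - std_normal.cdf(abs(z)))
    if alternative == "greater":     # H1 uses >: right tail only
        return 1 - std_normal.cdf(z)
    if alternative == "less":        # H1 uses <: left tail only
        return std_normal.cdf(z)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


# The same statistic gives different p-values depending on H1.
for alt in ("two-sided", "greater", "less"):
    print(alt, round(p_value_from_z(1.8, alt), 4))
```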
The 6-Step Hypothesis Testing Framework
Every hypothesis test follows the same logical sequence. Learn this order once and you can apply it to any statistical test — z-test, t-test, chi-square, ANOVA, or regression coefficient test.
The Universal 6-Step Process
1. State the hypotheses (H₀ and H₁). Write both hypotheses using formal parameter notation. Decide at this step whether the test is one-tailed or two-tailed based on the research question; never adjust the direction of the test after seeing the data.
2. Set the significance level (α). Choose α before collecting data. Common choices are 0.05 (general research), 0.01 (clinical/medical studies requiring stricter evidence), and 0.10 (exploratory pilot studies). α defines the risk of a Type I error you are willing to accept.
3. Choose the test and verify assumptions. Select the appropriate test based on data type, sample size, and whether population parameters are known. A statistical test selector can help. Check normality, independence, and homogeneity of variance as needed.
4. Collect data and calculate the test statistic. Compute the standardized test statistic (z-score or t-score) from your sample data. This single number summarizes how far your sample result is from what H₀ predicts, measured in standard errors.
5. Find the p-value and compare to α. The p-value is the probability of observing a test statistic as extreme as yours (or more extreme) if H₀ were true. If p ≤ α, the evidence is strong enough to reject H₀. If p > α, fail to reject H₀.
6. State the conclusion in context. Write a plain-language conclusion that references the original research question. Never just write "reject H₀"; explain what that means for the specific problem. For example: "There is sufficient evidence at α = 0.05 to conclude that the new drug lowers mean blood pressure below 120 mmHg."
Test Statistic Formulas
The test statistic converts your sample result into a standardized number that can be looked up in a distribution table or converted to a p-value. The two most common tests for means are the z-test (when population standard deviation σ is known) and the t-test (when σ is unknown and estimated from the sample).
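Z-test for a mean (σ known): z = (x̄ − μ₀) / (σ / √n)
x̄ = sample mean
μ₀ = hypothesized population mean (from H₀)
σ = population standard deviation
n = sample size
T-test for a mean (σ unknown): t = (x̄ − μ₀) / (s / √n), with df = n − 1
x̄ = sample mean
μ₀ = hypothesized population mean
s = sample standard deviation
n = sample size
Z-test for a proportion: z = (p̂ − p₀) / √[p₀(1 − p₀) / n]
p̂ = sample proportion
p₀ = hypothesized population proportion (from H₀)
n = sample size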
Use a z-test when the population standard deviation (σ) is known. When σ is unknown and estimated by s, use a t-test; the difference matters most for small samples (n < 30). For samples of 30 or larger, the Central Limit Theorem makes the sampling distribution of x̄ approximately normal and the t-distribution is close to the standard normal, so many textbooks accept a z-test as a large-sample approximation. In practice, σ is almost never known for real-world problems, so the t-test is more commonly used.
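The three test statistics used in this guide can be written as short functions. This is a sketch with illustrative function names; the inputs are the same symbols defined above (x̄, μ₀, σ, s, p̂, p₀, n).

```python
import math


def z_statistic(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """z = (x̄ − μ₀) / (σ / √n), for a mean with known population σ."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


def t_statistic(xbar: float, mu0: float, s: float, n: int) -> tuple:
    """t = (x̄ − μ₀) / (s / √n); also returns degrees of freedom n − 1."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


def proportion_z_statistic(p_hat: float, p0: float, n: int) -> float:
    """z = (p̂ − p₀) / √[p₀(1 − p₀) / n]; the standard error uses p₀, not p̂."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


# Inputs taken from the three worked examples in this guide.
print(round(z_statistic(10.14, 10, 0.5, 50), 2))
print(t_statistic(76, 72, 8, 16))
print(round(proportion_z_statistic(0.92, 0.95, 200), 2))
```

Each function returns how many standard errors the sample result lies from the value stated in H₀; converting that number into a p-value is a separate step.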
Step-by-Step Worked Examples
Example 1: Z-Test (Two-Tailed)
A quality control engineer samples 50 bolts from a production line. The historical mean bolt diameter is 10 mm with a known population standard deviation of 0.5 mm. The sample mean is 10.14 mm. At α = 0.05, is there evidence that the mean diameter has changed?
State hypotheses: H₀: μ = 10 mm | H₁: μ ≠ 10 mm. This is a two-tailed test — the engineer wants to detect any change, not just an increase or decrease.
Set significance level: α = 0.05 (given). For a two-tailed test, the critical region is split between both tails: α/2 = 0.025 per tail. Critical z-values are ±1.96.
Calculate test statistic: z = (x̄ − μ₀) / (σ / √n) = (10.14 − 10) / (0.5 / √50) = 0.14 / 0.0707 = 1.98
Find p-value: z = 1.98 corresponds to a one-tail area of 0.0239. For a two-tailed test: p-value = 2 × 0.0239 = 0.0478.
Decision: p-value (0.0478) ≤ α (0.05) → Reject H₀. The test statistic (1.98) also falls in the rejection region (|z| > 1.96).
✓ Conclusion: At α = 0.05, there is sufficient statistical evidence to conclude that the mean bolt diameter has changed from 10 mm. The production line may need recalibration.
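Example 1 can be checked with Python's standard library. This is a sketch rather than a required workflow; because it does not round z before the lookup, the p-value may differ from the table-based figure in the last digit.

```python
import math
from statistics import NormalDist

# Values from Example 1
xbar, mu0, sigma, n, alpha = 10.14, 10.0, 0.5, 50, 0.05

std_normal = NormalDist()
standard_error = sigma / math.sqrt(n)
z = (xbar - mu0) / standard_error

# Two-tailed test: double the area beyond |z|
p_value = 2 * (1 - std_normal.cdf(abs(z)))
z_critical = std_normal.inv_cdf(1 - alpha / 2)
reject_h0 = p_value <= alpha

print(f"z = {z:.2f}, p-value = {p_value:.4f}, critical = ±{z_critical:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The p-value rule and the critical-value rule always agree, since both compare the same statistic against the same α.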
Example 2: One-Sample T-Test (One-Tailed)
A school claims its students score above the national average of 72 on a standardized test. A sample of 16 students has a mean score of 76 with a sample standard deviation of 8. At α = 0.05, does the evidence support the school's claim?
State hypotheses: H₀: μ ≤ 72 | H₁: μ > 72. This is a right-tailed test — the school claims scores are above the national average, a directional claim.
Set significance level: α = 0.05. Degrees of freedom: df = n − 1 = 15. For a right-tailed t-test at α = 0.05 with df = 15, the critical value is t* = 1.753.
Calculate test statistic: t = (x̄ − μ₀) / (s / √n) = (76 − 72) / (8 / √16) = 4 / 2 = 2.00
Find p-value: Using the t-distribution with df = 15, t = 2.00 gives a one-tail p-value ≈ 0.032.
Decision: p-value (0.032) ≤ α (0.05) → Reject H₀. The test statistic (2.00) also exceeds the critical value (1.753).
✓ Conclusion: At α = 0.05, there is sufficient evidence to support the school's claim that students score above the national average of 72. The sample mean of 76 is statistically significantly higher.
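Example 2 needs a tail area from the t-distribution, which Python's standard library does not provide directly. The sketch below integrates the t density numerically with Simpson's rule; the helper names `t_pdf` and `t_upper_tail` are illustrative. In practice a statistics package (for instance `scipy.stats.t.sf`) would normally be used instead.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t), using Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3
    upper = 0.5 - area_0_to_a          # the density is symmetric around 0
    return upper if t >= 0 else 1 - upper


# Values from Example 2 (right-tailed test)
xbar, mu0, s, n, alpha = 76.0, 72.0, 8.0, 16, 0.05
t_stat = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1
p_value = t_upper_tail(t_stat, df)

print(f"t = {t_stat:.2f}, df = {df}, one-tail p-value = {p_value:.3f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```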
Example 3: Proportion Z-Test
A tech company claims its product has a 95% customer satisfaction rate. In a random survey of 200 customers, 184 report being satisfied (p̂ = 0.92). At α = 0.05, is there evidence the true satisfaction rate is below 95%?
State hypotheses: H₀: p ≥ 0.95 | H₁: p < 0.95. Left-tailed test — the question is whether satisfaction has fallen below the claimed rate.
Verify conditions: np₀ = 200 × 0.95 = 190 ≥ 10. n(1 − p₀) = 200 × 0.05 = 10 ≥ 10. ✓ Conditions met. Critical value at α = 0.05 (left-tail): z* = −1.645.
Calculate test statistic: z = (0.92 − 0.95) / √[0.95 × 0.05 / 200] = −0.03 / √(0.0002375) = −0.03 / 0.01541 ≈ −1.95
Find p-value: z = −1.95 gives a left-tail area ≈ 0.026.
Decision: p-value (0.026) ≤ α (0.05) → Reject H₀.
✓ Conclusion: At α = 0.05, there is sufficient evidence to conclude the true satisfaction rate is below the claimed 95%. The company should investigate the drop in customer satisfaction.
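The condition check and left-tailed calculation in Example 3 can be reproduced as follows. This is a minimal sketch using the standard library; the threshold of 10 expected successes and failures is the one used in the example.

```python
import math
from statistics import NormalDist

# Values from Example 3
n, satisfied, p0, alpha = 200, 184, 0.95, 0.05
p_hat = satisfied / n

# Normal-approximation conditions: at least 10 expected successes and failures
expected_successes = n * p0
expected_failures = n - expected_successes
conditions_ok = expected_successes >= 10 and expected_failures >= 10

# The standard error is computed under H0, so it uses p0 rather than p_hat
standard_error = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error
p_value = NormalDist().cdf(z)  # left-tailed: area below z

print(f"conditions met: {conditions_ok}")
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```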
What the P-Value Actually Means
The p-value is one of the most misunderstood concepts in statistics. Here is the precise definition: the p-value is the probability of observing a test statistic as extreme as yours (or more extreme), assuming H₀ is true. Three common misreadings follow from forgetting that condition:
1. The p-value is NOT the probability that H₀ is true.
2. The p-value is NOT the probability that you made an error.
3. A small p-value does NOT mean the effect is large or practically important: a trivially small difference can produce a tiny p-value with a large sample. Always report effect size alongside p-values.
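The point about small p-values and trivial effects can be made concrete. The numbers below are hypothetical, chosen only for illustration: a sample mean of 100.1 against μ₀ = 100 with σ = 10, which is a standardized difference of 0.01 at every sample size.

```python
import math


def two_sided_p(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Two-sided z-test p-value; erfc keeps precision for very small tails."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)


# Hypothetical inputs: the effect never changes, only the sample size does.
xbar, mu0, sigma = 100.1, 100.0, 10.0
effect_size = (xbar - mu0) / sigma  # about 0.01 standard deviations

for n in (100, 40_000, 1_000_000):
    print(f"n = {n:>9,}  effect = {effect_size:.2f}  p = {two_sided_p(xbar, mu0, sigma, n):.3g}")
```

The effect size stays fixed while the p-value shrinks toward zero as n grows, which is why the p-value alone says nothing about practical importance.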
One-Tailed vs Two-Tailed Tests
The tail direction is determined entirely by the alternative hypothesis, and it must be chosen before data collection. Changing from a two-tailed to a one-tailed test after seeing your results to achieve significance is called p-hacking and invalidates the test.
| Feature | Two-Tailed Test | One-Tailed Test |
|---|---|---|
| Alternative hypothesis | H₁: μ ≠ μ₀ | H₁: μ > μ₀ or H₁: μ < μ₀ |
| Where rejection region lies | Split between both tails (α/2 each) | Entirely in one tail (all of α) |
| Critical z at α = 0.05 | ±1.96 | +1.645 (right) or −1.645 (left) |
| When to use | When you want to detect change in either direction | When theory predicts a specific direction before data collection |
| More conservative? | Yes — harder to reject H₀ | No — easier to reject in the predicted direction |
| Common example | "Has the mean changed from 50?" | "Is the mean greater than 50?" |
Figure: Rejection regions for two-tailed vs one-tailed tests at α = 0.05. The two-tailed rejection region is split across both tails, while the right-tailed rejection region sits in the upper tail only. Same α = 0.05, different critical values.
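The critical values quoted in the comparison table come from the inverse of the standard normal CDF. A short sketch, with `critical_z` as an illustrative helper name and "two", "right", and "left" as assumed labels for the tail:

```python
from statistics import NormalDist


def critical_z(alpha: float, tail: str) -> tuple:
    """Return the critical z value(s) that bound the rejection region."""
    std_normal = NormalDist()
    if tail == "two":    # α is split in half between the two tails
        bound = std_normal.inv_cdf(1 - alpha / 2)
        return (-bound, bound)
    if tail == "right":  # all of α sits in the upper tail
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "left":   # all of α sits in the lower tail
        return (std_normal.inv_cdf(alpha),)
    raise ValueError("tail must be 'two', 'right', or 'left'")


for tail in ("two", "right", "left"):
    print(tail, [round(v, 3) for v in critical_z(0.05, tail)])
```

Because a one-tailed test places all of α in one tail, its critical value is closer to zero, which is what makes rejection easier in the predicted direction.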
Type I and Type II Errors
Every hypothesis test carries two types of possible mistakes. Understanding them prevents misinterpreting results and helps researchers design studies with enough statistical power to detect real effects.
| Decision \ Reality | H₀ is Actually TRUE | H₀ is Actually FALSE |
|---|---|---|
| Reject H₀ | ❌ Type I Error (False Positive), probability = α | ✓ Correct Decision, probability = 1 − β (Power) |
| Fail to Reject H₀ | ✓ Correct Decision, probability = 1 − α | ⚠️ Type II Error (False Negative), probability = β |
Type I Error (False Positive)
Rejecting H₀ when it is actually true. Controlled directly by your choice of α. A drug trial commits a Type I error when it concludes a useless drug is effective. Lowering α reduces Type I error risk but increases Type II error risk.
Type II Error (False Negative)
Failing to reject H₀ when it is actually false. Reduced by increasing sample size, increasing effect size, or raising α. A drug trial commits a Type II error when it misses a genuinely effective drug.
Statistical Power (Probability of Correct Rejection)
The probability of correctly rejecting a false H₀. Researchers typically aim for power ≥ 0.80, meaning an 80% chance of detecting a true effect. Power increases with larger sample size and larger effect size.
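For a right-tailed z-test with known σ, power has a closed form: power = 1 − Φ(z_α − δ√n / σ), where δ is the true difference from μ₀. The sketch below uses hypothetical inputs (δ = 4, σ = 8) to show how power moves with sample size; the function name is illustrative.

```python
import math
from statistics import NormalDist


def power_right_tailed_z(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a right-tailed one-sample z-test for a true shift `effect`."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)   # critical value under H0
    shift = effect * math.sqrt(n) / sigma     # true mean shift in standard errors
    return 1 - std_normal.cdf(z_alpha - shift)


# Hypothetical scenario: true mean is 4 units above mu0, sigma = 8
for n in (16, 25, 50):
    print(f"n = {n:>2}: power = {power_right_tailed_z(4, 8, n):.2f}")
```

When the true effect is zero, the function returns α, which matches the definition of the Type I error rate.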
Type I and Type II Errors in Medical Research
The distinction between Type I and Type II errors was formalized by Jerzy Neyman and Egon Pearson in their foundational 1933 paper on hypothesis testing. In clinical trial design, a Type I error rate of α = 0.05 combined with a target power of 1 − β = 0.80 is the standard that determines minimum required sample sizes. The U.S. Food and Drug Administration's guidance on clinical trials requires explicit pre-specification of α and power for drug approval studies. See Banerjee et al. (2009) in the Indian Journal of Dermatology for a clear clinical overview.
Real-World Case Studies
Case Study 1: A/B Testing in Digital Marketing
Real-World Application
Conversion Rate Optimization
An e-commerce company runs two versions of a product page simultaneously. Version A (control) shows a historical conversion rate of 4.2%. Version B (new design) converts 4.8% of visitors out of a sample of 2,500 per group. The team wants to know if the new design genuinely outperforms the control or if the difference is within normal random variation.
Hypotheses: H₀: p_B = p_A (no difference) | H₁: p_B > p_A (Version B converts better)
Result: z = 2.21, p-value = 0.014. At α = 0.05, reject H₀. The difference is statistically significant — the new design produces a real improvement in conversion rate. The company rolls out Version B globally, resulting in an estimated $340,000 increase in annual revenue from a 0.6 percentage-point lift.
Case Study 2: Clinical Drug Trial
Blood Pressure Medication Efficacy
A clinical trial tests whether a new antihypertensive drug reduces mean systolic blood pressure below the placebo group's mean of 145 mmHg. After 12 weeks, 120 patients on the drug show a mean of 138 mmHg with s = 15 mmHg.
Hypotheses: H₀: μ ≥ 145 mmHg | H₁: μ < 145 mmHg (one-tailed, left)
Result: t = (138 − 145) / (15 / √120) = −7 / 1.369 ≈ −5.11, p < 0.0001. Extremely strong evidence to reject H₀. The drug reduces blood pressure. The study proceeds to Phase III trials — but the research team also reports effect size (Cohen's d ≈ 0.47, a medium effect) to show clinical meaningfulness, not just statistical significance.
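The t statistic and effect size in this case study are linked by |t| = d × √n, which is why a medium effect can still produce a very large t with 120 patients. A short check using the figures above, with Cohen's d computed as the mean difference divided by the sample standard deviation:

```python
import math

# Values from Case Study 2
sample_mean, mu0, s, n = 138.0, 145.0, 15.0, 120

t_stat = (sample_mean - mu0) / (s / math.sqrt(n))
cohens_d = abs(sample_mean - mu0) / s  # one-sample Cohen's d

print(f"t = {t_stat:.2f}")
print(f"Cohen's d = {cohens_d:.2f}")
print(f"d * sqrt(n) = {cohens_d * math.sqrt(n):.2f}")  # matches |t|
```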
Case Study 3: Manufacturing Quality Control
Production Line Defect Rate
A manufacturer's specification requires a defect rate no higher than 2%. Quality control inspects 500 units and finds 15 defective (p̂ = 0.03, or 3%). Is this evidence the process is out of control?
Hypotheses: H₀: p ≤ 0.02 | H₁: p > 0.02 (one-tailed, right)
Result: z = (0.03 − 0.02) / √(0.02 × 0.98 / 500) = 0.01 / 0.00626 ≈ 1.60. p-value ≈ 0.055. At α = 0.05, fail to reject H₀. At α = 0.10, reject H₀. This borderline result prompts the quality team to increase sampling to 1,000 units — a correct application of the "insufficient power" reasoning for inconclusive results.
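The sketch below reproduces the borderline result and then asks a what-if question. The second call is a hypothetical assumption, not data from the case study: it supposes the same 3% defect rate were observed in the larger sample of 1,000 units.

```python
import math
from statistics import NormalDist


def right_tailed_proportion_test(defects: int, n: int, p0: float) -> tuple:
    """Return (z, p_value) for H0: p <= p0 against H1: p > p0."""
    p_hat = defects / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, 1 - NormalDist().cdf(z)


z_500, p_500 = right_tailed_proportion_test(15, 500, 0.02)      # observed sample
z_1000, p_1000 = right_tailed_proportion_test(30, 1000, 0.02)   # hypothetical

for label, z, p in (("n = 500", z_500, p_500), ("n = 1000 (hypothetical)", z_1000, p_1000)):
    verdicts = ", ".join(
        f"alpha {a}: {'reject' if p <= a else 'fail to reject'}" for a in (0.05, 0.10)
    )
    print(f"{label}: z = {z:.2f}, p = {p:.3f} -> {verdicts}")
```

Doubling the sample shrinks the standard error by a factor of √2, so an identical observed rate would move from borderline to clearly significant. If the true rate is actually 2%, the larger sample would instead tend to pull p̂ back toward 0.02.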
Null vs Alternative Hypothesis: Full Comparison
| Concept | Null Hypothesis (H₀) | Alternative Hypothesis (H₁) |
|---|---|---|
| Symbol | H₀ | H₁ or Hₐ |
| What it claims | No effect, no difference, no relationship | An effect, difference, or relationship exists |
| Mathematical sign | Always includes equality (=, ≤, or ≥) | Always uses ≠, >, or < |
| Role in the test | Default assumption being tested (skeptical position) | Research claim trying to gain support |
| What data does | Either rejects or fails to reject H₀ | Is supported when H₀ is rejected |
| Can you prove it? | No — only reject or fail to reject | Supported (not proven) when H₀ is rejected |
| Error if wrong decision | Falsely rejecting = Type I error (α) | Failing to support when true = Type II error (β) |
| Example (mean) | H₀: μ = 100 | H₁: μ ≠ 100 (two-tailed) |
| Example (proportion) | H₀: p = 0.50 | H₁: p > 0.50 (right-tailed) |
Key Terms and Formulas Glossary
| Term | Formula / Notation | Definition |
|---|---|---|
| Null Hypothesis | H₀: μ = μ₀ | Default assumption of no effect or difference. Contains an equality sign. Never proven, only rejected or retained. |
| Alternative Hypothesis | H₁: μ ≠ μ₀ | The research claim being tested. Contains an inequality. Supported when H₀ is rejected. |
| Significance Level | α (commonly 0.05) | The threshold probability for rejecting H₀. Represents the maximum acceptable Type I error rate. |
| P-Value | P(result at least this extreme \| H₀ true) | Probability of observing results as extreme as the sample data, or more extreme, if H₀ were true. Reject H₀ when p ≤ α. |
| Z-Test Statistic | z = (x̄ − μ₀) / (σ/√n) | Standardized test statistic used when population σ is known or n ≥ 30. Compared to z-distribution critical values. |
| T-Test Statistic | t = (x̄ − μ₀) / (s/√n) | Test statistic used when population σ is unknown. Uses sample SD s. Degrees of freedom = n − 1. |
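| Critical Value | z* or t* | The boundary between the rejection region and non-rejection region. For a two-tailed test, reject H₀ if the absolute value of the test statistic exceeds the critical value; for a one-tailed test, reject if the statistic falls beyond the critical value in the predicted direction. |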
| Type I Error | P = α | Rejecting a true H₀ (false positive). Probability equals α. In medicine: concluding a useless drug works. |
| Type II Error | P = β | Failing to reject a false H₀ (false negative). Reduced by increasing sample size. In medicine: missing a real drug effect. |
| Statistical Power | 1 − β | Probability of correctly rejecting a false H₀. Target ≥ 0.80. Increases with larger n and larger true effect size. |
| Standard Error | SE = σ/√n or s/√n | Standard deviation of the sampling distribution of x̄. Measures precision of the sample mean estimate. |
| Confidence Interval | x̄ ± z* × SE | Range of plausible values for the population parameter. A 95% CI corresponds to a two-tailed α = 0.05 test. |
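The last glossary row links confidence intervals to two-tailed tests. A minimal sketch of that correspondence, reusing the bolt-diameter numbers from Example 1: the 95% interval excludes μ₀ exactly when the two-tailed p-value is at or below 0.05.

```python
import math
from statistics import NormalDist

# Values from Example 1
xbar, mu0, sigma, n, alpha = 10.14, 10.0, 0.5, 50, 0.05

std_normal = NormalDist()
standard_error = sigma / math.sqrt(n)
z_star = std_normal.inv_cdf(1 - alpha / 2)

# Confidence interval: x̄ ± z* × SE
ci_low = xbar - z_star * standard_error
ci_high = xbar + z_star * standard_error

# Two-tailed p-value for H0: mu = mu0
p_value = 2 * (1 - std_normal.cdf(abs(xbar - mu0) / standard_error))

null_outside_ci = not (ci_low <= mu0 <= ci_high)
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
print(f"mu0 outside CI: {null_outside_ci}, reject H0: {p_value <= alpha}")
```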
Practice Problems
Work through these problems before checking the answers. Each one uses the 6-step framework described earlier in this guide.
Beginner Level
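Problem 1. Given: n = 10, x̄ = 15.6 oz, s = 0.8 oz, α = 0.05. Test whether the mean fill differs from 16 oz.
H₀: μ = 16 oz | H₁: μ ≠ 16 oz (two-tailed)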
t = (15.6 − 16) / (0.8 / √10) = −0.4 / 0.253 = −1.58
df = 9. Critical value at α = 0.05, two-tailed: t* = ±2.262
|−1.58| < 2.262 → Fail to reject H₀.
Conclusion: At α = 0.05, there is insufficient evidence to conclude the mean fill differs from 16 oz. The sample of 10 is small and the result is inconclusive — a larger sample would be needed.
Problem 2. Given: 58 heads in 100 coin flips, α = 0.05. Test whether the coin is biased.
H₀: p = 0.50 | H₁: p ≠ 0.50 (two-tailed)
p̂ = 58/100 = 0.58
z = (0.58 − 0.50) / √(0.50 × 0.50 / 100) = 0.08 / 0.05 = 1.60
Critical value: z* = ±1.96. |1.60| < 1.96 → Fail to reject H₀.
p-value ≈ 0.110 > 0.05.
Conclusion: At α = 0.05, the 100-flip sample does not provide sufficient evidence to conclude the coin is biased. 58 heads in 100 flips is within the expected range of variation for a fair coin.
Intermediate Level
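Problem 3. Given: n = 36, x̄ = 492, standard deviation = 24, α = 0.01. Test the factory's claim that the mean is at least 500.
H₀: μ ≥ 500 | H₁: μ < 500 (left-tailed; we test whether evidence contradicts the claim)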
z = (492 − 500) / (24 / √36) = −8 / 4 = −2.00
Critical value at α = 0.01 (left-tail): z* = −2.326
−2.00 > −2.326 → Fail to reject H₀ at α = 0.01.
p-value = 0.023. At α = 0.05 we would reject H₀; at α = 0.01 we do not.
Conclusion: At the stricter α = 0.01 significance level, there is insufficient evidence to reject the factory's claim. At α = 0.05, the evidence would be significant — illustrating how the choice of α changes the conclusion.
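Problem 4. Given: n = 25, x̄ = 82, s = 10, α = 0.05. Test whether the new teaching method produces scores above the national mean of 78.
H₀: μ ≤ 78 | H₁: μ > 78 (right-tailed)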
t = (82 − 78) / (10 / √25) = 4 / 2 = 2.00
df = 24. Critical value at α = 0.05 (right-tail): t* = 1.711
2.00 > 1.711 → Reject H₀. p-value ≈ 0.028.
Conclusion: At α = 0.05, there is sufficient evidence to conclude the new teaching method produces higher scores than the national mean of 78. The difference of 4 points is statistically significant with this sample.
Advanced Level
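Problem 5. Given: n = 1,200, p̂ = 0.10, α = 0.05. Test whether the new layout raises the click-through rate above 8%.
H₀: p ≤ 0.08 | H₁: p > 0.08 (right-tailed)
Verify: n × p₀ = 1200 × 0.08 = 96 ≥ 10 ✓ and n × (1 − p₀) = 1200 × 0.92 = 1104 ≥ 10 ✓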
z = (0.10 − 0.08) / √(0.08 × 0.92 / 1200) = 0.02 / √(0.0000613) = 0.02 / 0.00783 ≈ 2.55
Critical value: z* = 1.645. 2.55 > 1.645 → Reject H₀. p-value ≈ 0.0054.
Conclusion: At α = 0.05, the new layout significantly increases click-through rate above 8%.
Error type: Rejecting H₀ when the true rate is still 8% would be a Type I error (false positive) — concluding the new design works when it actually does not. The probability of this error is α = 0.05.