What is a Type I error?

A Type I error occurs when a true null hypothesis is rejected. It is also called a false positive. The probability of committing a Type I error equals the significance level, α.

What is a Type II error?

A Type II error occurs when a false null hypothesis is not rejected. It is also called a false negative. The probability of committing a Type II error is denoted β (beta).

What is the difference between Type I and Type II errors?

A Type I error (false positive) means rejecting a null hypothesis that is actually true. A Type II error (false negative) means failing to reject a null hypothesis that is actually false. Reducing one type of error generally increases the other, given a fixed sample size.

Which error is more serious — Type I or Type II?

It depends on context. In medical testing, a Type II error (missing a real disease) can be life-threatening. In judicial decisions, a Type I error (convicting an innocent person) is considered worse. The researcher must weigh the costs of each type of mistake before choosing α.

How are Type I and Type II errors related to statistical power?

Statistical power equals 1 − β. Increasing power means reducing β (fewer Type II errors). However, decreasing α (fewer Type I errors) also reduces power. At a fixed sample size the two are in direct tension; the usual solution is to increase sample size, which raises power at any given α and so allows both error probabilities to be kept low at once.

How do you reduce Type I and Type II errors?

To reduce Type I errors: lower α (e.g., from 0.05 to 0.01). To reduce Type II errors: increase sample size, increase α, study a larger effect, or reduce measurement variability. To reduce both: increase sample size, which lowers β at a fixed α and leaves room to lower α as well. A power analysis before the study determines the sample size needed to achieve a target power (usually 0.80) at a given α.

Type I and Type II Errors in Hypothesis Testing: Complete Guide (2026)

What Are Type I and Type II Errors?

Definition — Type I and Type II Errors

A Type I error (α) occurs when a true null hypothesis is rejected — a false positive. A Type II error (β) occurs when a false null hypothesis is not rejected — a false negative.

Type I: Reject true H₀ → P = α

Type II: Keep false H₀ → P = β

Every hypothesis test results in one of four outcomes. Either the null hypothesis H₀ is true or it is false, and either you reject it or you do not. Two of those four outcomes are correct decisions. The other two are errors — and those errors have names.

These two error types were formalized by statisticians Jerzy Neyman and Egon Pearson in their landmark 1933 paper as part of the Neyman–Pearson decision framework. Their insight was that a test should be designed not just to detect effects, but to control the rates of both kinds of mistakes. The complete context lives in the hypothesis testing reference on Statistics Fundamentals.

Type I Error — Definition

A Type I error occurs when the data leads you to reject H₀, but H₀ is actually true. You have detected an effect that does not exist. In diagnostic language, this is a false positive.

The probability of a Type I error equals the significance level α that you set before running the test (for tests with a discrete test statistic, such as exact binomial tests, it is at most α). If you set α = 0.05, then in repeated testing under a true null hypothesis, about 5% of your tests will produce a false positive purely by chance in the long run. Choosing a smaller α (say, 0.01) makes Type I errors rarer but does not eliminate them.
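A quick Monte Carlo check makes the long-run claim concrete. The sketch below assumes a two-tailed one-sample z-test with known σ, draws every sample from a population where H₀ is true, and counts how often the test rejects anyway. The function name `simulate_type_i_rate` and its default settings are illustrative choices, not part of any library.

```python
import random
from statistics import NormalDist

def simulate_type_i_rate(alpha, n=20, sigma=1.0, trials=20_000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-tailed one-sample z-test.

    Every sample is drawn from N(0, sigma), so H0: mu = 0 is true and any
    rejection counted here is a Type I error (false positive).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    se = sigma / n ** 0.5                           # standard error of the mean
    false_positives = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        if abs(sample_mean / se) > z_crit:          # |z| beyond the critical value
            false_positives += 1
    return false_positives / trials

for alpha in (0.10, 0.05, 0.01):
    rate = simulate_type_i_rate(alpha)
    print(f"alpha = {alpha:.2f}  observed false positive rate = {rate:.4f}")
```

Each observed rate lands close to its α, with small run-to-run wobble from simulation noise; lowering α lowers the rate but never drives it to zero.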

Type II Error — Definition

A Type II error occurs when the data does not give enough evidence to reject H₀, but H₀ is false — a real effect exists and the test missed it. In diagnostic language, this is a false negative.

The probability of a Type II error is denoted β (beta). It depends on the significance level, the sample size, the actual effect size, and the variability in the data. Unlike α, β is not set directly; it is determined by these factors combined. The complement of β is statistical power: Power = 1 − β.

⚡ Quick Reference — Type I vs Type II Errors

Type I error (false positive): Reject H₀ when H₀ is true. Probability = α
Type II error (false negative): Fail to reject H₀ when H₀ is false. Probability = β
Correct rejection: Reject H₀ when H₀ is false. Probability = 1 − β = Power
Correct retention: Fail to reject H₀ when H₀ is true. Probability = 1 − α = Specificity
Memory rule: Type I = "false alarm." Type II = "missed detection."

The 2×2 Decision Matrix

Every statistical decision falls into one of four cells in this matrix. The rows represent your decision; the columns represent reality. Two cells are correct outcomes and two cells are errors.

Decision	H₀ is true (no real effect)	H₀ is false (real effect exists)
Reject H₀	❌ Type I error: false positive, P = α	✅ Correct decision: true positive, P = 1 − β = Power
Fail to reject H₀	✅ Correct decision: true negative, P = 1 − α	⚠️ Type II error: false negative, P = β

Reading the table: the left column shows what happens when there truly is no effect. The right column shows what happens when a real effect exists. In the left column, you want to "fail to reject H₀" (bottom cell). In the right column, you want to "reject H₀" (top cell). The two error cells are the ones where your decision does not match reality.

α: Type I error rate, P(Reject H₀ | H₀ true)
β: Type II error rate, P(Keep H₀ | H₀ false)
1 − β: Statistical power, P(Reject H₀ | H₀ false)
1 − α: Specificity, P(Keep H₀ | H₀ true)

Type I vs Type II Errors — Comparison

Feature	Type I Error	Type II Error
Alternative name	False positive	False negative
What happens	Reject a true null hypothesis	Fail to reject a false null hypothesis
Symbol	α (alpha)	β (beta)
Determined by	The chosen significance level (set directly)	Sample size, effect size, α, and variability (not set directly)
Typical default value	α = 0.05	β = 0.20 (Power = 0.80)
Reduced by	Lowering α	Increasing n, increasing α, larger effect size
Analogy	Convicting an innocent person	Acquitting a guilty person
In medicine (screening)	Diagnosing a healthy patient as sick	Missing the disease in a sick patient
Relationship to power	—	Power = 1 − β
Effect of increasing α	Increases	Decreases
Effect of larger n	Unchanged (still equals α)	Decreases β (raises power)

Alpha, Beta, and Statistical Power

The three quantities α, β, and power are not independent. Once you fix the significance level, the sample size, the effect size, and the variability (σ), the value of β, and therefore power, is determined. Understanding how they interact is the foundation of research design.

Significance Level (α) — Type I Error Rate

α is the pre-set threshold you compare the p-value to. It is also the probability that a true null hypothesis will be rejected by chance. Setting α = 0.05 means you accept a 5% rate of false positives across repeated tests under a true null. Fields where a false positive is especially costly use far smaller thresholds: the particle-physics "5σ" discovery standard corresponds to roughly 3 × 10⁻⁷ (one-sided), and genome-wide association studies use 5 × 10⁻⁸. Exploratory research sometimes accepts α = 0.10.

Significance Level

α = P(Reject H₀ | H₀ is true)

α = 0.05: conventional threshold in most research
α = 0.01: conservative (medicine, safety)
α = 0.001 or lower: very strict (genomics and particle physics go much lower still)

Beta (β) — Type II Error Rate

β is the probability that a test fails to detect a real effect of a given size. A β of 0.20 means a 20% chance of missing an effect of that size that genuinely exists. This is the conventional maximum and corresponds to a power of 0.80, the benchmark proposed by Jacob Cohen and set out in his foundational book Statistical Power Analysis for the Behavioral Sciences (2nd ed., 1988).

Type II Error Rate

β = P(Fail to Reject H₀ | H₀ is false)

β = 0.20: conventional maximum (Power = 0.80)
β = 0.10: high-stakes research (Power = 0.90)
β depends on n, effect size, α, and σ

Statistical Power (1 − β)

Power is the probability of correctly rejecting a false null hypothesis. A test with power = 0.80 will detect a true effect of the assumed size in about 80% of repeated studies. Power analysis before data collection calculates the sample size needed to reach your target power. The key inputs are α, the expected effect size, the population variance, and your target β.

Statistical Power

Power = 1 − β = P(Reject H₀ | H₀ is false)

Power ↑ when n increases
Power ↑ when α increases
Power ↑ when effect size increases
Power ↓ when σ (noise) increases
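For the one-sample z-test with known σ, the directions listed above follow from a closed-form power expression. The sketch below assumes that test; Φ is the standard normal CDF, z₁₋α/₂ is the two-tailed critical value (1.96 at α = 0.05), and δ is the non-centrality parameter from the reference table later in this guide.

```latex
\begin{aligned}
\delta &= \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}} \\
\text{Power} = 1 - \beta &= \Phi\!\left(\delta - z_{1-\alpha/2}\right) + \Phi\!\left(-\delta - z_{1-\alpha/2}\right) && \text{(two-tailed)} \\
\text{Power} = 1 - \beta &= \Phi\!\left(\delta - z_{1-\alpha}\right) && \text{(upper-tailed)}
\end{aligned}
```

Sample size and σ act through δ (δ grows with √n and shrinks as σ grows), the effect size through the numerator of δ, and α through the critical value: a larger α means a smaller z and hence more power. With no effect (δ = 0), the two-tailed expression reduces to 2Φ(−z₁₋α/₂) = α.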

📐

The Central Tradeoff

For a fixed sample size, lowering α (reducing Type I errors) raises the rejection threshold, which also makes it harder to detect real effects — so β increases and power falls. The only way to reduce both errors simultaneously is to collect more data. This is why power analysis belongs before a study begins, not after.

Type I and Type II Errors — 6 Worked Examples

Each example below identifies the null hypothesis, describes what each type of error would mean in context, and explains which error carries greater cost. These are the kinds of scenarios that appear on exams and in research practice.

Example 1 — Medical Screening (Cancer Test)

Worked Example 1 — Medical Screening

A hospital uses a blood test to screen patients for a rare cancer. The test has a 5% false positive rate and a 15% false negative rate.

H₀

Null hypothesis: The patient does not have cancer (H₀: no disease).

Type I error (false positive): The test says the patient has cancer, but they are healthy. The patient faces unnecessary follow-up procedures, anxiety, and possibly harmful treatment. α = 0.05 here.

Type II error (false negative): The test says the patient is healthy, but they actually have cancer. The disease goes untreated and may progress to a life-threatening stage. β = 0.15 here.

⚠️ Verdict: In cancer screening, a Type II error (missed cancer) is typically far more serious than a Type I error. This is why screening tests are often designed with low β even at the cost of more false positives — the follow-up tests filter those out.

Example 2 — Criminal Justice (Trial Analogy)

Worked Example 2 — Criminal Justice

A defendant is on trial. The jury must decide: guilty or not guilty.

H₀

Null hypothesis: The defendant is innocent (H₀: not guilty).

Type I error: The jury convicts an innocent person. This is considered the graver error in criminal law — "better that ten guilty persons escape than one innocent suffer" (Blackstone's ratio).

Type II error: The jury acquits a guilty person. The guilty party goes free. This is bad but considered less catastrophic than punishing the innocent.

✅ The criminal justice system sets α very low (high standard of proof — "beyond reasonable doubt") to minimize Type I errors, accepting a higher β as the cost. This is a deliberate societal choice about which error type to prioritize.

Example 3 — Drug Approval (Clinical Trial)

Worked Example 3 — Clinical Trial

A pharmaceutical company tests whether a new antidepressant reduces depression scores more than a placebo. The trial uses α = 0.05.

H₀

Null hypothesis: The drug has no effect compared to the placebo (H₀: μ_drug = μ_placebo).

Type I error: The trial concludes the drug works, but it does not. The FDA approves an ineffective drug. Patients pay for a useless treatment and miss out on effective alternatives. Real-world cost: very high.

Type II error: The trial finds no significant result, but the drug genuinely works. An effective treatment never reaches patients. Regulators such as the FDA expect trials to be adequately powered (conventionally at least 0.80) precisely to reduce this error.

✅ Both errors matter here. The FDA relies on α = 0.05 as the Type I guard and expects a prospective sample-size justification to control β. An underpowered trial that misses a real benefit also harms patients, by keeping an effective treatment off the market.

Source: FDA Guidance on Adaptive Designs (2019). FDA Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests.

Example 4 — Manufacturing Quality Control

Worked Example 4 — Quality Control

A factory produces bolts with a target diameter of 10 mm. Quality control samples 30 bolts per batch and tests H₀: μ = 10 mm at α = 0.01.

H₀

Null hypothesis: The machine is calibrated correctly and producing 10 mm bolts (H₀: μ = 10).

Type I error: The test rejects H₀ — the machine is shut down for recalibration — but the machine was working fine. Result: lost production time and labor cost with no defect problem.

Type II error: The test fails to reject H₀, but the machine has drifted. Defective bolts continue to ship. In safety-critical applications (aerospace, automotive), this can cause product failures.

✅ The factory uses α = 0.01 to minimize unnecessary shutdowns (Type I), but also ensures n = 30 gives adequate power. The appropriate balance depends on the cost of defects vs the cost of downtime.

Example 5 — A/B Testing (Digital Product)

Worked Example 5 — A/B Testing

An e-commerce company tests whether a new checkout button color increases conversion rate. They run an A/B test for two weeks with α = 0.05 and target power = 0.80.

H₀

Null hypothesis: The new button color has no effect on conversion rate (H₀: p_new = p_control).

Type I error: The test concludes the new color converts better, but the result was random noise. The company ships the change, wasting engineering resources on an ineffective update and possibly disrupting the user experience.

Type II error: The test finds no significant difference, but the new button genuinely converts better by 2%. The company misses revenue. This often happens when tests are stopped too early before reaching the planned sample size.

✅ A/B testing in product development commonly suffers from "peeking" — checking results before the planned sample size is reached. Stopping early inflates Type I error rates and prevents reaching the statistical power needed to detect small but real effects.
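The peeking problem is easy to see in simulation. This sketch assumes every experiment is an A/A test (H₀ true), runs a two-tailed z-test at α = 0.05 after each batch of traffic, and stops at the first significant result. To stay short and fast it models each batch's standardized difference between arms as independent N(0, 1) noise (a normal approximation, not a visitor-level simulation); the function name and batch counts are illustrative.

```python
import random
from statistics import NormalDist

def peeking_false_positive_rate(looks, alpha=0.05, experiments=20_000, seed=7):
    """Share of A/A experiments (H0 true) declared significant when the
    analyst tests after every batch and stops at the first 'win'.

    Simplifying assumption: each batch adds an independent standardized
    difference drawn from N(0, 1), so after k batches the cumulative
    z-statistic is S_k / sqrt(k), which is N(0, 1) under H0.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = 0
    for _ in range(experiments):
        running_sum = 0.0
        for k in range(1, looks + 1):
            running_sum += rng.gauss(0.0, 1.0)
            if abs(running_sum / k ** 0.5) > z_crit:  # peek, and stop on a "hit"
                false_positives += 1
                break
    return false_positives / experiments

one_look = peeking_false_positive_rate(looks=1)
ten_looks = peeking_false_positive_rate(looks=10)
print(f"single planned look: {one_look:.3f}")
print(f"peeking ten times:   {ten_looks:.3f}")
```

With one planned analysis the false positive rate stays near 5%; checking after each of ten batches pushes it to roughly 19%, even though every individual check uses α = 0.05.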

Example 6 — Psychology Research

Worked Example 6 — Psychology Study

A researcher tests whether a mindfulness intervention reduces anxiety scores. n = 40, α = 0.05, power analysis suggests power ≈ 0.65 — below the 0.80 standard.

H₀

Null hypothesis: Mindfulness training does not reduce anxiety (H₀: no treatment effect).

Type I error (α = 0.05): The study reports a significant anxiety reduction, but the effect was a fluke. This contributes to the replication crisis in psychology if the finding is published and other labs try and fail to replicate it.

Type II error (β ≈ 0.35): With power = 0.65, there is a 35% chance of missing a real treatment effect. The underpowered study may conclude "no effect" and shelve a genuinely helpful intervention.

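⚠️ This study is underpowered. The researcher should either increase n to about 60 to reach power = 0.80 (a normal approximation gives 40 × (2.80 / 2.35)² ≈ 57, because the required n scales with the square of z(1 − α/2) + z(power); a t-test needs a few more) or report the study as preliminary and interpret a null result cautiously. Publishing underpowered null results as "no effect" findings is a methodological error.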
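When the test statistic's non-centrality grows with √n, the sample size for a new target power can be scaled directly from the current design. The sketch below uses a two-tailed normal approximation and ignores the tiny opposite rejection tail; `required_n` is a hypothetical helper, and a t-test would need slightly more participants than it reports.

```python
from statistics import NormalDist

def required_n(current_n, current_power, target_power=0.80, alpha=0.05):
    """Scale a sample size to reach a target power (two-tailed z approximation).

    Power ~ Phi(delta * sqrt(n) - z_{1-alpha/2}), so delta * sqrt(n) must equal
    z_{1-alpha/2} + z_{power}. Because delta is fixed by the effect and sigma,
    n scales with the square of that sum.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    scale = (z_alpha + z(target_power)) / (z_alpha + z(current_power))
    return current_n * scale ** 2

# Example 6: n = 40 gives power of about 0.65; how large must n be for 0.80?
n_needed = required_n(current_n=40, current_power=0.65)
print(f"approximately {n_needed:.1f} participants")
```

For Example 6 this gives about 57, so a little under 1.5 times the current sample.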

The Tradeoff Between Type I and Type II Errors

For a fixed dataset, reducing one type of error increases the other. This inverse relationship follows directly from the mechanics of hypothesis testing: the significance threshold determines both the boundary for rejection and how sensitive the test is to real effects.

⚖️

The Core Tension

If you lower α to reduce false positives, you move the rejection region further out — which means real effects near the boundary are no longer detected, and β increases. If you raise α to catch more real effects, you also catch more false ones. Given fixed n, you cannot minimize both simultaneously.

How to Reduce Both Errors

The way out of the tradeoff is larger samples. With more data, the sampling distribution of the sample mean narrows, so the distributions under H₀ and H₁ overlap less. The critical value can then be placed further out (lower α) while most of the distribution under H₁ still falls in the rejection region, so β falls and power rises. This is why power analysis begins with your target α and desired power and solves for the required sample size.
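Solving the z-test power expression for n gives the usual sample-size formula, n = ((z(1 − α/2) + z(1 − β)) · σ / (μ₁ − μ₀))². A minimal sketch, assuming a one-sample z-test with known σ and ignoring the negligible opposite-tail probability; `sample_size_z` and the 0.5-SD effect are hypothetical inputs chosen for illustration.

```python
import math
from statistics import NormalDist

def sample_size_z(effect, sigma, alpha=0.05, power=0.80, two_tailed=True):
    """Smallest n giving the target power for a one-sample z-test.

    Uses n = ((z_crit + z_power) * sigma / effect) ** 2 and rounds up.
    """
    z = NormalDist().inv_cdf
    z_crit = z(1 - alpha / 2) if two_tailed else z(1 - alpha)
    z_power = z(power)  # equals z_{1 - beta}
    return math.ceil(((z_crit + z_power) * sigma / abs(effect)) ** 2)

# Hypothetical design: detect a shift of half a standard deviation.
for alpha in (0.05, 0.01):
    n = sample_size_z(effect=0.5, sigma=1.0, alpha=alpha)
    print(f"alpha = {alpha}: n = {n} for power 0.80")
```

For these inputs, holding power at 0.80 while tightening α from 0.05 to 0.01 raises the required n from 32 to 47: the cost of fewer false positives is paid in data rather than in power.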

📉

Lower α (e.g., 0.05 → 0.01)

Fewer Type I errors. Type II errors increase. Power falls. Use when false positives are costly.

📈

Increase Sample Size n

Lowers β at any fixed α by narrowing sampling variability, which leaves room to lower α as well. The primary lever in power analysis.

🔬

Increase Effect Size

Larger effects are easier to detect, so β falls. Achieved by stronger treatments, purer populations, or better measurement instruments.

📐

Reduce Variability (σ)

Tighter measurement and controlled conditions reduce noise, improving the signal-to-noise ratio and lowering β.

Power and Beta Calculator

This calculator estimates statistical power (1 − β) and the Type II error rate (β) for a one-sample z-test. Enter your significance level, sample size, expected effect size, and population standard deviation. The result tells you the probability of detecting a real effect of the given magnitude.

Calculator inputs: significance level (α), sample size (n), effect size (μ₁ − μ₀), population SD (σ), and test type.
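The calculation behind such a calculator fits in a few lines. This sketch assumes a one-sample z-test with known σ; "test type" is modeled as a `two_tailed` flag, and the one-tailed branch assumes an upper-tailed alternative (H₁: μ > μ₀). The function name and the example inputs are hypothetical.

```python
from statistics import NormalDist

def z_test_power(alpha, n, effect, sigma, two_tailed=True):
    """Return (power, beta) for a one-sample z-test with known sigma.

    effect = mu1 - mu0 (true mean minus null mean). The one-tailed branch
    assumes an upper-tailed test, H1: mu > mu0.
    """
    std = NormalDist()
    delta = effect / (sigma / n ** 0.5)   # non-centrality: effect in SE units
    if two_tailed:
        z_crit = std.inv_cdf(1 - alpha / 2)
        power = std.cdf(delta - z_crit) + std.cdf(-delta - z_crit)
    else:
        z_crit = std.inv_cdf(1 - alpha)
        power = std.cdf(delta - z_crit)
    return power, 1 - power

# Hypothetical inputs: alpha = 0.05, n = 30, effect = 5 units, sigma = 15.
power, beta = z_test_power(alpha=0.05, n=30, effect=5, sigma=15)
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

A useful sanity check: with effect = 0, the two-tailed power equals α exactly, since with no real effect every rejection is a false positive.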

Which Error Is Worse — Type I or Type II?

There is no universal answer. The relative severity of each error depends entirely on the consequences of being wrong in a particular direction. The researcher — not the statistician — makes this judgment before choosing α.

Domain	Type I Error	Type II Error	Which Is Worse?
Cancer screening	Treat a healthy patient	Miss a real cancer	Type II (life-threatening)
Criminal law	Convict the innocent	Acquit the guilty	Type I (legal principle)
Drug approval	Approve an ineffective drug	Reject an effective drug	Context-dependent
Spam filtering	Block a legitimate email	Let spam through	Type I (miss important mail)
A/B testing	Ship a useless change	Miss a real improvement	Depends on cost of shipping
Nuclear plant safety	Shut down a safe plant	Miss a real fault	Type II (safety critical)
Pregnancy test	False positive (says pregnant)	False negative (misses pregnancy)	Context-dependent

How to Remember Type I and Type II Errors

🧠 Memory Device 1 — "The Boy Who Cried Wolf"

Type I error = crying wolf (false alarm): The shepherd cries "wolf!" when there is none. The villagers react to a non-existent threat.

Type II error = missing the real wolf: When the wolf actually comes, no one believes the shepherd. The real threat goes undetected.

🧠 Memory Device 2 — Fire Alarm

Type I = fire alarm goes off when there is no fire. You evacuate for nothing — a false positive.

Type II = no alarm when the building is on fire. The real danger is missed — a false negative.

✅

Simple Keyword Rule

Type I = False alarm (you falsely declared something significant). Type II = Missed detection (you missed a real signal). Or: think of Type I as the "eager" error (too quick to reject) and Type II as the "lazy" error (not sensitive enough to detect).

Complete Reference Table — All Key Formulas

Term	Symbol	Formula / Value	Plain-Language Meaning
Type I Error	α	P(Reject H₀ | H₀ true)	Rate of false positives; set by the researcher
Type II Error	β	P(Fail to reject H₀ | H₀ false)	Rate of false negatives; depends on n, effect, σ, and α
Statistical Power	1 − β	P(Reject H₀ | H₀ false)	Probability of detecting a real effect
Specificity	1 − α	P(Fail to reject H₀ | H₀ true)	Probability of a true negative result
Significance level	α	Typically 0.05	Pre-set threshold for p-value comparison
p-value	p	P(test statistic at least as extreme as observed | H₀ true)	Evidence against H₀; reject if p < α
Effect size (Cohen's d)	d	(μ₁ − μ₀) / σ	Standardized magnitude of the true difference
Standard Error	SE	σ / √n	Precision of the sample mean; falls with larger n
Critical value (z, two-tailed, α=0.05)	z*	±1.96	Boundary of the rejection region
Non-centrality parameter	δ	(μ₁ − μ₀) / (σ/√n)	How far the true effect is from H₀ in SE units

Real-World Applications

The same logic of balancing false positives against false negatives applies across every domain that uses statistical inference. Recognizing which error carries greater harm in your context is the first step to designing an appropriate test.

🏥

Medical Diagnostics

Sensitivity (1 − β) is the clinical counterpart of power, and specificity (1 − α) is the complement of the false positive rate that α controls. Diagnostic tests are designed with known trade-offs between them.

💊

Clinical Trials

Phase III drug trials typically target power ≥ 0.80 at α = 0.05. Regulators such as the FDA weigh both error rates, so that ineffective drugs are not approved and effective ones are not missed.

📧

Spam Detection

Spam filters face the same tradeoff: block too aggressively (Type I: legitimate mail marked as spam) or too loosely (Type II: spam gets through). Most systems let users adjust the threshold.

🧬

Genomics / GWAS

Genome-wide association studies test millions of variants simultaneously, requiring α = 5×10⁻⁸ to control family-wise Type I error. This demands very large samples to maintain power.
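The 5 × 10⁻⁸ threshold is, roughly, a Bonferroni correction: 0.05 divided by about one million independent tests. The sketch below assumes fully independent tests with every H₀ true (real variants are correlated, which is why the effective count is smaller than the number of variants genotyped) and shows why an uncorrected α = 0.05 fails at this scale. The helper name is hypothetical.

```python
def family_wise_error(alpha_per_test, m):
    """P(at least one false positive) across m independent tests, all with H0 true."""
    return 1 - (1 - alpha_per_test) ** m

m = 1_000_000                   # order-of-magnitude count of independent tests
bonferroni_alpha = 0.05 / m     # per-test threshold, about 5e-8

print("20 tests at alpha = 0.05:     ", round(family_wise_error(0.05, 20), 3))
print("1e6 tests at alpha = 0.05:    ", family_wise_error(0.05, m))
print("1e6 tests at Bonferroni alpha:", round(family_wise_error(bonferroni_alpha, m), 4))
```

Even 20 uncorrected tests give about a 64% chance of at least one false positive; at a million tests it is a near certainty, while the Bonferroni threshold holds the family-wise rate just under 0.05. Shrinking α this far is what drives the very large samples needed to keep power.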

🤖

Machine Learning

In binary classifiers, Type I error rate = false positive rate (1 − specificity) and Type II error rate = false negative rate (1 − recall/sensitivity). The ROC curve plots the full tradeoff.
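Counting the four cells of a confusion matrix turns this mapping into two numbers. A minimal sketch with plain lists, treating label 1 as "positive" (the H₀-false case) and 0 as "negative"; the labels and predictions are made-up illustration data, and `error_rates` is a hypothetical helper (scikit-learn's `confusion_matrix` produces the same counts).

```python
def error_rates(y_true, y_pred):
    """Return (Type I rate, Type II rate) for binary labels.

    Type I rate  = false positive rate = FP / (FP + TN) = 1 - specificity
    Type II rate = false negative rate = FN / (FN + TP) = 1 - recall
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return fp / (fp + tn), fn / (fn + tp)

# Illustrative data: 5 true negatives and 5 true positives in y_true.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 1, 0, 1, 1, 0]
fpr, fnr = error_rates(y_true, y_pred)
print(f"Type I rate (FPR) = {fpr:.2f}, Type II rate (FNR) = {fnr:.2f}")
```

Here two of the five negatives are flagged (FPR = 0.40) and one of the five positives is missed (FNR = 0.20). Moving the classifier's decision threshold trades one rate against the other, which is the tradeoff an ROC curve traces out.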

⚙️

Process Control (SPC)

The width of a control chart's limits sets its false alarm rate (Type I), and its ability to flag a real process shift reflects the Type II side. Tighter control limits catch more real shifts but also trigger more false alarms.
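For a Shewhart-style chart with limits at ±kσ (σ being the standard deviation of the plotted statistic), both sides of this tradeoff have closed forms under a normality assumption. This sketch looks at a single plotted point and ignores supplementary run rules; the helper names are hypothetical.

```python
from statistics import NormalDist

std = NormalDist()

def false_alarm_rate(k):
    """P(an in-control point falls outside +/- k sigma limits): the Type I rate."""
    return 2 * (1 - std.cdf(k))

def detection_prob(k, shift):
    """P(one point signals after the mean shifts by `shift` sigma units).

    Sigma here is the standard deviation of the plotted statistic
    (for an X-bar chart, the process sigma divided by sqrt(subgroup size)).
    """
    return (1 - std.cdf(k - shift)) + std.cdf(-k - shift)

for k in (3.0, 2.0):
    print(f"{k:.0f}-sigma limits: false alarm = {false_alarm_rate(k):.4f}, "
          f"P(detect 1-sigma shift) = {detection_prob(k, shift=1.0):.4f}")
```

Moving from 3σ to 2σ limits raises the per-point false alarm rate from about 0.0027 to about 0.0455, while the chance of flagging a 1σ shift on the next point rises from about 0.023 to about 0.16.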

FAQs

A Type I error occurs when a true null hypothesis is rejected. The result is a false positive: the test concludes an effect exists when none does. The probability of this error equals α, the significance level set before data collection. At α = 0.05, about 1 in 20 tests of a true H₀ will produce a Type I error in the long run, purely by chance.

A Type II error occurs when a false null hypothesis is not rejected — a false negative. The test fails to detect a real effect that exists in the population. The probability is β (beta). Standard practice targets β ≤ 0.20, meaning a power of at least 80%. β decreases when you increase sample size, use a larger α, or study a bigger effect.

A Type I error (α) is a false positive: you reject a correct null hypothesis. A Type II error (β) is a false negative: you fail to reject an incorrect null hypothesis. Both are mistakes, but in opposite directions. With sample size fixed, moving α to reduce one increases the other. Increasing sample size is the main way to reduce both.

Power = 1 − β. A test with high power has a low Type II error rate and detects real effects more reliably. Power increases with larger sample size, larger α, or stronger effects. However, lowering α reduces Type I errors but also reduces power (increases β). Power analysis helps determine required sample size.

It depends on context. In criminal justice, Type I error (convicting an innocent person) is more serious. In medical screening, Type II error (missing a disease) is often more dangerous. In research and industry, the cost of each error depends on consequences, not statistics alone.

Increasing sample size is the main way to reduce both. More data reduces sampling variability, which separates the null and alternative distributions: β falls at any fixed α, leaving room to lower α as well. Better measurement quality and study design help in the same way by reducing noise.

No. A single test outcome can only produce one type of error depending on whether the null hypothesis is true or false. Across multiple tests, both types can occur in different instances.

Type I error is α (alpha). Type II error is β (beta). Statistical power is 1 − β. These are standard symbols in hypothesis testing and decision theory.

In classification problems, Type I error is a false positive and Type II error is a false negative. These correspond to false positive rate and false negative rate. ROC curves visualize the tradeoff between them across thresholds.

Sources and Further Reading

Neyman, J. & Pearson, E.S. (1933) — "On the Problem of the Most Efficient Tests of Statistical Hypotheses." Philosophical Transactions of the Royal Society A, 231, 289–337. Foundational paper defining the two error types.
Cohen, J. (1988) — Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum. Established the convention of Power = 0.80.
NIST/SEMATECH (2012) — e-Handbook of Statistical Methods. National Institute of Standards and Technology. NIST Handbook §7.4 — Type I and Type II Errors.
FDA (2019) — Adaptive Designs for Clinical Trials of Drugs and Biologics. U.S. Food and Drug Administration Guidance for Industry. fda.gov.
UCLA Statistics Consulting Group — Power Analysis for Research. UCLA Institute for Digital Research and Education. stats.oarc.ucla.edu.