June 12, 2026
BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

One-Tailed vs Two-Tailed Tests: The Complete Guide

A two-tailed test asks "is there any difference at all?" A one-tailed test asks "did it specifically go up?" or "did it specifically go down?" Same data, same formulas — but the choice changes your critical values, your p-value, and sometimes your final decision.

This guide covers the formal definitions of one-sided and two-sided hypotheses, how alpha is placed in the distribution for each, a step-by-step decision framework, and four fully worked examples covering medical, marketing, psychology, and manufacturing scenarios. Each example is analyzed both ways, so you can see how the same numbers produce different p-values and conclusions depending on tail choice.

What You'll Learn
  • The formal definitions of one-tailed and two-tailed hypotheses
  • How alpha is split or concentrated, with rejection-region diagrams
  • A step-by-step framework for choosing the right test direction
  • Four fully worked examples: medical, A/B testing, psychology, manufacturing
  • Critical values and p-value conversion tables for z-tests, with notes on t-tests
  • Why switching tails after seeing your data invalidates the result

One-Tailed vs Two-Tailed Tests: Definitions

Definition — Two-Tailed Test
A two-tailed test (also called a non-directional or two-sided test) checks whether a parameter is different from a specified value, in either direction. The alternative hypothesis uses ≠, and the significance level α is split evenly between the upper and lower tails of the distribution.
H₀: μ = μ₀   |   Hₐ: μ ≠ μ₀
Definition — One-Tailed Test
A one-tailed test (also called a directional or one-sided test) checks whether a parameter is greater than, or less than, a specified value — never both. The alternative hypothesis uses > or <, and the entire significance level α sits in a single tail of the distribution.
Right-tailed: Hₐ: μ > μ₀   |   Left-tailed: Hₐ: μ < μ₀

The "tail" being referenced is the tail of the sampling distribution of the test statistic — the standard normal curve for a z-test, or the t-distribution for a t-test. The rejection region is the set of test-statistic values extreme enough to make you reject H₀. In a two-tailed test, that region exists on both ends of the curve; in a one-tailed test, it exists on only one end, and the other end is irrelevant no matter how extreme a result falls there.

Both versions apply to almost any test in the broader hypothesis testing framework: z-tests, t-tests, proportion tests, and correlation tests all have one-tailed and two-tailed forms. ANOVA F-tests and chi-square tests are exceptions. Their test statistics (F and χ²) cannot be negative, and large values are what signal a departure from H₀, so they reject only in the upper tail and the one-vs-two choice does not arise in the same way. This page focuses on z-tests and t-tests, where the choice is most often made and most often misapplied.

⚡ Quick Reference — One vs Two Tailed Key Facts
  • Two-tailed Hₐ: μ ≠ μ₀ — significant if the sample mean is much higher OR much lower than μ₀
  • One-tailed Hₐ: μ > μ₀ (right-tailed) or μ < μ₀ (left-tailed) — significant only in that direction
  • Alpha placement: Two-tailed splits α into α/2 per tail. One-tailed places all of α in one tail
  • Critical value (z, α = 0.05): Two-tailed = ±1.96. One-tailed = 1.645 (right) or −1.645 (left)
  • p-value relationship: Two-tailed p = 2 × one-tailed p, for the matching direction
  • Decision rule: Pick the tail count from your research question BEFORE collecting data — never after
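The critical values and the halving rule in this list can be reproduced with Python's standard-library `statistics.NormalDist`. This is a minimal sketch that assumes a z-test, so the reference distribution is the standard normal. The helper names `critical_z`, `one_tailed_p`, and `two_tailed_p` are illustrative and do not come from any library.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def critical_z(alpha: float, tails: str) -> float:
    """Positive critical value of z for a given significance level.

    tails="two": alpha is split, alpha/2 per tail (reject if |z| exceeds it).
    tails="one": all of alpha sits in one tail (reject if z exceeds it for a
    right-tailed test, or if z is below its negative for a left-tailed test).
    """
    if tails == "two":
        return STD_NORMAL.inv_cdf(1 - alpha / 2)
    if tails == "one":
        return STD_NORMAL.inv_cdf(1 - alpha)
    raise ValueError("tails must be 'one' or 'two'")


def one_tailed_p(z: float) -> float:
    """One-tailed p-value in the direction matching the sign of z."""
    return 1 - STD_NORMAL.cdf(abs(z))


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value: twice the one-tailed p-value in the matching direction."""
    return 2 * one_tailed_p(z)


print(round(critical_z(0.05, "two"), 3))  # 1.96
print(round(critical_z(0.05, "one"), 3))  # 1.645
print(round(one_tailed_p(1.80), 4), round(two_tailed_p(1.80), 4))
```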

Visualizing Rejection Regions

The clearest way to see the difference is to look at where the shaded "reject H₀" area sits on the curve. At α = 0.05, a two-tailed test shades 2.5% of the area in each tail (total 5%), while a one-tailed test shades the full 5% in just one tail. The diagrams below use the standard normal distribution and the familiar z = ±1.96 / z = 1.645 critical values.

Two-Tailed Test (α = 0.05)

[Figure: standard normal curve with the rejection region shaded below z = −1.96 and above z = +1.96; α/2 = 0.025 in each tail.]
Reject H₀ if z < −1.96 or z > +1.96. Hₐ: μ ≠ μ₀ — either extreme counts.

One-Tailed Test, Right (α = 0.05)

[Figure: standard normal curve with the rejection region shaded above z = +1.645; α = 0.05 in the right tail.]
Reject H₀ only if z > +1.645. Hₐ: μ > μ₀ — a low result, however extreme, cannot reject H₀.

One-Tailed Test, Left (α = 0.05)

[Figure: standard normal curve with the rejection region shaded below z = −1.645; α = 0.05 in the left tail.]
Reject H₀ only if z < −1.645. Hₐ: μ < μ₀ — a high result, however extreme, cannot reject H₀.
The Critical Value Trade-Off

Notice that the right-tailed critical value (1.645) is smaller than the two-tailed one (1.96). That makes a one-tailed test easier to pass, but only if the effect goes in the predicted direction. If it goes the other way, a right-tailed test cannot reject H₀ at all, even at z = −4.0. The two-tailed test would catch that result; the one-tailed test would not.
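You can check the trade-off directly. The sketch below assumes a z statistic and uses a hypothetical helper, `rejects_h0`. It applies the three rejection rules, computing the critical values with `statistics.NormalDist` instead of reading them from a table.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rejects_h0(z: float, alpha: float, tail: str) -> bool:
    """Apply the rejection rule for a z statistic.

    tail: "two" (Ha: mu != mu0), "right" (Ha: mu > mu0) or "left" (Ha: mu < mu0).
    """
    if tail == "two":
        return abs(z) > STD_NORMAL.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return z > STD_NORMAL.inv_cdf(1 - alpha)
    if tail == "left":
        return z < STD_NORMAL.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")


# z = 1.80 clears the one-tailed bar (1.645) but not the two-tailed bar (1.96).
print(rejects_h0(1.80, 0.05, "right"), rejects_h0(1.80, 0.05, "two"))  # True False
# z = -4.0 is extreme, but a right-tailed test is blind to it.
print(rejects_h0(-4.0, 0.05, "right"), rejects_h0(-4.0, 0.05, "two"))  # False True
```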

How to Choose: A Step-by-Step Framework

Choosing One vs Two Tailed: The Short Answer

Use a two-tailed test by default. Switch to a one-tailed test only if (1) your research question is explicitly directional, (2) that direction was decided before seeing any data, and (3) a result in the opposite direction would be practically meaningless or treated identically to "no effect." If any of those three conditions isn't met, stay two-tailed.

Step 1: Write Your Research Question in Plain Language First

Before touching symbols, write out what you actually want to know. "Did the new layout change the conversion rate?" is non-directional — write H₀: p = p₀, Hₐ: p ≠ p₀ (two-tailed). "Does the new fertilizer increase yield compared to the old one?" is directional — write H₀: μ ≤ μ₀, Hₐ: μ > μ₀ (one-tailed, right).

Step 2: Ask Whether the Opposite Direction Would Matter

If a new drug could plausibly make the condition worse, you need to be able to detect that, which calls for a two-tailed test. Suppose a new packaging process could plausibly only shorten delivery time or leave it unchanged, with no realistic way to lengthen it meaningfully. Then a left-tailed test on delivery time (Hₐ: μ < μ₀) may be defensible. When in doubt, choose two-tailed, because it remains valid for detecting effects in either direction.

Step 3: Confirm the Direction Comes From Theory, Not From Peeking

A one-tailed test is only valid if the direction was specified in the study design — ideally in a pre-registration — before any data was collected or examined. Looking at a two-tailed result, noticing it's "almost significant" in one direction, and then re-running as one-tailed to halve the p-value is a form of p-hacking and invalidates the stated α.

Step 4: Write H₀ and Hₐ Formally

Two-tailed: H₀: μ = μ₀, Hₐ: μ ≠ μ₀. Right-tailed: H₀: μ ≤ μ₀, Hₐ: μ > μ₀. Left-tailed: H₀: μ ≥ μ₀, Hₐ: μ < μ₀. Note that for one-tailed tests, H₀ technically includes the "≤" or "≥" — but in practice the test is carried out as if H₀: μ = μ₀, since the boundary case is what determines the critical value.
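To see why the boundary value μ₀ is the one that matters, the sketch below computes the probability of rejecting H₀ when the true mean sits `delta` standard errors from μ₀. It assumes a right-tailed z-test with known σ. `delta` is an illustrative name for (μ − μ₀)/(σ/√n). Every true mean inside H₀ (delta ≤ 0) gives a rejection probability at or below α, and the maximum, exactly α, occurs at the boundary delta = 0.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def right_tail_rejection_prob(delta: float, alpha: float = 0.05) -> float:
    """P(reject H0) for a right-tailed z-test when the true mean is
    delta = (mu - mu0) / (sigma / sqrt(n)) standard errors from mu0.

    Under that true mean the z statistic is Normal(delta, 1), so
    P(Z > c) = 1 - Phi(c - delta), with c the one-tailed critical value.
    """
    c = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(c - delta)


# True means at or below mu0 (inside H0) never exceed alpha.
for delta in (-1.0, -0.5, 0.0):
    print(delta, round(right_tail_rejection_prob(delta), 4))
```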

Step 5: Place Alpha and Find the Critical Value

Two-tailed at α = 0.05: critical z = ±1.96. One-tailed at α = 0.05: critical z = 1.645 (right) or −1.645 (left). For t-tests, use the t-distribution table at your degrees of freedom instead of the z-table; it typically lists one-tailed and two-tailed columns side by side for the same df.
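Python's standard library has no t-distribution. The sketch below therefore approximates t critical values in two stages: it integrates the t density numerically with Simpson's rule, then solves for the cutoff by bisection. It is an illustration under those assumptions, not a production routine. With SciPy installed, `scipy.stats.t.ppf(1 - alpha_tail, df)` returns the same cutoffs. In the code, `alpha_tail` is α/2 for a two-tailed test and α for a one-tailed test.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t), from Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def t_critical(alpha_tail: float, df: int) -> float:
    """Positive t such that P(T > t) = alpha_tail, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha_tail:
            lo = mid  # tail area still too large: the cutoff lies further out
        else:
            hi = mid
    return (lo + hi) / 2


for df in (15, 48):
    two_tailed = t_critical(0.05 / 2, df)  # alpha split across both tails
    one_tailed = t_critical(0.05, df)      # all of alpha in one tail
    print(df, round(two_tailed, 3), round(one_tailed, 3))
```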

Step 6: Calculate, Compare, and Report Both the Direction and the Decision

Compute the test statistic exactly as you would for any test (see the full hypothesis testing examples for the underlying formulas). Find the matching p-value for the number of tails you specified. If p < α, reject H₀ — and state explicitly in your write-up that the test was one-tailed (and why) or two-tailed, since this materially affects how a reader should interpret the result.
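Here is a minimal sketch of this step for a z statistic, assuming the tail choice is already fixed in the study design. The helper names `z_p_value` and `report` are hypothetical. The point of `report` is that the tail count travels with every p-value it prints. The final loop runs all three framings only to show how much they differ for the same z = 1.80. In a real analysis you would run only the one you pre-specified.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

TAIL_LABELS = {
    "two": "two-tailed",
    "right": "one-tailed (right)",
    "left": "one-tailed (left)",
}


def z_p_value(z: float, tail: str) -> float:
    """p-value of a z statistic for a tail choice fixed in advance."""
    if tail == "right":  # Ha: mu > mu0
        return 1 - STD_NORMAL.cdf(z)
    if tail == "left":   # Ha: mu < mu0
        return STD_NORMAL.cdf(z)
    if tail == "two":    # Ha: mu != mu0
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'two', 'right' or 'left'")


def report(z: float, tail: str, alpha: float = 0.05) -> str:
    """One-line result that always states the tail choice next to the decision."""
    p = z_p_value(z, tail)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return f"z = {z:.2f}, {TAIL_LABELS[tail]} p = {p:.4f}, alpha = {alpha}: {decision}"


# Illustration only: in a real analysis, run just the pre-specified framing.
for tail in ("two", "right", "left"):
    print(report(1.80, tail))
```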

Worked Examples — One-Tailed vs Two-Tailed

The four examples below each show the same underlying scenario type analyzed both ways, so you can see exactly how the choice of tails changes the critical value, the p-value, and sometimes the final decision. Formulas follow the conventions used throughout the statistics and probability section of Statistics Fundamentals, and critical values are cross-checked against the NIST Engineering Statistics Handbook.

Example 1 — Medical Treatment (One-Sample Z-Test)

Worked Example 1 — Medical Treatment

Problem: A standard medication lowers systolic blood pressure by an average of 10 mmHg, with a known population standard deviation σ = 4 mmHg. A new formulation is tested on n = 36 patients and produces a mean reduction of x̄ = 11.2 mmHg. At α = 0.05, is there evidence the new formulation produces a different — or specifically larger — reduction?

One-Sample Z-Test Formula
z = (x̄ − μ₀) / (σ / √n)
where x̄ = sample mean reduction, μ₀ = baseline reduction, σ = known population SD, and n = sample size.

1. Test statistic (shared by both versions):
SE = σ/√n = 4/√36 = 4/6 = 0.667
z = (11.2 − 10) / 0.667 = 1.2 / 0.667 = 1.80

2. Two-tailed version: H₀: μ = 10  |  Hₐ: μ ≠ 10. Critical z = ±1.96. p = 2 × P(Z > 1.80) = 2 × 0.03593 = 0.0719.
p = 0.0719 ≥ 0.05 → Fail to Reject H₀ (z = 1.80 < 1.96).

3. One-tailed (right) version: H₀: μ ≤ 10  |  Hₐ: μ > 10. This framing is appropriate only if the research question was specifically "does the new formulation lower blood pressure more than the standard?", decided before the trial. Critical z = 1.645. p = P(Z > 1.80) = 0.0359.
p = 0.0359 < 0.05 → Reject H₀ (z = 1.80 > 1.645).

The same z = 1.80 produces opposite decisions: two-tailed fails to reject H₀ (p = 0.072), one-tailed right rejects it (p = 0.036). This is the textbook case for why the tail choice must be locked in before the trial — choosing one-tailed after seeing a "promising but not quite significant" two-tailed result would be inappropriate.

Critical values and p-values cross-checked against the NIST Standard Normal Probability Table.
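You can reproduce Example 1 in a few lines. This sketch uses Python's `statistics.NormalDist` for the normal tail areas. It keeps full precision instead of rounding the standard error to 0.667 first; both routes give z = 1.80.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Example 1 inputs
mu0 = 10.0    # baseline mean reduction (mmHg)
sigma = 4.0   # known population SD (mmHg)
n = 36        # patients
xbar = 11.2   # observed mean reduction (mmHg)

se = sigma / math.sqrt(n)                 # standard error, 4/6
z = (xbar - mu0) / se                     # test statistic
p_right = 1 - STD_NORMAL.cdf(z)           # one-tailed, Ha: mu > 10
p_two = 2 * (1 - STD_NORMAL.cdf(abs(z)))  # two-tailed, Ha: mu != 10

print(f"z = {z:.2f}")
print(f"two-tailed p = {p_two:.4f} -> reject at 0.05? {p_two < 0.05}")
print(f"one-tailed p = {p_right:.4f} -> reject at 0.05? {p_right < 0.05}")
```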

Example 2 — A/B Campaign Performance (Two-Sample T-Test)

Worked Example 2 — A/B Campaign Performance

Problem: A marketing team redesigns a checkout page. The old page (Group A, n₁ = 25) averages x̄₁ = 4.10 minutes to complete checkout with s₁ = 1.2. The new page (Group B, n₂ = 25) averages x̄₂ = 3.55 minutes with s₂ = 1.1. At α = 0.05, did checkout time change — and separately, did it specifically decrease?

Two-Sample T-Test Formula (Equal Variances)
t = (x̄₁ − x̄₂) / √(sₚ²(1/n₁ + 1/n₂))
where sₚ² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2) is the pooled variance and df = n₁ + n₂ − 2 = 48.

1. Test statistic (shared):
sₚ² = [(24×1.2²) + (24×1.1²)] / 48 = [34.56 + 29.04] / 48 = 1.325
SE = √(1.325 × (1/25 + 1/25)) = √(1.325 × 0.08) = √0.106 = 0.3256
t = (4.10 − 3.55) / 0.3256 = 0.55 / 0.3256 = 1.69 (df = 48)

2. Two-tailed version: H₀: μ₁ = μ₂  |  Hₐ: μ₁ ≠ μ₂. Critical t ≈ ±2.011 (df = 48). Two-tailed p ≈ 0.098.
p ≈ 0.098 ≥ 0.05 → Fail to Reject H₀ (|t| = 1.69 < 2.011).

3. One-tailed (right, on x̄₁ − x̄₂) version: H₀: μ₁ ≤ μ₂  |  Hₐ: μ₁ > μ₂. This tests specifically "does the new page reduce checkout time?", pre-specified before the test ran. Critical t ≈ 1.677 (df = 48). One-tailed p ≈ 0.049.
p ≈ 0.049 < 0.05 → Reject H₀ (t = 1.69 > 1.677).

Same data, two conclusions. The two-tailed test (the safer default for "did anything change?") does not reach significance at α = 0.05. The one-tailed test, framed around the specific directional question "did the redesign make checkout faster?" and decided in advance, narrowly clears the threshold. Teams that run many A/B tests should standardize on two-tailed unless there's a documented, pre-registered reason for a directional hypothesis.
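Here is Example 2 in code. The standard library has no t-distribution, so this sketch includes a small Simpson's-rule approximation of the t tail area. That helper is an assumption of this example, not a library function. When SciPy is available, the usual tool is `scipy.stats.ttest_ind` with `equal_var=True`.

```python
import math


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for Student's t, via Simpson's rule on the density over [0, |t|]."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = (pdf(0.0) + pdf(a) + inner) * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


# Example 2 inputs: Group A = old checkout page, Group B = new page
n1, xbar1, s1 = 25, 4.10, 1.2
n2, xbar2, s2 = 25, 3.55, 1.1

df = n1 + n2 - 2                                      # 48
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled variance
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t = (xbar1 - xbar2) / se

p_one = t_upper_tail(t, df)           # right-tailed on xbar1 - xbar2, Ha: mu1 > mu2
p_two = 2 * t_upper_tail(abs(t), df)  # Ha: mu1 != mu2

print(f"pooled variance = {sp2:.3f}, t = {t:.2f} (df = {df})")
print(f"two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")
```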

Example 3 — Academic Psychology Experiment (One-Sample T-Test)

Worked Example 3 — Reaction Time Study

Problem: Prior research establishes that, under normal conditions, participants complete a reaction-time task with a mean of μ₀ = 450 ms. A researcher hypothesizes that mild caffeine intake speeds up reaction time. A sample of n = 16 caffeinated participants produces x̄ = 423 ms with s = 48 ms. Test at α = 0.05.

One-Sample T-Test Formula
t = (x̄ − μ₀) / (s / √n)
where s = sample standard deviation and df = n − 1 = 15.

1. Test statistic:
SE = 48/√16 = 48/4 = 12
t = (423 − 450) / 12 = −27 / 12 = −2.25 (df = 15)

2. One-tailed (left) version, chosen because the literature specifically predicts a decrease: H₀: μ ≥ 450  |  Hₐ: μ < 450. Critical t = −1.753 (df = 15, one-tailed α = 0.05). p ≈ 0.020.
t = −2.25 < −1.753 and p = 0.020 < 0.05 → Reject H₀.

3. For comparison, the two-tailed version: H₀: μ = 450  |  Hₐ: μ ≠ 450. Critical t = ±2.131 (df = 15). p ≈ 0.040.
p = 0.040 < 0.05 → Reject H₀ as well, since |t| = 2.25 > 2.131.

Both versions reject H₀ here, but notice the margins: the two-tailed p-value (0.040) is close to α, while the one-tailed p-value (0.020) has more headroom. With a pre-registered directional hypothesis backed by prior literature, this is a legitimate case for a one-tailed test — caffeine's effect on reaction time has a well-established direction in the existing research base.
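Here is Example 3 in code. It again relies on a hypothetical Simpson's-rule helper for the t tail area, redefined here so the block runs on its own. The left-tailed p-value uses symmetry: P(T < t) = P(T > −t).

```python
import math


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for Student's t, via Simpson's rule on the density over [0, |t|]."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = (pdf(0.0) + pdf(a) + inner) * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


# Example 3 inputs
mu0 = 450.0   # established mean reaction time (ms)
xbar = 423.0  # caffeinated sample mean (ms)
s = 48.0      # sample SD (ms)
n = 16

df = n - 1
se = s / math.sqrt(n)   # 12.0
t = (xbar - mu0) / se   # -2.25

p_left = t_upper_tail(-t, df)         # Ha: mu < 450, P(T < t) = P(T > -t)
p_two = 2 * t_upper_tail(abs(t), df)  # Ha: mu != 450

print(f"t = {t:.2f} (df = {df})")
print(f"one-tailed (left) p = {p_left:.3f}, two-tailed p = {p_two:.3f}")
```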

Example 4 — Manufacturing Quality Control (One-Sample Z-Test)

Worked Example 4 — Bolt Diameter Specification

Problem: A bolt is specified to have a diameter of μ₀ = 12.00 mm with a known process standard deviation σ = 0.05 mm. A quality inspector samples n = 49 bolts from a new supplier and measures x̄ = 11.985 mm. At α = 0.05, has the new supplier's process drifted off the 12.00 mm target — and separately, is it producing undersized bolts?

One-Sample Z-Test Formula
z = (x̄ − μ₀) / (σ / √n)
where x̄ = sample mean diameter, μ₀ = target diameter, σ = process SD, and n = sample size.

1. Test statistic:
SE = 0.05/√49 = 0.05/7 = 0.007143
z = (11.985 − 12.00) / 0.007143 = −0.015 / 0.007143 = −2.10

2. Two-tailed version (standard QC check: "is the process off-target in either direction?"): H₀: μ = 12.00  |  Hₐ: μ ≠ 12.00. Critical z = ±1.96. p = 2 × P(Z > 2.10) = 2 × 0.01786 = 0.0357.
p = 0.0357 < 0.05 → Reject H₀ (|z| = 2.10 > 1.96).

3. One-tailed (left) version, relevant if oversized bolts are harmless but undersized bolts fail a fit check: H₀: μ ≥ 12.00  |  Hₐ: μ < 12.00. Critical z = −1.645. p = P(Z < −2.10) = 0.0179.
p = 0.0179 < 0.05 → Reject H₀ (z = −2.10 < −1.645).

Both versions reject H₀ here, but they answer different questions. The two-tailed test asks whether the process is off-target at all; the one-tailed test asks only whether it is producing undersized bolts. Two-tailed QC monitoring is the standard approach because both oversized and undersized bolts typically matter. The one-tailed framing would only be defensible if oversized bolts genuinely posed no risk for this specific application. That judgment belongs in the engineering specification, not in the statistical analysis after the fact.
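Here is Example 4 in code, as a sketch that uses `statistics.NormalDist` for the tail areas. At full precision the two-tailed p-value comes out at about 0.0357. Doubling the rounded one-tailed value 0.0179 gives 0.0358 instead. Both lead to the same decision.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Example 4 inputs
mu0 = 12.00    # target diameter (mm)
sigma = 0.05   # known process SD (mm)
n = 49         # bolts sampled
xbar = 11.985  # observed mean diameter (mm)

se = sigma / math.sqrt(n)            # 0.05 / 7
z = (xbar - mu0) / se
p_left = STD_NORMAL.cdf(z)           # Ha: mu < 12.00 (undersized bolts)
p_two = 2 * STD_NORMAL.cdf(-abs(z))  # Ha: mu != 12.00 (off-target either way)

print(f"z = {z:.2f}, one-tailed (left) p = {p_left:.4f}, two-tailed p = {p_two:.4f}")
```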

One vs Two Tailed: Side-by-Side Comparison

Core Differences

Factor | One-Tailed Test | Two-Tailed Test
Alternative hypothesis (Hₐ) | μ > μ₀ or μ < μ₀ (directional) | μ ≠ μ₀ (non-directional)
Rejection region | All of α in one tail | α/2 in each tail
Critical value (z, α = 0.05) | 1.645 (right) or −1.645 (left) | ±1.96
p-value (same statistic) | Smaller: half the two-tailed value in the matching direction | Larger: exactly 2 × the one-tailed value
Statistical power for predicted direction | Higher | Lower
Can detect effect in opposite direction? | No, by design | Yes
Direction must be set | Before data collection | Not applicable
Typical default in research | Used only with strong prior justification | The standard default
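The power row can be made concrete. The sketch below assumes a z-test and a hypothetical standardized effect `delta`: the true mean's distance from μ₀ in standard errors, so the z statistic is Normal(delta, 1). For a positive delta, the one-tailed (right) test has the higher power. For a negative delta it has essentially none, while the two-tailed test still detects the effect.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(delta: float, alpha: float = 0.05, tails: str = "one") -> float:
    """Probability of rejecting H0 when the z statistic is Normal(delta, 1).

    tails="one" is a right-tailed test (Ha: mu > mu0); tails="two" rejects in
    either tail.
    """
    if tails == "one":
        c = STD_NORMAL.inv_cdf(1 - alpha)
        return 1 - STD_NORMAL.cdf(c - delta)
    if tails == "two":
        c = STD_NORMAL.inv_cdf(1 - alpha / 2)
        # upper-tail rejection plus lower-tail rejection
        return (1 - STD_NORMAL.cdf(c - delta)) + STD_NORMAL.cdf(-c - delta)
    raise ValueError("tails must be 'one' or 'two'")


for delta in (2.5, -2.5):
    print(f"delta = {delta:+}: one-tailed power = {power(delta, tails='one'):.3f}, "
          f"two-tailed power = {power(delta, tails='two'):.3f}")
```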

P-Value and Critical Value Conversion

Test Statistic | One-Tailed p-value | Two-Tailed p-value
z = 1.645 | 0.0500 | 0.1000
z = 1.80 | 0.0359 | 0.0719
z = 1.96 | 0.0250 | 0.0500
z = 2.10 | 0.0179 | 0.0357
z = 2.21 | 0.0136 | 0.0271
z = 2.576 | 0.0050 | 0.0100
The Halving Rule

For any test statistic, two-tailed p-value = 2 × one-tailed p-value (in the direction matching the sign of the statistic). This relationship holds for z-tests, t-tests, and most other tests built on symmetric distributions, and is the fastest way to sanity-check software output.
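A conversion table like the one above can be regenerated with `statistics.NormalDist`, assuming z statistics. The helper `conversion_rows` is an illustrative name. Computing at full precision before rounding avoids the last-digit drift that comes from doubling an already rounded one-tailed value.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conversion_rows(z_values):
    """(z, one-tailed p, two-tailed p) for each positive z, at full precision."""
    rows = []
    for z in z_values:
        one = 1 - STD_NORMAL.cdf(z)     # area beyond z in the upper tail
        rows.append((z, one, 2 * one))  # two-tailed = 2 x one-tailed
    return rows


for z, one, two in conversion_rows([1.645, 1.80, 1.96, 2.10, 2.21, 2.576]):
    print(f"z = {z:<6} one-tailed p = {one:.4f}   two-tailed p = {two:.4f}")
```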

Real-World Applications by Industry

The same one-tailed vs two-tailed decision recurs across fields. The examples below summarize how each industry typically defaults, and why.


Clinical Trials

Regulatory guidance (FDA, EMA) generally expects two-tailed tests for efficacy claims, because a treatment that performs worse than placebo is just as important to detect as one that performs better. One-tailed tests in trials require explicit pre-specification and justification in the protocol.


A/B and Conversion Testing

Most A/B testing platforms default to two-tailed proportion or t-tests, since a redesign could plausibly hurt conversion as easily as help it. One-tailed tests appear mainly in internal "did our specific fix work?" checks where a regression would be caught by other monitoring anyway.


Manufacturing & QC

Two-tailed tests are standard for monitoring whether a process mean has drifted from a target specification in either direction. One-tailed tests appear in specific pass/fail checks — for example, "is the minimum tensile strength above the safety threshold?" — where only one direction constitutes a failure.


Psychology & Social Science

One-tailed tests appear more often here than in medicine, usually when a large body of prior research already establishes a direction (as in the caffeine/reaction-time example above). Pre-registration templates typically ask researchers to declare the tail direction before data collection.


Finance & Economics

Two-tailed tests dominate when testing whether a regression coefficient differs from zero, since both positive and negative relationships are economically meaningful. One-tailed tests occasionally appear in tests of a specific theoretical prediction, such as "does this policy reduce, not just change, unemployment?"


Machine Learning Evaluation

When comparing a new model to a baseline on the same validation folds with a paired t-test, two-tailed is the safer default, since a new architecture can underperform the baseline. One-tailed tests sometimes appear when the new model is a strict superset of the old one. The reasoning is that such a model should not do worse in principle, although a larger model can still overfit.

One vs Two Tailed Cheat Sheet

Formula and Hypothesis Summary

Test Type | H₀ | Hₐ | Rejection Rule
Two-tailed | μ = μ₀ | μ ≠ μ₀ | Reject if stat > +critical value or stat < −critical value
Right-tailed | μ ≤ μ₀ | μ > μ₀ | Reject if stat > +critical value
Left-tailed | μ ≥ μ₀ | μ < μ₀ | Reject if stat < −critical value

Critical Values at Common Alpha Levels (z-distribution)

α | Two-Tailed Critical z | One-Tailed Critical z
0.10 | ±1.645 | 1.282
0.05 | ±1.960 | 1.645
0.01 | ±2.576 | 2.326

For t-distributions, the equivalent critical values depend on degrees of freedom — the full t-distribution table lists both one-tailed and two-tailed columns side by side for each df, which is the fastest way to look up the exact value for your sample size.


Common Mistakes

  • Choosing tails after seeing the data: If a two-tailed result is "close" and switching to one-tailed would push it under α, that switch is not valid — the direction must be decided before data collection, full stop.
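  • Treating a non-significant two-tailed result as proof of "no effect": Failing to reject H₀ means the data were not strong enough to rule out "no effect"; it does not show that the effect is zero. A one-tailed test in the correct direction might have been significant if it was genuinely justified in advance, but justifying it retroactively is the first mistake above, not a solution.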
  • Forgetting that a one-tailed test is blind to the opposite direction: A right-tailed test that produces z = −3.5 still fails to reject H₀, even though −3.5 looks extreme. The test was never designed to detect decreases.
  • Assuming "directional theory" means "any direction I'd like to find": Genuine directional justification comes from established prior research, mechanism, or a situation where one direction is the only one that matters operationally — not from a researcher's hope about which way the data will go.
  • Reporting only the p-value without stating the tail count: p = 0.04 means something different depending on whether it's one-tailed or two-tailed. Always state which was used and why.
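The cost of the first mistake can be quantified. Assume a z-test with H₀ true, so z is standard normal. The sketch below compares the stated α = 0.05 with the actual Type I error rate of an invalid strategy: run two-tailed, and if that misses, switch to one-tailed in whichever direction the data leaned. The strategy rejects whenever |z| clears the one-tailed cutoff, so its true error rate is 2α, about 0.10. A seeded Monte Carlo run using Python's `random` module is included as a cross-check. The function names are illustrative.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 0.05
TWO_TAILED_CUT = STD_NORMAL.inv_cdf(1 - ALPHA / 2)  # about 1.96
ONE_TAILED_CUT = STD_NORMAL.inv_cdf(1 - ALPHA)      # about 1.645


def peeking_rejects(z: float) -> bool:
    """Invalid strategy: two-tailed first, then one-tailed in the observed direction."""
    if abs(z) > TWO_TAILED_CUT:
        return True
    # "Almost significant": switch to a one-tailed test pointing the way z leaned.
    return abs(z) > ONE_TAILED_CUT


def analytic_error_rate() -> float:
    """P(|Z| > one-tailed cutoff) under H0, which equals 2 * ALPHA."""
    return 2 * (1 - STD_NORMAL.cdf(ONE_TAILED_CUT))


def simulated_error_rate(trials: int = 200_000, seed: int = 1) -> float:
    """Share of H0-true experiments that the peeking strategy rejects."""
    rng = random.Random(seed)
    hits = sum(peeking_rejects(rng.gauss(0.0, 1.0)) for _ in range(trials))
    return hits / trials


print(f"stated alpha = {ALPHA}")
print(f"actual Type I error (analytic) = {analytic_error_rate():.3f}")
print(f"actual Type I error (simulated) = {simulated_error_rate():.3f}")
```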

Frequently Asked Questions

What is the difference between a one-tailed and a two-tailed test?
A one-tailed test checks for an effect in a single, predetermined direction (greater than or less than) and puts all of α in that one tail. A two-tailed test checks for a difference in either direction and splits α across both tails (α/2 each). One-tailed tests are more powerful for the predicted direction; two-tailed tests are the standard default in most research.

When should I use a one-tailed test?
Use a one-tailed test only when the direction of the expected effect is specified before data collection. That direction should rest on theory, on prior studies, or on a situation where a result in the opposite direction would be treated identically to "no effect". The direction must be chosen before seeing the results; choosing it afterward invalidates the test.

Is a one-tailed test more powerful than a two-tailed test?
Yes, for an effect in the predicted direction. Because all of α sits in one tail rather than being split, the critical value is less extreme (for example, z = 1.645 instead of 1.96 at α = 0.05). That makes it easier to reach significance when the effect goes the predicted way. The trade-off is that a one-tailed test in the wrong direction cannot reject H₀ at all, no matter how extreme the result.

Can I choose one-tailed or two-tailed after looking at my data?
No. Choosing the tail type after looking at the data is a form of p-hacking that inflates the true Type I error rate above the stated α. The number of tails must be specified in the hypotheses before any data is collected or analyzed, ideally documented in a pre-registration.

How does the direction of the alternative hypothesis affect the test?
Direction determines two things: where the rejection region sits on the distribution, and what the p-value means. A directional (one-tailed) Hₐ concentrates α in one tail. It produces a smaller p-value for an effect in that direction, but has zero ability to detect an effect in the other direction. A non-directional (two-tailed) Hₐ splits α and can detect effects either way, at the cost of needing a more extreme result to reach the same α.

What is an example of a two-tailed hypothesis?
Example: a quality control check on bolt diameter, H₀: μ = 12.00 mm, Hₐ: μ ≠ 12.00 mm. Both oversized and undersized bolts are problems, so the test must be able to flag a drift in either direction. This is Example 4 above, worked in full with z = −2.10 and a two-tailed p-value of 0.0357.

What is a one-tailed hypothesis?
A one-tailed hypothesis states the alternative as strictly greater than (Hₐ: μ > μ₀, right-tailed) or strictly less than (Hₐ: μ < μ₀, left-tailed) the null value, never both. Example: H₀: μ ≤ 30 minutes, Hₐ: μ > 30 minutes, for testing whether a new packing process increases average delivery time compared to a 30-minute baseline. This is right-tailed because the alternative hypothesis only claims an increase.

How are one-tailed and two-tailed p-values related?
For the same test statistic, the two-tailed p-value is exactly twice the one-tailed p-value in the corresponding direction. For example, the one-tailed p-value for z = 2.21 is about 0.0136 (0.01355 before rounding), so the two-tailed p-value is 2 × 0.01355 ≈ 0.0271. This is why a result can be significant one-tailed but not two-tailed at the same α; see the conversion table above for more values.

Sources and References

This guide draws on the following primary and secondary sources. Critical values and formulas are cross-referenced against NIST's Engineering Statistics Handbook, a widely cited government reference for applied statistics.

  • NIST Engineering Statistics Handbook: Hypothesis Testing. National Institute of Standards and Technology. itl.nist.gov
  • Penn State STAT 415: Introduction to Mathematical Statistics. Penn State Eberly College of Science. online.stat.psu.edu
  • OpenStax Introductory Statistics, Ch. 9–10: Hypothesis Testing. Rice University. openstax.org
  • Stanford Encyclopedia of Philosophy: entries on significance tests, covering the Fisher and Neyman–Pearson traditions underlying directional vs non-directional testing. plato.stanford.edu
  • Neyman, J. & Pearson, E.S. (1933). "On the Problem of the Most Efficient Tests of Statistical Hypotheses." Philosophical Transactions of the Royal Society A, 231, 289–337.
  • Fisher, R.A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.