One-Tailed vs Two-Tailed Tests: Definitions
The "tail" being referenced is the tail of the sampling distribution of the test statistic — the standard normal curve for a z-test, or the t-distribution for a t-test. The rejection region is the set of test-statistic values extreme enough to make you reject H₀. In a two-tailed test, that region exists on both ends of the curve; in a one-tailed test, it exists on only one end, and the other end is irrelevant no matter how extreme a result falls there.
Both versions can be applied to almost any test covered in the broader hypothesis testing framework: z-tests, t-tests, proportion tests, and correlation tests all have one-tailed and two-tailed forms. ANOVA and chi-square tests are exceptions — they are inherently one-tailed in a different sense, since their test statistics (F and χ²) cannot be negative, so "direction" doesn't apply the same way. This page focuses on z-tests and t-tests, where the choice is most commonly made and most commonly misapplied.
- Two-tailed Hₐ: μ ≠ μ₀ — significant if the sample mean is much higher OR much lower than μ₀
- One-tailed Hₐ: μ > μ₀ (right-tailed) or μ < μ₀ (left-tailed) — significant only in that direction
- Alpha placement: Two-tailed splits α into α/2 per tail. One-tailed places all of α in one tail
- Critical value (z, α = 0.05): Two-tailed = ±1.96. One-tailed = 1.645 (right) or −1.645 (left)
- p-value relationship: Two-tailed p = 2 × one-tailed p, for the matching direction
- Decision rule: Pick the tail count from your research question BEFORE collecting data — never after
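The alpha placement and critical values in the list above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `critical_z` and its `tail` labels are illustrative assumptions, not part of any statistics package.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def critical_z(alpha: float, tail: str) -> float:
    """Critical z for a 'two', 'right', or 'left' tailed test (hypothetical helper).

    For 'two', the positive cutoff is returned; the rejection region is beyond +/- it.
    """
    if tail == "two":
        return STD_NORMAL.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    if tail == "right":
        return STD_NORMAL.inv_cdf(1 - alpha)      # all of alpha in the upper tail
    if tail == "left":
        return STD_NORMAL.inv_cdf(alpha)          # all of alpha in the lower tail
    raise ValueError("tail must be 'two', 'right', or 'left'")


for tail in ("two", "right", "left"):
    print(f"{tail:>5}-tailed, alpha = 0.05: critical z = {critical_z(0.05, tail):.3f}")
```

Passing a different `alpha` shows how the cutoffs move as the significance level changes.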
Visualizing Rejection Regions
The clearest way to see the difference is to look at where the shaded "reject H₀" area sits on the curve. At α = 0.05, a two-tailed test shades 2.5% of the area in each tail (total 5%), while a one-tailed test shades the full 5% in just one tail. The diagrams below use the standard normal distribution and the familiar z = ±1.96 / z = 1.645 critical values.
[Figures: rejection regions on the standard normal curve for the two-tailed test, the right one-tailed test, and the left one-tailed test, each at α = 0.05.]
Notice the right-tailed critical value (1.645) is smaller than the two-tailed one (1.96). That makes a one-tailed test easier to pass — IF the effect goes the predicted direction. If it goes the other way, the one-tailed test cannot reject H₀ at all, even if z = −4.0. The two-tailed test would catch that result; the one-tailed test would not.
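The blind spot described above is easy to check numerically. The sketch below applies the rejection-region rule to z = 1.80 and z = −4.0 under each framing; the function name `rejects_h0` is a hypothetical helper introduced only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rejects_h0(z: float, alpha: float, tail: str) -> bool:
    """Return True if z falls in the rejection region for the given tail choice."""
    if tail == "two":
        return abs(z) > STD_NORMAL.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return z > STD_NORMAL.inv_cdf(1 - alpha)
    if tail == "left":
        return z < STD_NORMAL.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right', or 'left'")


for z in (1.80, -4.0):
    decisions = {tail: rejects_h0(z, 0.05, tail) for tail in ("two", "right", "left")}
    print(f"z = {z:+.2f}: {decisions}")
```

With z = −4.0 the right-tailed test does not reject, while the two-tailed and left-tailed tests do.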
How to Choose: A Step-by-Step Framework
Use a two-tailed test by default. Switch to a one-tailed test only if (1) your research question is explicitly directional, (2) that direction was decided before seeing any data, and (3) a result in the opposite direction would be practically meaningless or treated identically to "no effect." If any of those three conditions isn't met, stay two-tailed.
Write Your Research Question in Plain Language First
Before touching symbols, write out what you actually want to know. "Did the new layout change the conversion rate?" is non-directional — write H₀: p = p₀, Hₐ: p ≠ p₀ (two-tailed). "Does the new fertilizer increase yield compared to the old one?" is directional — write H₀: μ ≤ μ₀, Hₐ: μ > μ₀ (one-tailed, right).
Ask Whether the Opposite Direction Would Matter
If a new drug could plausibly make the condition worse, you need to detect that, so use a two-tailed test. If a new packaging process could only plausibly shorten delivery time or leave it unchanged, and could never lengthen it in a meaningful way, a one-tailed test in that direction (left-tailed on delivery time, Hₐ: μ < μ₀) may be defensible. When in doubt, the safer choice is two-tailed, because it remains valid for detecting effects in either direction.
Confirm the Direction Comes From Theory, Not From Peeking
A one-tailed test is only valid if the direction was specified in the study design — ideally in a pre-registration — before any data was collected or examined. Looking at a two-tailed result, noticing it's "almost significant" in one direction, and then re-running as one-tailed to halve the p-value is a form of p-hacking and invalidates the stated α.
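To see why switching tails after peeking invalidates the stated α, consider an analyst who always picks the tail that matches the sign of the observed z. The sketch below, a simplified model that assumes a z-test with H₀ exactly true, computes the resulting false-positive rate both exactly and by simulation; the seed and trial count are arbitrary choices.

```python
import random
from statistics import NormalDist

ALPHA = 0.05
STD_NORMAL = NormalDist()
one_tailed_cutoff = STD_NORMAL.inv_cdf(1 - ALPHA)  # about 1.645

# Exact rate: choosing the tail after seeing the sign of z means
# H0 is rejected whenever |z| exceeds the one-tailed cutoff.
exact_rate = 2 * (1 - STD_NORMAL.cdf(one_tailed_cutoff))

# Simulation under H0: each trial draws one z from the standard normal.
rng = random.Random(0)
trials = 100_000
rejections = sum(abs(rng.gauss(0, 1)) > one_tailed_cutoff for _ in range(trials))
simulated_rate = rejections / trials

print(f"stated alpha:        {ALPHA:.3f}")
print(f"exact Type I rate:   {exact_rate:.3f}")
print(f"simulated Type I rate: {simulated_rate:.3f}")
```

Under these assumptions the true Type I error rate is twice the stated α, which is the cost hidden by a post hoc tail choice.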
Write H₀ and Hₐ Formally
Two-tailed: H₀: μ = μ₀, Hₐ: μ ≠ μ₀. Right-tailed: H₀: μ ≤ μ₀, Hₐ: μ > μ₀. Left-tailed: H₀: μ ≥ μ₀, Hₐ: μ < μ₀. Note that for one-tailed tests, H₀ technically includes the "≤" or "≥" — but in practice the test is carried out as if H₀: μ = μ₀, since the boundary case is what determines the critical value.
Place Alpha and Find the Critical Value
Two-tailed at α = 0.05: critical z = ±1.96. One-tailed at α = 0.05: critical z = 1.645 (right) or −1.645 (left). For t-tests, use the t-distribution equivalents at your degrees of freedom instead; a t-distribution table typically lists both one-tailed and two-tailed columns for the same df.
Calculate, Compare, and Report Both the Direction and the Decision
Compute the test statistic exactly as you would for any test (see the full hypothesis testing examples for the underlying formulas). Find the matching p-value for the number of tails you specified. If p < α, reject H₀ — and state explicitly in your write-up that the test was one-tailed (and why) or two-tailed, since this materially affects how a reader should interpret the result.
Worked Examples — One-Tailed vs Two-Tailed
The four examples below each show the same underlying scenario type analyzed both ways, so you can see exactly how the choice of tails changes the critical value, the p-value, and sometimes the final decision. Formulas follow the conventions used throughout the statistics and probability section of Statistics Fundamentals, and critical values are cross-checked against the NIST Engineering Statistics Handbook.
Example 1 — Medical Treatment (One-Sample Z-Test)
Problem: A standard medication lowers systolic blood pressure by an average of 10 mmHg, with a known population standard deviation σ = 4 mmHg. A new formulation is tested on n = 36 patients and produces a mean reduction of x̄ = 11.2 mmHg. At α = 0.05, is there evidence the new formulation produces a different — or specifically larger — reduction?
z = (x̄ − μ₀) / (σ/√n), where:
x̄ = sample mean reduction
μ₀ = baseline reduction
σ = known population SD
n = sample size
Test statistic (shared by both versions):
SE = σ/√n = 4/√36 = 4/6 = 0.667
z = (11.2 − 10) / 0.667 = 1.2 / 0.667 = 1.80
Two-tailed version: H₀: μ = 10 | Hₐ: μ ≠ 10. Critical z = ±1.96. p = 2 × P(Z > 1.80) = 2 × 0.0359 = 0.0718.
p = 0.0718 ≥ 0.05 → Fail to Reject H₀ (z = 1.80 < 1.96).
One-tailed (right) version: H₀: μ ≤ 10 | Hₐ: μ > 10 — appropriate only if the research question was specifically "does the new formulation lower blood pressure more than the standard?", decided before the trial. Critical z = 1.645. p = P(Z > 1.80) = 0.0359.
p = 0.0359 < 0.05 → Reject H₀ (z = 1.80 > 1.645).
The same z = 1.80 produces opposite decisions: two-tailed fails to reject H₀ (p = 0.072), one-tailed right rejects it (p = 0.036). This is the textbook case for why the tail choice must be locked in before the trial — choosing one-tailed after seeing a "promising but not quite significant" two-tailed result would be inappropriate.
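The arithmetic in Example 1 can be checked with a short script. This is a sketch using the standard library's `NormalDist`; the variable names are illustrative, and the p-values come from the exact normal CDF, so the last digit may differ slightly from values built from a rounded z-table.

```python
import math
from statistics import NormalDist

# Values from Example 1
mu0, sigma, n, xbar = 10.0, 4.0, 36, 11.2
alpha = 0.05

se = sigma / math.sqrt(n)          # standard error of the mean
z = (xbar - mu0) / se              # test statistic shared by both versions

p_right = 1 - NormalDist().cdf(z)  # one-tailed (right): P(Z > z)
p_two = 2 * p_right                # two-tailed; doubling is valid here because z > 0

reject_right = p_right < alpha
reject_two = p_two < alpha

print(f"SE = {se:.3f}, z = {z:.2f}")
print(f"one-tailed p = {p_right:.4f}, reject H0: {reject_right}")
print(f"two-tailed p = {p_two:.4f}, reject H0: {reject_two}")
```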
Example 2 — A/B Campaign Performance (Two-Sample T-Test)
Problem: A marketing team redesigns a checkout page. The old page (Group A, n₁ = 25) averages x̄₁ = 4.10 minutes to complete checkout with s₁ = 1.2. The new page (Group B, n₂ = 25) averages x̄₂ = 3.55 minutes with s₂ = 1.1. At α = 0.05, did checkout time change — and separately, did it specifically decrease?
t = (x̄₁ − x̄₂) / √(sₚ² × (1/n₁ + 1/n₂)), where:
sₚ² = pooled variance = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)
df = n₁ + n₂ − 2 = 48
Test statistic (shared):
sₚ² = [(24×1.2²) + (24×1.1²)] / 48 = [34.56 + 29.04] / 48 = 1.325
SE = √(1.325 × (1/25 + 1/25)) = √(1.325 × 0.08) = √0.106 = 0.3256
t = (4.10 − 3.55) / 0.3256 = 0.55 / 0.3256 = 1.69 (df = 48)
Two-tailed version: H₀: μ₁ = μ₂ | Hₐ: μ₁ ≠ μ₂. Critical t ≈ ±2.011 (df=48). Two-tailed p ≈ 0.098.
p = 0.098 ≥ 0.05 → Fail to Reject H₀.
One-tailed (right, on x̄₁ − x̄₂) version: H₀: μ₁ ≤ μ₂ | Hₐ: μ₁ > μ₂, that is, the old page takes longer than the new one. This tests specifically "does the new page reduce checkout time?", pre-specified before the test ran. Critical t ≈ 1.677 (df=48). One-tailed p ≈ 0.049.
p = 0.049 < 0.05 → Reject H₀.
Same data, two conclusions. The two-tailed test (the safer default for "did anything change?") does not reach significance at α = 0.05. The one-tailed test, framed around the specific directional question "did the redesign make checkout faster?" and decided in advance, narrowly clears the threshold. Teams that run many A/B tests should standardize on two-tailed unless there's a documented, pre-registered reason for a directional hypothesis.
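Example 2 can also be reproduced from the summary statistics alone. Python's standard library has no t-distribution, so the sketch below integrates the t density numerically with Simpson's rule; `t_pdf` and `t_sf` are hypothetical helpers written for this illustration, and a dedicated statistics library would normally supply the tail probability instead.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_sf(t: float, df: int, steps: int = 2000) -> float:
    """Upper-tail probability P(T > t), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3  # area beyond |t| in one tail
    return upper if t >= 0 else 1 - upper


# Summary statistics from Example 2
n1, mean1, s1 = 25, 4.10, 1.2  # old page (Group A)
n2, mean2, s2 = 25, 3.55, 1.1  # new page (Group B)

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / se

p_right = t_sf(t_stat, df)         # Ha: mu1 > mu2 (new page is faster)
p_two = 2 * t_sf(abs(t_stat), df)  # Ha: mu1 != mu2

print(f"pooled variance = {pooled_var:.3f}, SE = {se:.4f}, t = {t_stat:.2f}, df = {df}")
print(f"one-tailed p = {p_right:.3f}, two-tailed p = {p_two:.3f}")
```

The one-tailed p-value lands just under 0.05 and the two-tailed p-value well above it, matching the split decision in the example.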
Example 3 — Academic Psychology Experiment (One-Sample T-Test)
Problem: Prior research establishes that, under normal conditions, participants complete a reaction-time task with a mean of μ₀ = 450 ms. A researcher hypothesizes that mild caffeine intake speeds up reaction time. A sample of n = 16 caffeinated participants produces x̄ = 423 ms with s = 48 ms. Test at α = 0.05.
t = (x̄ − μ₀) / (s/√n), where:
s = sample standard deviation
df = n − 1 = 15
Test statistic:
SE = 48/√16 = 48/4 = 12
t = (423 − 450) / 12 = −27 / 12 = −2.25 (df = 15)
One-tailed (left) version — chosen because the literature specifically predicts a decrease: H₀: μ ≥ 450 | Hₐ: μ < 450. Critical t = −1.753 (df=15, one-tailed α=0.05). p ≈ 0.020.
t = −2.25 < −1.753 and p = 0.020 < 0.05 → Reject H₀.
For comparison, two-tailed version: H₀: μ = 450 | Hₐ: μ ≠ 450. Critical t = ±2.131 (df=15). p ≈ 0.040.
p = 0.040 < 0.05 → Reject H₀ as well, since |t| = 2.25 > 2.131.
Both versions reject H₀ here, but notice the margins: the two-tailed p-value (0.040) is close to α, while the one-tailed p-value (0.020) has more headroom. With a pre-registered directional hypothesis backed by prior literature, this is a legitimate case for a one-tailed test — caffeine's effect on reaction time has a well-established direction in the existing research base.
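Example 3 can be checked with the critical-value method instead of p-values. In this sketch the two critical t values are copied from the example text (df = 15) rather than computed, and the variable names are illustrative.

```python
import math

# Values from Example 3
n, xbar, mu0, s = 16, 423.0, 450.0, 48.0

se = s / math.sqrt(n)    # estimated standard error
t_stat = (xbar - mu0) / se

# Critical values for df = 15 at alpha = 0.05, taken from the text above
T_CRIT_ONE_TAILED = 1.753
T_CRIT_TWO_TAILED = 2.131

reject_left = t_stat < -T_CRIT_ONE_TAILED     # Ha: mu < 450
reject_two = abs(t_stat) > T_CRIT_TWO_TAILED  # Ha: mu != 450

print(f"SE = {se:.0f}, t = {t_stat:.2f}")
print(f"left-tailed: reject H0 = {reject_left}")
print(f"two-tailed:  reject H0 = {reject_two}")
```

Note that the left-tailed rule compares the signed statistic against the negative cutoff, so a large positive t would never trigger it.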
Example 4 — Manufacturing Quality Control (One-Sample Z-Test)
Problem: A bolt is specified to have a diameter of μ₀ = 12.00 mm with a known process standard deviation σ = 0.05 mm. A quality inspector samples n = 49 bolts from a new supplier and measures x̄ = 11.985 mm. At α = 0.05, has the new supplier's process drifted off the 12.00 mm target — and separately, is it producing undersized bolts?
z = (x̄ − μ₀) / (σ/√n), where:
x̄ = sample mean diameter
μ₀ = target diameter
σ = process SD
Test statistic:
SE = 0.05/√49 = 0.05/7 = 0.007143
z = (11.985 − 12.00) / 0.007143 = −0.015 / 0.007143 = −2.10
Two-tailed version (standard QC check — "is the process off-target in either direction?"): H₀: μ = 12.00 | Hₐ: μ ≠ 12.00. Critical z = ±1.96. p = 2 × P(Z > 2.10) = 2 × 0.0179 = 0.0358.
p = 0.0358 < 0.05 → Reject H₀.
One-tailed (left) version — relevant if oversized bolts are harmless but undersized bolts fail a fit check: H₀: μ ≥ 12.00 | Hₐ: μ < 12.00. Critical z = −1.645. p = P(Z < −2.10) = 0.0179.
p = 0.0179 < 0.05 → Reject H₀.
Both versions reject H₀ here, but they answer different questions: the two-tailed test asks whether the process is off-target at all, while the left-tailed test asks only whether the bolts are undersized. Two-tailed QC monitoring is the standard approach because both over- and under-sized bolts typically matter. The one-tailed framing would only be defensible if oversized bolts genuinely posed no risk for this specific application, a judgment that belongs in the engineering specification, not in the statistical analysis after the fact.
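One way to compare framings on the same data is to compute all three p-values at once. The sketch below does this for the bolt data in Example 4; `z_test_all_tails` is a hypothetical helper, not a library function.

```python
import math
from statistics import NormalDist


def z_test_all_tails(xbar: float, mu0: float, sigma: float, n: int):
    """Return z and the left-, right-, and two-tailed p-values for a one-sample z-test."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    lower = NormalDist().cdf(z)  # P(Z < z)
    p_values = {
        "left": lower,
        "right": 1 - lower,
        "two": 2 * min(lower, 1 - lower),
    }
    return z, p_values


# Values from Example 4
z, p = z_test_all_tails(xbar=11.985, mu0=12.00, sigma=0.05, n=49)

print(f"z = {z:.2f}")
for tail, value in p.items():
    print(f"{tail:>5}-tailed p = {value:.4f}, reject at 0.05: {value < 0.05}")
```

The right-tailed p-value is close to 1 here, which shows again that a test aimed at oversized bolts would be blind to this undersized drift.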
One vs Two Tailed: Side-by-Side Comparison
Core Differences
| Factor | One-Tailed Test | Two-Tailed Test |
|---|---|---|
| Alternative hypothesis (Hₐ) | μ > μ₀ or μ < μ₀ (directional) | μ ≠ μ₀ (non-directional) |
| Rejection region | All of α in one tail | α/2 in each tail |
| Critical value (z, α=0.05) | 1.645 (right) or −1.645 (left) | ±1.96 |
| p-value (same statistic) | Smaller — half the two-tailed value in the matching direction | Larger — exactly 2 × the one-tailed value |
| Statistical power for predicted direction | Higher | Lower |
| Can detect effect in opposite direction? | No — by design | Yes |
| Direction must be set | Before data collection | Not applicable |
| Typical default in research | Used only with strong prior justification | The standard default |
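The power row in the table can be made concrete for a z-test. The sketch below assumes the true effect is expressed as `delta`, the true mean shift measured in standard errors; that parameterization and the function names are assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_right_tailed(delta: float, alpha: float = 0.05) -> float:
    """P(reject H0) for a right-tailed z-test when the true shift is delta SEs."""
    crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(crit - delta)


def power_two_tailed(delta: float, alpha: float = 0.05) -> float:
    """P(reject H0) for a two-tailed z-test when the true shift is delta SEs."""
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper = 1 - STD_NORMAL.cdf(crit - delta)  # rejections in the upper tail
    lower = STD_NORMAL.cdf(-crit - delta)     # rejections in the lower tail
    return upper + lower


for delta in (2.0, -2.0):
    print(f"delta = {delta:+.1f}: right-tailed power = {power_right_tailed(delta):.3f}, "
          f"two-tailed power = {power_two_tailed(delta):.3f}")
```

For an effect in the predicted direction the right-tailed test has the higher power; for the same-sized effect in the opposite direction its power drops to almost zero while the two-tailed power is unchanged.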
P-Value and Critical Value Conversion
| Test Statistic | One-Tailed p-value | Two-Tailed p-value |
|---|---|---|
| z = 1.645 | 0.0500 | 0.1000 |
| z = 1.80 | 0.0359 | 0.0718 |
| z = 1.96 | 0.0250 | 0.0500 |
| z = 2.10 | 0.0179 | 0.0358 |
| z = 2.21 | 0.0136 | 0.0272 |
| z = 2.576 | 0.0050 | 0.0100 |
For any test statistic, two-tailed p-value = 2 × one-tailed p-value (in the direction matching the sign of the statistic). This relationship holds for z-tests, t-tests, and most other tests built on symmetric distributions, and is the fastest way to sanity-check software output.
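The conversion table can be regenerated from the normal CDF as a sanity check. This is a sketch using the standard library; because it doubles the unrounded one-tailed value, the last digit of a two-tailed p-value can differ by one from a figure obtained by doubling a rounded table entry.

```python
from statistics import NormalDist

Z_VALUES = (1.645, 1.80, 1.96, 2.10, 2.21, 2.576)

rows = []
for z in Z_VALUES:
    one_tailed = 1 - NormalDist().cdf(z)  # upper-tail area beyond z
    two_tailed = 2 * one_tailed           # both tails beyond |z|
    rows.append((z, one_tailed, two_tailed))
    print(f"z = {z:<5}  one-tailed p = {one_tailed:.4f}  two-tailed p = {two_tailed:.4f}")
```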
Real-World Applications by Industry
The same one-tailed vs two-tailed decision recurs across fields. The examples below summarize how each industry typically defaults, and why.
Clinical Trials
Regulatory guidance (FDA, EMA) generally expects two-tailed tests for efficacy claims, because a treatment that performs worse than placebo is just as important to detect as one that performs better. One-tailed tests in trials require explicit pre-specification and justification in the protocol.
A/B and Conversion Testing
Most A/B testing platforms default to two-tailed proportion or t-tests, since a redesign could plausibly hurt conversion as easily as help it. One-tailed tests appear mainly in internal "did our specific fix work?" checks where a regression would be caught by other monitoring anyway.
Manufacturing & QC
Two-tailed tests are standard for monitoring whether a process mean has drifted from a target specification in either direction. One-tailed tests appear in specific pass/fail checks — for example, "is the minimum tensile strength above the safety threshold?" — where only one direction constitutes a failure.
Psychology & Social Science
One-tailed tests appear more often here than in medicine, typically when a large body of prior research already establishes a direction (as in the caffeine/reaction-time example above). Pre-registration gives researchers a place to declare the tail direction before data collection, which is what makes a later one-tailed analysis credible.
Finance & Economics
Two-tailed tests dominate when testing whether a regression coefficient differs from zero, since both positive and negative relationships are economically meaningful. One-tailed tests occasionally appear in tests of a specific theoretical prediction, such as "does this policy reduce, not just change, unemployment?"
Machine Learning Evaluation
When comparing a new model to a baseline on the same validation folds with a paired t-test, two-tailed is the safer default, because a new architecture can underperform a baseline. One-tailed tests sometimes appear when the new model strictly extends the old one and is therefore expected to do no worse, although on held-out data even an extended model can underperform through overfitting.
One vs Two Tailed Cheat Sheet
Formula and Hypothesis Summary
| Test Type | H₀ | Hₐ | Rejection Rule |
|---|---|---|---|
| Two-tailed | μ = μ₀ | μ ≠ μ₀ | Reject if \|stat\| > critical value |
| Right-tailed | μ ≤ μ₀ | μ > μ₀ | Reject if stat > +critical value |
| Left-tailed | μ ≥ μ₀ | μ < μ₀ | Reject if stat < −critical value |
Critical Values at Common Alpha Levels (z-distribution)
| α | Two-Tailed Critical z | One-Tailed Critical z |
|---|---|---|
| 0.10 | ±1.645 | 1.282 |
| 0.05 | ±1.960 | 1.645 |
| 0.01 | ±2.576 | 2.326 |
For t-distributions, the equivalent critical values depend on degrees of freedom — the full t-distribution table lists both one-tailed and two-tailed columns side by side for each df, which is the fastest way to look up the exact value for your sample size.
Common Mistakes
- Choosing tails after seeing the data: If a two-tailed result is "close" and switching to one-tailed would push it under α, that switch is not valid — the direction must be decided before data collection, full stop.
- Treating a non-significant two-tailed result as proof of "no effect": Failing to reject H₀ means the data were not extreme enough to rule out chance, not that the effect is zero. A one-tailed test in the correct direction might still have been significant if it was genuinely justified in advance, but justifying it retroactively is the problem above, not a solution.
- Forgetting that a one-tailed test is blind to the opposite direction: A right-tailed test that produces z = −3.5 still fails to reject H₀, even though −3.5 looks extreme. The test was never designed to detect decreases.
- Assuming "directional theory" means "any direction I'd like to find": Genuine directional justification comes from established prior research, mechanism, or a situation where one direction is the only one that matters operationally — not from a researcher's hope about which way the data will go.
- Reporting only the p-value without stating the tail count: p = 0.04 means something different depending on whether it's one-tailed or two-tailed. Always state which was used and why.
Frequently Asked Questions
A one-tailed test checks for an effect in a single, predetermined direction (greater than or less than) and puts all of α in that one tail. A two-tailed test checks for a difference in either direction and splits α across both tails (α/2 each). One-tailed tests are more powerful for the predicted direction; two-tailed tests are the standard default in most research.
Use a one-tailed test only when the direction of the expected effect is specified before data collection, based on theory, prior studies, or a situation where a result in the opposite direction would be treated identically to "no effect" at all. The direction must be chosen before seeing the results — choosing it afterward invalidates the test.
A one-tailed test is more powerful than a two-tailed test, but only for an effect in the predicted direction. Because all of α sits in one tail rather than being split, the critical value is less extreme (for example, z = 1.645 instead of 1.96 at α = 0.05), making it easier to reach significance when the effect goes the predicted way. The trade-off is that a one-tailed test in the wrong direction cannot reject H₀ at all, no matter how extreme the result.
You cannot switch between a one-tailed and a two-tailed test after seeing the data. Choosing the tail type after looking at the data is a form of p-hacking that inflates the true Type I error rate above the stated α. The number of tails must be specified in the hypotheses before any data is collected or analyzed, ideally documented in a pre-registration.
Direction determines two things: where the rejection region sits on the distribution, and what the p-value means. A directional (one-tailed) Hₐ concentrates α in one tail and produces a smaller p-value for an effect in that direction, but zero ability to detect an effect in the other direction. A non-directional (two-tailed) Hₐ splits α and can detect effects either way, at the cost of needing a more extreme result to reach the same α.
A typical two-tailed example is a quality control check on bolt diameter: H₀: μ = 12.00 mm, Hₐ: μ ≠ 12.00 mm. Both oversized and undersized bolts are problems, so the test must be able to flag a drift in either direction. This is Example 4 above, worked in full with z = −2.10 and a two-tailed p-value of 0.0358.
A one-tailed hypothesis states the alternative as strictly greater than (Hₐ: μ > μ₀, right-tailed) or strictly less than (Hₐ: μ < μ₀, left-tailed) the null value, never both. Example: H₀: μ ≤ 30 minutes, Hₐ: μ > 30 minutes, for testing whether a new packing process increases average delivery time compared to a 30-minute baseline. This is right-tailed because the alternative hypothesis only claims an increase.
For the same test statistic, the two-tailed p-value is exactly twice the one-tailed p-value in the corresponding direction. For example, if the one-tailed p-value for z = 2.21 is 0.0136, the two-tailed p-value is 2 × 0.0136 = 0.0272. This is why a result can be significant one-tailed but not two-tailed at the same α — see the conversion table above for more values.
Sources and References
This guide draws on the following primary and secondary sources. Critical values and formulas are cross-referenced against NIST's Engineering Statistics Handbook, the most widely cited government reference for applied statistics.
- NIST Engineering Statistics Handbook — Hypothesis Testing. National Institute of Standards and Technology. itl.nist.gov
- Penn State STAT 415 — Introduction to Mathematical Statistics. Penn State Eberly College of Science. online.stat.psu.edu
- OpenStax Introductory Statistics — Ch. 9–10: Hypothesis Testing. Rice University. openstax.org
- Stanford Encyclopedia of Philosophy — Significance Tests entries discuss the Fisher and Neyman–Pearson traditions underlying directional vs non-directional testing. plato.stanford.edu
- Neyman, J. & Pearson, E.S. (1933) — "On the Problem of the Most Efficient Tests of Statistical Hypotheses." Philosophical Transactions of the Royal Society A, 231, 289–337.
- Fisher, R.A. (1925) — Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.