BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

Free Z-Test Calculator

Run a complete hypothesis test in seconds. This z-test calculator handles one-sample, two-sample, and proportion tests — enter your data and instantly get the Z-statistic, P-value, critical value, the statistical decision, and a full step-by-step solution, all in your browser with no signup required.


What Is a Z-Test?

A z-test is a statistical hypothesis test used to determine whether a significant difference exists between a sample statistic and a known population parameter, or between two sample statistics, when the population standard deviation is known and the sample size is large (n ≥ 30). It relies on the standard normal (Z) distribution to convert a sample result into a Z-statistic, then maps that statistic to a p-value to decide whether the observed difference is too large to attribute to random chance.

Z-tests appear throughout research methodology, manufacturing quality control, and digital analytics because they give a precise, reproducible answer to a simple question: is this difference real, or could it have happened by chance? According to Penn State's STAT 415 course materials, the z-test is one of the foundational procedures in classical hypothesis testing, built directly on properties of the sampling distribution of the mean.

The Z-Test Formula Library

There are three core z-test formulas — one for a single sample mean, one for comparing two sample means, and one for a single proportion. Each compares an observed statistic against a hypothesized value, scaled by the standard error of that statistic.

One-Sample Z-Test for Means

Z = (x̄ − μ₀) / (σ / √n)

Where:
  • x̄ = sample mean
  • μ₀ = hypothesized population mean
  • σ = population standard deviation
  • n = sample size

Two-Sample Z-Test for Means

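Z = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / SE
SE = √(σ²₁/n₁ + σ²₂/n₂)

Where:
  • x̄₁, x̄₂ = sample means
  • μ₁ − μ₂ = hypothesized difference between the population means (0 under the usual null hypothesis of no difference)
  • σ₁, σ₂ = population standard deviations
  • n₁, n₂ = sample sizes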

One-Proportion Z-Test

Z = (p̂ − p₀) / √(p₀(1−p₀) / n)

Where:
  • p̂ = sample proportion (x / n)
  • p₀ = hypothesized population proportion
  • n = sample size

Standard Error & P-Value

  • One-sample mean: SE = σ / √n
  • Two-sample mean: SE = √(σ²₁/n₁ + σ²₂/n₂)
  • Proportion: SE = √(p₀(1−p₀)/n)
  • Two-tailed p-value: P = 2 × (1 − Φ(|Z|))

In plain English: the Z-statistic measures how many standard errors the observed result sits away from the hypothesized value. A larger absolute Z means the gap between what you observed and what the null hypothesis predicted is statistically less likely to be random noise. MIT OpenCourseWare's 18.650 Statistics for Applications covers this derivation as the basis for classical test statistics built on the Central Limit Theorem.

Common Significance Levels and Critical Z-Values

The most commonly used significance level is α = 0.05, corresponding to a critical z-value of ±1.96 for a two-tailed test. A lower alpha demands a larger Z-statistic before the null hypothesis can be rejected, reducing the chance of a false positive.

Table: Significance Levels, Alpha, and Critical Z-Values

| Confidence Level | Alpha (α) | Left-Tailed Critical Value | Right-Tailed Critical Value | Two-Tailed Critical Values |
|---|---|---|---|---|
| 90% | 0.10 | −1.282 | +1.282 | ±1.645 |
| 95% | 0.05 | −1.645 | +1.645 | ±1.960 |
| 99% | 0.01 | −2.326 | +2.326 | ±2.576 |
| 99.9% | 0.001 | −3.090 | +3.090 | ±3.291 |
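
The critical values in this table can be reproduced from the inverse of the standard normal CDF. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `critical_z` and its `tail` argument are illustrative choices, not part of the calculator.

```python
from statistics import NormalDist


def critical_z(alpha: float, tail: str = "two-sided") -> float:
    """Return the positive critical z-value for a given alpha and tail type."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        # Alpha is split evenly across both tails.
        return std_normal.inv_cdf(1 - alpha / 2)
    # One-tailed: all of alpha sits in a single tail.
    return std_normal.inv_cdf(1 - alpha)


for alpha in (0.10, 0.05, 0.01, 0.001):
    one = critical_z(alpha, "one-sided")
    two = critical_z(alpha, "two-sided")
    print(f"alpha={alpha}: one-tailed {one:.3f}, two-tailed {two:.3f}")
```

For a left-tailed test the critical value is the negative of the one-tailed value returned here.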

How to Perform a Z-Test — Step by Step

To perform a one-sample z-test: state the null and alternative hypotheses, choose a significance level, calculate the standard error, compute the Z-statistic, find the corresponding p-value, and compare it against alpha to make a decision. Here is the complete method with a worked example.

1. Formulate the null and alternative hypotheses

State H₀ (no difference) and Hₐ (a difference exists). For example: a factory claims its bottling line fills bottles to a mean of 500 ml. H₀: μ = 500. Hₐ: μ ≠ 500 (two-tailed).

2. Choose a significance level (α)

Select α = 0.05, the conventional threshold. This is the probability of rejecting H₀ when it is actually true (a Type I error).

3. Calculate the standard error (SE)

Suppose the population standard deviation is known to be σ = 6 ml, and a sample of n = 50 bottles has a mean of x̄ = 502 ml. Then SE = σ / √n = 6 / √50 = 6 / 7.0711 = 0.8485. The standard error measures the expected variability of the sample mean around the true population mean.

4. Compute the Z-statistic

Z = (x̄ − μ₀) / SE = (502 − 500) / 0.8485 = 2.357. This value tells you how many standard errors the sample mean lies from the hypothesized mean.

5. Find the p-value or critical value

For a two-tailed test: P = 2 × (1 − Φ(2.357)) = 2 × (1 − 0.9908) = 0.0184. Alternatively, compare Z = 2.357 to the critical value z* = 1.96.

6. Make the statistical decision

Since P = 0.0184 ≤ α = 0.05 (and |Z| = 2.357 > 1.96), reject H₀. The evidence suggests the true mean fill volume differs from 500 ml.

Result: x̄ = 502, σ = 6, n = 50, SE = 0.8485, Z = 2.357, P = 0.0184, decision = reject H₀ at α = 0.05. You can verify this result using the One-Sample Mean tab of the calculator above.
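
The six steps can also be traced line by line in code. This is a minimal sketch using only the Python standard library, with variable names chosen here for illustration; it is not the calculator's own implementation.

```python
from math import sqrt
from statistics import NormalDist

# Inputs from the worked example (steps 1 and 2)
x_bar, mu0, sigma, n, alpha = 502, 500, 6, 50, 0.05

se = sigma / sqrt(n)                            # step 3: standard error
z = (x_bar - mu0) / se                          # step 4: Z-statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # step 5: two-tailed p-value
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # step 5: critical value
reject_h0 = p_value <= alpha                    # step 6: decision

print(f"SE={se:.4f}, Z={z:.3f}, p={p_value:.4f}, z*={z_crit:.2f}, reject H0: {reject_h0}")
# SE=0.8485, Z=2.357, p=0.0184, z*=1.96, reject H0: True
```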

🧠 The SCALE Framework: Reading a Z-Test Without Heavy Math

The SCALE Framework is a structured memory device for the five elements of any z-test. It is built for students, analysts, and researchers who need to set up or interpret a z-test correctly without re-deriving the theory each time.

  • S (Statistic): Your observed sample value: a mean, a difference of means, or a proportion. This is what you measured.
  • C (Comparison Value): The hypothesized population value (μ₀ or p₀) that the null hypothesis claims is true.
  • A (Alpha): The significance level you choose before testing, typically 0.05. It sets how much evidence is required to reject H₀.
  • L (Location, Z): The Z-statistic places your result on the standard normal curve: how many standard errors away from the comparison value it falls.
  • E (Evidence, P-value): The probability of observing a result at least this extreme if H₀ were true. Small p-value = strong evidence against H₀.

Intuitive Analogy: Think of a z-test like a metal detector at airport security. The operator chooses a sensitivity setting (alpha), which fixes how strong a signal (the Z-statistic) must be before the alarm sounds (H₀ is rejected). Make the detector too sensitive and you get false alarms (Type I errors); make it too insensitive and you miss real threats (Type II errors). The p-value tells you how often a signal at least this strong would occur if there were nothing to detect.

📊 Worked Case Studies

Example 1 — One-Sample Mean Z-Test (Manufacturing QC)

Scenario: A cereal manufacturer claims its boxes contain a mean of 500g, with a known population standard deviation of 6g (from historical process data). Quality control samples 50 boxes and finds a mean of 502g. Test at α = 0.05 whether the true mean differs from 500g.

  • Hypotheses: H₀: μ = 500, Hₐ: μ ≠ 500 (two-tailed)
  • Standard error: SE = 6 / √50 = 6 / 7.0711 = 0.8485
  • Z-statistic: Z = (502 − 500) / 0.8485 = 2.357
  • P-value & decision: P = 2 × (1 − Φ(2.357)) = 0.0184 ≤ 0.05 → reject H₀

Interpretation: The evidence indicates the true mean fill weight is statistically different from the claimed 500g. The process may need recalibration. This is a standard application of z-tests in Six Sigma quality control programs.

Example 2 — Two-Sample Mean Z-Test (Healthcare Research)

Scenario: Researchers compare resting heart rate between two large, well-characterized population groups with known standard deviations. Group A: n₁ = 45, x̄₁ = 85.2 bpm, σ₁ = 9.5. Group B: n₂ = 50, x̄₂ = 81.0 bpm, σ₂ = 10.2. Test at α = 0.05 whether the means differ.

  • Mean difference: x̄₁ − x̄₂ = 85.2 − 81.0 = 4.2
  • Standard error: SE = √(9.5²/45 + 10.2²/50) = √(2.0056 + 2.0808) = √4.0864 = 2.0215
  • Z-statistic: Z = 4.2 / 2.0215 = 2.078
  • P-value & decision: P = 2 × (1 − Φ(2.078)) = 0.0377 ≤ 0.05 → reject H₀

Interpretation: Group A shows a statistically significantly higher mean resting heart rate than Group B at the 5% level. This type of two-sample z-test underpins comparative analysis in epidemiology when population variances are well established from prior research, per WHO guidelines on reference intervals.
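
The heart-rate comparison can be checked with a short function. This is a sketch using the standard library only; the function name `two_sample_z_test` and the `diff0` parameter (the hypothesized difference, 0 by default) are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean1, mean2, sigma1, sigma2, n1, n2, diff0=0.0):
    """Two-tailed z-test for a difference in means with known population SDs."""
    # Standard error of the difference between two independent sample means
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = ((mean1 - mean2) - diff0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = two_sample_z_test(85.2, 81.0, 9.5, 10.2, 45, 50)
print(f"Z = {z:.3f}, p = {p:.4f}")
# Z = 2.078, p = 0.0377
```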

Example 3 — One-Proportion Z-Test (Business A/B Testing)

Scenario: A company's historical checkout-page conversion rate is 12%. After a redesign, 58 of 400 visitors convert (14.5%). Test at α = 0.05 whether the new conversion rate is significantly different from the historical 12%.

  • Sample proportion: p̂ = 58 / 400 = 0.145
  • Standard error: SE = √(0.12 × 0.88 / 400) = √0.000264 = 0.016248
  • Z-statistic: Z = (0.145 − 0.12) / 0.016248 = 1.539
  • P-value & decision: P = 2 × (1 − Φ(1.539)) = 0.1239 > 0.05 → fail to reject H₀

Interpretation: Although the observed conversion rate rose from 12% to 14.5%, this sample does not provide enough evidence to call the increase statistically significant at α = 0.05. The team would need a larger sample, or a longer test window, before rolling the redesign out fully. This is the same reasoning growth and product teams apply in conversion rate optimization testing.
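
A one-proportion test follows the same pattern. The sketch below uses the standard library only; the function name is illustrative, and the sample-size check applies the np₀ ≥ 5 and n(1−p₀) ≥ 5 rule of thumb quoted in the FAQ on this page.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-tailed one-proportion z-test against a hypothesized proportion p0."""
    # Rule-of-thumb check that the normal approximation is reasonable
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("n*p0 and n*(1-p0) should both be at least 5")
    p_hat = successes / n
    # The standard error uses p0 (the null value), not p_hat
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_value


p_hat, z, p = one_proportion_z_test(successes=58, n=400, p0=0.12)
print(f"p_hat = {p_hat:.3f}, Z = {z:.2f}, p = {p:.2f}")
# p_hat = 0.145, Z = 1.54, p = 0.12
```

Because p is above 0.05, the function reproduces the fail-to-reject decision for the checkout-page example.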

📊 How Sample Size Affects Z-Test Power — Benchmark Dataset

A z-test's ability to detect a true effect — its statistical power — depends heavily on sample size. The table below shows the minimum detectable difference in means (holding σ = 10, α = 0.05, power = 80% constant) as sample size increases. This follows directly from the standard error formula SE = σ/√n: doubling n shrinks SE by a factor of √2 ≈ 1.41, not by half.

Table: Minimum Detectable Effect vs. Sample Size (σ = 10, α = 0.05, Power = 80%) — Reference Benchmark

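| Sample Size (n) | SE (σ/√n) | Min. Detectable Diff. (One-Tailed) | Min. Detectable Diff. (Two-Tailed) |
|---|---|---|---|
| 30 | 1.826 | 4.54 | 5.11 |
| 50 | 1.414 | 3.52 | 3.96 |
| 100 | 1.000 | 2.49 | 2.80 |
| 250 | 0.632 | 1.57 | 1.77 |
| 500 | 0.447 | 1.11 | 1.25 |
| 1,000 | 0.316 | 0.79 | 0.89 |

Values are computed as (z_α + z_power) × SE for a one-sample test, with z_power = 0.842 for 80% power and z_α = 1.645 (one-tailed) or 1.960 (two-tailed).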

Key takeaway: to detect an effect half as large, you need roughly four times the sample size. This is a fundamental constraint in research design and experiment planning. The U.S. Census Bureau's American Community Survey methodology extensively documents this trade-off between sample size and detectable precision in large-scale survey design.
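
The relationship between sample size and detectable effect can be explored with a short function. This sketch assumes the usual normal-approximation formula for a one-sample z-test, minimum detectable difference = (z_α + z_power) × σ/√n; the page does not state which formula its table was generated with, so treat the function and its name as illustrative.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_diff(sigma, n, alpha=0.05, power=0.80, two_tailed=True):
    """Smallest true mean difference detectable with the given power (one-sample z-test)."""
    std_normal = NormalDist()
    # Critical value for the chosen alpha and tail type
    z_alpha = std_normal.inv_cdf(1 - alpha / 2 if two_tailed else 1 - alpha)
    # Quantile corresponding to the desired power
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) * sigma / sqrt(n)


for n in (30, 50, 100, 250, 500, 1000):
    one = min_detectable_diff(10, n, two_tailed=False)
    two = min_detectable_diff(10, n)
    print(f"n={n}: one-tailed {one:.2f}, two-tailed {two:.2f}")

# Quadrupling n halves the detectable difference
ratio = min_detectable_diff(10, 100) / min_detectable_diff(10, 400)
print(f"ratio = {ratio:.2f}")
# ratio = 2.00
```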

Z-Test vs. T-Test vs. Other Statistical Models

Use a z-test when the population standard deviation is known and the sample size is large (n ≥ 30). Use a t-test when the population standard deviation is unknown — which is the more common real-world scenario — since it correctly accounts for the extra uncertainty of estimating variance from the sample. For categorical counts, use a chi-square test; for comparing three or more group means, use ANOVA.

Table: Z-Test vs. T-Test vs. Chi-Square vs. ANOVA

| Feature | Z-Test | T-Test | Chi-Square Test | ANOVA |
|---|---|---|---|---|
| Primary Metric | Means / Proportions | Means | Categorical Counts | Multiple Means (≥ 3) |
| Sample Size Rule | Large (n ≥ 30) | Small (n < 30) or large | Variable | Variable |
| Population σ | Must be known | Unknown (uses s) | N/A | Unknown / estimated |
| Distribution | Standard Normal (Z) | Student's t | χ²-Distribution | F-Distribution |

Table: Z-Test vs. T-Test — Decision Criteria

| Condition | Z-Test | T-Test |
|---|---|---|
| Population SD (σ) known | ✓ Use Z | |
| Population SD unknown, n ≥ 30 | ✓ Approx. OK | ✓ Preferred |
| Population SD unknown, n < 30 | | ✓ Required |
| Proportion data (binary outcome) | ✓ Always Z | |
| Comparing 3+ group means | Use ANOVA instead | Use ANOVA instead |

In applied research, a known population standard deviation is rare outside of well-established industrial processes, standardized test scores, or very large reference datasets. For nearly all sample-based research, the t-test is the statistically correct default. When n grows large (typically n ≥ 30), the t-distribution converges toward the standard normal distribution, and z-test and t-test results become nearly identical.

Z-Test Glossary — Key Terms and Formulas

| Term | Formula | Interpretation |
|---|---|---|
| Z-Statistic | Z = (x̄ − μ₀) / SE | The distance of the sample result from the hypothesized value, measured in standard errors. Higher absolute values indicate a more extreme, less likely-by-chance result. |
| P-Value | P(\|Z\| ≥ \|z\|) for a two-tailed test | The probability of observing a result at least this extreme if the null hypothesis were true. If P ≤ α, the result is statistically significant. |
| Critical Value | Zα or Zα/2 | The boundary of the rejection region for a given alpha. A Z-statistic that crosses this boundary triggers rejection of H₀. |
| Standard Error | SE = σ / √n | The expected dispersion of the sample statistic around the true population value. Smaller SE means a more precise estimate. |
| Significance Level (α) | User-defined, typically 0.05 | The pre-set probability threshold for a Type I error (false positive): the risk you accept of rejecting a true null hypothesis. |
| Null Hypothesis (H₀) | μ = μ₀ (or p = p₀) | The default claim of no difference or no effect, which the z-test attempts to find evidence against. |
| Alternative Hypothesis (Hₐ) | μ ≠ μ₀ (or μ < μ₀, μ > μ₀) | The claim that a real difference or effect exists, supported only when H₀ is rejected. |

How to Interpret a Z-Test Result (and Common Pitfalls)

A statistically significant z-test result means the observed data would be unlikely under the null hypothesis — it does not, by itself, mean the effect is large or practically important. Interpreting a z-test correctly means looking past the reject/fail-to-reject label and considering effect size and context.

✗ Common pitfall: Treating unknown sample variance as known population variance without adjustment, then applying a z-test instead of a t-test.

✓ Correct approach: Use a z-test only when σ is genuinely known from an independent, established source — not estimated from the current sample. Otherwise, use a t-test.

Other common pitfalls include ignoring skewed distribution shapes in small samples (the normal approximation underlying the z-test weakens when data is heavily skewed and n is small), and confusing statistical significance with practical or economic significance — a tiny, practically meaningless difference can still produce a small p-value if the sample is large enough. As documented in The American Statistician (2016), a peer-reviewed journal of the American Statistical Association, misinterpretation of p-values and significance tests is among the most pervasive errors in applied statistics, affecting published research across psychology, medicine, and economics.
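
The gap between statistical and practical significance is easy to see numerically. The sketch below uses hypothetical numbers chosen for illustration (a fixed 0.1-unit difference with σ = 10, which is one hundredth of a standard deviation) and shows the p-value shrinking as n grows even though the effect stays tiny.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_p(x_bar, mu0, sigma, n):
    """Two-tailed p-value for a one-sample z-test."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: the observed difference never changes, only the sample size
for n in (100, 10_000, 100_000):
    p = two_tailed_p(x_bar=100.1, mu0=100.0, sigma=10, n=n)
    print(f"n={n}: p = {p:.4f}")
```

With n = 100 the result is nowhere near significant, while with n = 100,000 the same 0.1-unit difference falls well below α = 0.05.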

Z-Tests and Confidence Intervals — The Connection

A two-tailed z-test at α = 0.05 and a 95% confidence interval are two views of the same calculation: if the hypothesized value falls outside the 95% confidence interval built from the sample, the z-test rejects H₀ at the 5% significance level. For a mean with known σ the equivalence is exact, because the test and the interval use the same standard error. This duality means a confidence interval can answer the same question a z-test answers, while also showing the plausible range of the true effect.

Practical rule: If the 95% confidence interval for a mean does not contain the hypothesized value μ₀, the z-test rejects H₀ at α = 0.05. If the interval does contain it, you fail to reject H₀. For proportions the rule holds only approximately, because the test computes its standard error from p₀ while the usual interval computes it from p̂, so the two can disagree in borderline cases. This principle connects directly to the confidence interval calculator on this site.
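
To illustrate the duality for a mean, the sketch below builds the 95% interval for the manufacturing example (x̄ = 502, σ = 6, n = 50) and checks whether μ₀ = 500 falls inside it. The helper name `z_confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population SD is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


mu0 = 500
low, high = z_confidence_interval(x_bar=502, sigma=6, n=50)
print(f"95% CI: ({low:.2f}, {high:.2f}); contains mu0: {low <= mu0 <= high}")
# 95% CI: (500.34, 503.66); contains mu0: False
```

The interval excludes 500, which matches the reject decision from the two-tailed test on the same data.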

For a deeper treatment of how hypothesis tests and intervals relate, see the hypothesis testing guide on Statistics Fundamentals, which covers the formal duality between interval estimation and significance testing across one-sample and two-sample procedures.

Z-Test in Python — Code Reference

For developers and analysts who prefer to run the calculation in code, here is a minimal one-sample z-test implementation using scipy.stats. This mirrors exactly what the calculator above computes.

    import math

    import scipy.stats as stats


    def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05, tail='two-sided'):
        """One-sample z-test for a mean with a known population SD."""
        # Standard error
        se = sigma / math.sqrt(n)
        # Z-statistic
        z_stat = (x_bar - mu0) / se
        # P-value based on tail type
        if tail == 'two-sided':
            p_val = 2 * (1 - stats.norm.cdf(abs(z_stat)))
        elif tail == 'less':
            p_val = stats.norm.cdf(z_stat)
        elif tail == 'greater':
            p_val = 1 - stats.norm.cdf(z_stat)
        else:
            raise ValueError("tail must be 'two-sided', 'less', or 'greater'")
        # Convert NumPy scalars to built-in types so the result prints cleanly
        return {
            "z_statistic": round(z_stat, 4),
            "p_value": round(float(p_val), 4),
            "reject_h0": bool(p_val <= alpha),
        }


    # Example: manufacturing QC case study above
    result = one_sample_z_test(x_bar=502, mu0=500, sigma=6, n=50)
    print(result)
    # {'z_statistic': 2.357, 'p_value': 0.0184, 'reject_h0': True}


Sources and Further Reading

Authority sources cited in this guide:

  • Penn State STAT 415. Introduction to Mathematical Statistics — Interval Estimation. online.stat.psu.edu
  • MIT OpenCourseWare. 18.650 Statistics for Applications, Fall 2016. ocw.mit.edu
  • Wasserstein, R.L. & Lazar, N.A. (2016). “The ASA Statement on p-Values.” The American Statistician. tandfonline.com
  • U.S. Census Bureau. American Community Survey Design and Methodology. census.gov
  • World Health Organization. Reference Intervals and Decision Limits. who.int
  • NIST. Engineering Statistics Handbook — Hypothesis Testing. itl.nist.gov
  • OpenStax. Introductory Statistics, Chapter 9: Hypothesis Testing. openstax.org
  • Devore, J.L. Probability and Statistics for Engineering and the Sciences, 9th ed. Cengage Learning, 2016.

Frequently Asked Questions

A z-test is a parametric statistical test used to determine whether there is a significant difference between a sample statistic and a population parameter, or between two sample statistics, when the population standard deviation is known and the sample size is large (typically n ≥ 30). It relies on the standard normal (Z) distribution to compute a p-value and reach a decision about the null hypothesis.

To calculate a one-sample z-test: (1) subtract the hypothesized population mean from the sample mean; (2) divide by the standard error SE = σ/√n; (3) compare the resulting Z-statistic to a critical value, or convert it to a p-value. For example, with x̄ = 502, μ₀ = 500, σ = 6, n = 50: SE = 6/√50 = 0.8485, Z = (502−500)/0.8485 = 2.357. Use the calculator above to verify any calculation instantly.

A z-test calculator is an online tool that performs hypothesis testing on population means or proportions, automatically computing the Z-statistic, p-value, critical value, and the reject/fail-to-reject decision from your input data. It removes the need to manually look up critical values in a z-table or compute the normal cumulative distribution function by hand.

Use a z-test when the population standard deviation is known and the sample size is large (n ≥ 30), or when testing a proportion. Common applications include manufacturing quality control with established process variability, large-scale survey and polling analysis, standardized test score comparisons, and A/B testing with sufficiently large sample sizes.

The one-sample z-test formula is Z = (x̄ − μ₀) / (σ/√n). The two-sample version is Z = (x̄₁ − x̄₂) / √(σ²₁/n₁ + σ²₂/n₂). The one-proportion version is Z = (p̂ − p₀) / √(p₀(1−p₀)/n). All three formulas measure how many standard errors the observed statistic sits from the hypothesized value.

Compare the p-value to your chosen significance level (α). If P ≤ α, reject the null hypothesis — the evidence suggests a real difference exists. If P > α, fail to reject the null hypothesis — the data does not provide enough evidence of a difference. Always pair this decision with the effect size and context, since statistical significance does not automatically imply practical importance.

A z-test requires the population standard deviation to be known and is typically used with large samples (n ≥ 30), relying on the standard normal distribution. A t-test is used when the population standard deviation is unknown and is instead estimated from the sample, relying on the t-distribution, which has heavier tails to account for the added uncertainty — this distinction matters most when sample sizes are small.

A Z-statistic (or Z-score in a hypothesis-testing context) is the number of standard errors a sample result lies from the value specified by the null hypothesis. It is computed as the difference between the observed and hypothesized values, divided by the standard error. Larger absolute Z-statistics correspond to smaller p-values and stronger evidence against the null hypothesis.

For a two-tailed test, the p-value equals 2 × (1 − Φ(|Z|)), where Φ is the cumulative distribution function of the standard normal distribution. For a right-tailed test, P = 1 − Φ(Z). For a left-tailed test, P = Φ(Z). Statistical software and z-tables provide Φ(Z) directly; the calculator above computes this automatically for all three tail types.

A two-tailed z-test checks whether a sample statistic differs from the hypothesized value in either direction (Hₐ: μ ≠ μ₀). It splits the significance level across both tails of the distribution, so the critical region is divided into two halves, for example ±1.96 at α = 0.05. Use a two-tailed test whenever you have no prior directional expectation about the difference.

A one-tailed z-test checks whether a sample statistic is specifically greater than (right-tailed) or less than (left-tailed) the hypothesized value, concentrating the entire significance level in one tail of the distribution. This makes a one-tailed test more powerful for detecting an effect in the specified direction, but it cannot detect an effect in the opposite direction. Use it only when a directional hypothesis is justified before collecting data.
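
The effect of the tail choice is easiest to see on a borderline statistic. The sketch below applies the three p-value formulas given above to a hypothetical Z = 1.80; the function name `z_p_values` is illustrative.

```python
from statistics import NormalDist


def z_p_values(z):
    """P-values for the same Z-statistic under each tail type."""
    phi = NormalDist().cdf
    return {
        "left": phi(z),                       # P = Φ(Z)
        "right": 1 - phi(z),                  # P = 1 − Φ(Z)
        "two-sided": 2 * (1 - phi(abs(z))),   # P = 2 × (1 − Φ(|Z|))
    }


p = z_p_values(1.80)  # hypothetical Z-statistic
print({tail: round(value, 4) for tail, value in p.items()})
# {'left': 0.9641, 'right': 0.0359, 'two-sided': 0.0719}
```

At α = 0.05 the right-tailed test rejects H₀ while the two-tailed test does not, which is why the direction has to be justified before the data are collected.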

The null hypothesis (H₀) in a z-test is a statement of no effect or no difference — for example, that a sample mean equals a specific population value, or that two population means or proportions are equal. The z-test calculates how likely the observed data would be if this null hypothesis were true, and rejects it only when that likelihood (the p-value) falls below the chosen significance level.

Statistical significance means the observed result is unlikely to have occurred by random chance alone, given the null hypothesis is true — formally, that the p-value is less than or equal to the chosen significance level (α). It is a statement about the strength of evidence against H₀, not a statement about how large or practically meaningful the effect is.

For a one-proportion z-test: compute the sample proportion p̂ = x/n, calculate the standard error SE = √(p₀(1−p₀)/n) using the hypothesized proportion p₀, then Z = (p̂ − p₀) / SE. This method requires np₀ ≥ 5 and n(1−p₀) ≥ 5 for the normal approximation to be valid. The Proportion tab of the calculator above performs this automatically.

For a one-sample mean z-test, calculate SE = σ/√n, then Z = (x̄ − μ₀)/SE. For a two-sample mean z-test, calculate the standard error of the difference, SE = √(σ²₁/n₁ + σ²₂/n₂), then Z = (x̄₁ − x̄₂)/SE. This standard error combines the two variances without pooling them. Both tests require the relevant population standard deviation(s) to be known in advance.

A critical z-value is the boundary of the rejection region on the standard normal distribution for a given significance level. For a two-tailed test at α = 0.05, the critical values are ±1.96. If the computed Z-statistic falls beyond this boundary, the null hypothesis is rejected. Critical values grow larger (more extreme) as the significance level decreases.

The rejection region is the set of Z-statistic values extreme enough to lead to rejecting the null hypothesis at a chosen significance level. For a two-tailed test at α = 0.05, the rejection region is Z < −1.96 or Z > 1.96. For one-tailed tests, the entire rejection region sits in a single tail of the distribution.

A valid z-test assumes: (1) the population standard deviation is known; (2) the sample is randomly selected and observations are independent; (3) the sampling distribution of the statistic is approximately normal — satisfied either because the underlying population is normal, or because the sample size is large enough (n ≥ 30) for the Central Limit Theorem to apply; and, for proportion tests, that np₀ and n(1−p₀) are both at least 5.

A z-test is technically valid for small samples only if the population standard deviation is known and the underlying data is approximately normally distributed. In practice, when the population standard deviation is unknown — which is most real-world cases — a t-test is the statistically appropriate choice for small samples (n < 30), since it accounts for the extra uncertainty of estimating variance from the sample. See the One-Sample T-Test guide instead.

Researchers use z-tests to compare sample data against established benchmarks or to compare two large, well-characterized groups when population variability is already known from prior studies or standardized processes. Typical applications span manufacturing quality control, large-scale public health surveillance, standardized educational testing, and election polling, where historical population parameters provide the known σ the z-test requires.