Sample Size Calculator
For surveys and research measuring proportions — yes/no questions, approval ratings, conversion rates, and binary outcomes.
For experiments and studies measuring a continuous variable — test scores, weights, times, revenue — where you know or can estimate the population standard deviation.
What Is a Sample Size Calculator?
A sample size calculator determines the minimum number of participants, survey responses, or observations needed to produce statistically reliable results based on population size, confidence level, margin of error, and expected variability. Rather than measuring every member of a population, researchers take a representative subset — and sample size calculation tells them exactly how large that subset must be before the results can be trusted.
The practical question is always the same: how many data points do you need before a finding reflects reality rather than chance? Too few and your estimate could be wildly off. Too many and you waste resources. The formula converts your acceptable error tolerance and desired certainty into a concrete number. According to the American Association for Public Opinion Research (AAPOR), disclosure of sample size and methodology is a core standard for any published survey result.
How Sample Size Connects to Statistical Inference
Every sample size formula is built on three inputs: the Z-score for your confidence level, the margin of error, and the variability in your data. A confidence level of 95% means that if you ran the same study 100 times, about 95 of the resulting intervals would contain the true population value. The margin of error defines how wide that interval can be. Variability (either an estimated proportion p or a standard deviation σ) captures how spread out individual responses are. More variability means you need more data to pin down the true value reliably.
This is also why using p = 0.5 (50%) for surveys is the conservative default: a 50/50 split has the highest possible variance for a proportion, so any actual distribution will require an equal or smaller sample. You are building in a safety margin.
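As a quick check of that claim, the sketch below evaluates p(1−p) and the resulting sample size over a small grid of proportions. It assumes 95% confidence (Z = 1.96) and a ±5% margin of error; the function names are illustrative only.

```python
import math

Z = 1.96       # 95% confidence
MARGIN = 0.05  # ±5% margin of error

def proportion_variance(p: float) -> float:
    """Variance of a single yes/no response with success probability p."""
    return p * (1 - p)

def required_n(p: float) -> int:
    """Proportion formula n = Z^2 * p(1-p) / e^2, rounded up."""
    return math.ceil(Z**2 * proportion_variance(p) / MARGIN**2)

grid = [0.1, 0.3, 0.5, 0.7, 0.9]
for p in grid:
    print(f"p = {p:.1f}: p(1-p) = {proportion_variance(p):.2f}, n = {required_n(p)}")

# The proportion with the largest requirement on this grid
worst_case = max(grid, key=required_n)
print(f"Largest requirement at p = {worst_case}")
```

On this grid the requirement peaks at 385 for p = 0.5 and falls to 323 at p = 0.3 or 0.7 and to 139 at p = 0.1 or 0.9, so planning for 0.5 covers every other case.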
How Sample Size Is Calculated
Three inputs drive the core formula: the Z-score, the expected proportion (or standard deviation), and the margin of error. Once you have the raw sample size, you apply a finite population correction if your population is small.
Formula for Proportions
This formula applies when you are measuring a proportion — approval rates, conversion rates, yes/no questions, or any binary outcome.
Standard Formula
n = Z² × p(1−p) / e²
Z = Z-score for confidence level
p = expected proportion (use 0.5 if unknown)
e = margin of error (as decimal)
Example: 95% conf, ±5% MOE, p=0.5
n = (1.96)² × 0.5(0.5) / (0.05)²
n = 3.8416 × 0.25 / 0.0025
n = 384.16 → round up to 385
Finite Population Correction
n_adj = n / (1 + (n−1)/N)
N = total population size
n = raw sample size from formula
Example: N=1,000, n=385
n_adj = 385 / (1 + 384/1000)
n_adj = 385 / 1.384
n_adj ≈ 278
Saves 107 responses vs. assuming an infinite population.
Formula for Means
When your outcome is a continuous measurement — heights, test scores, response times, revenue per customer — use the means formula with the population standard deviation.
n = (Z × σ / e)²
Where Z is the Z-score for the confidence level, σ is the standard deviation, and e is the acceptable margin of error in the same units as σ.
Example: IQ scores (σ = 15), want estimate within ±3 points, 95% confidence.
n = (1.96 × 15 / 3)² = (9.8)² = 96.04 → round up to 97 participants.
Response Rate Adjustment
The sample size formula gives you the number of completed responses you need. If your expected response rate is 40%, you must invite 2.5 times as many people as your target sample. Divide your required sample by the expected response rate and round up: 385 ÷ 0.40 = 962.5, so send 963 invitations to reach 385 responses.
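The adjustment is a single division, but it is easy to forget the round-up. A minimal sketch, assuming the response rate is given as a fraction between 0 and 1 (the helper name is hypothetical):

```python
import math

def invitations_needed(completed_target: int, response_rate: float) -> int:
    """Contacts required so that the expected number of completions
    reaches completed_target, rounded up to a whole invitation."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be a fraction in (0, 1]")
    return math.ceil(completed_target / response_rate)

print(invitations_needed(385, 0.40))  # 963
```

The result is only as good as the response-rate estimate, so a rate taken from a comparable past survey is usually a safer input than a guess.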
Industry Sample Size Benchmarks
Different research contexts operate with different norms. The table below reflects minimum thresholds commonly cited across disciplines, not upper limits. Larger samples always increase precision.
| Research Type | Typical n | Standard Parameters | Notes |
|---|---|---|---|
| Survey Research | 385 | 95% conf, ±5% MOE, p=0.5 | Applies to any large or unknown population |
| Market Research | 384–600 | 95% conf, ±4–5% MOE | Segment analysis may require larger sub-samples |
| Academic Thesis | 50–500 | 80% power, α=0.05 | Power analysis required; effect size drives n |
| Clinical Research | Phase II: 100–300 | 80–90% power, two-sided α=0.05 | Regulatory bodies require power analysis documentation |
| UX Research | 5 (qualitative) | No formula; diminishing returns | Nielsen Norman Group: 5 users find ~85% of usability issues |
| A/B Testing | 1,000+ per variant | 80% power, α=0.05 | Minimum detectable effect determines required n |
The SAMPLE Framework
The SAMPLE framework is a structured decision process for determining the right sample size before any data is collected. It prevents the two most common planning errors: choosing an arbitrary round number or borrowing a sample size from an unrelated study without checking whether the parameters match.
Confidence Level Explained
The confidence level tells you how often the calculated interval would contain the true population value if you repeated the study under identical conditions. It does not mean "95% probability that the true value is in this interval" — the true value either is or is not in a given interval. The 95% refers to the long-run behavior of the method across many repetitions.
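The long-run reading can be illustrated with a small simulation. The sketch below assumes a true proportion of 0.5, repeatedly draws samples of 385, builds the usual normal-approximation (Wald) interval each time, and counts how often the interval contains the true value. The seed, repetition count, and function names are arbitrary choices for illustration.

```python
import math
import random

def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(true_p: float = 0.5, n: int = 385,
             repetitions: int = 2000, seed: int = 0) -> float:
    """Share of simulated studies whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # One simulated study: n independent yes/no responses
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n)
        hits += low <= true_p <= high
    return hits / repetitions

print(f"Share of intervals containing the true value: {coverage():.3f}")
```

Each individual interval either contains 0.5 or it does not; the share across repetitions is what should land close to 95%.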
| Confidence Level | Z-Score | Required n (p=0.5, ±5% MOE) | When to Use |
|---|---|---|---|
| 90% | 1.645 | 271 | Budget-constrained studies; preliminary research; low-stakes decisions |
| 95% | 1.960 | 385 | Standard for academic research, market surveys, and published reports |
| 99% | 2.576 | 664 | Clinical trials, legal findings, regulatory submissions, high-stakes decisions |
Moving from 95% to 99% confidence increases the required sample by 72% (from 385 to 664). Moving from ±5% to ±3% margin of error at 95% confidence nearly triples the requirement, from 385 to 1,068. These are the two levers with the largest effect on sample size.
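The Z-scores in the table do not have to be memorized. As a sketch, the standard library's `statistics.NormalDist` can derive them from the confidence level, and the proportion formula then reproduces the table's sample sizes. The helper names are illustrative.

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence: float) -> float:
    """Two-sided critical value: leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_sample_size(confidence: float, margin: float, p: float = 0.5) -> int:
    """n = Z^2 * p(1-p) / e^2, rounded up."""
    z = z_for_confidence(confidence)
    return math.ceil(z**2 * p * (1 - p) / margin**2)

for level in (0.90, 0.95, 0.99):
    z = z_for_confidence(level)
    n = proportion_sample_size(level, margin=0.05)
    print(f"{level:.0%}: Z = {z:.3f}, n = {n}")
```

Because the helper uses unrounded Z-scores, results can occasionally differ by one from hand calculations that use 1.645, 1.96, or 2.576; for the three rows above they agree.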
Margin of Error Explained
The margin of error (±e) defines how far your sample estimate can deviate from the true population value while still being within the confidence interval. A poll showing 52% support with ±3% margin of error means the true value likely falls between 49% and 55% — which has very different implications than if it fell between 51% and 53%.
| Margin of Error | Required n (95% conf, p=0.5) | Interpretation | Typical Use |
|---|---|---|---|
| ±10% | 97 | Wide — results directional only | Early pilots, feasibility checks |
| ±5% | 385 | Standard precision for most surveys | Market research, customer feedback |
| ±3% | 1,068 | High precision; distinguishes close results | Political polling, product decisions |
| ±1% | 9,604 | Very high — rarely practical without large budget | National census supplements, regulatory studies |
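The same relationship can be read in the other direction: given a sample you already have, what margin of error does it support? A minimal sketch, assuming 95% confidence and the conservative p = 0.5 (the function name is hypothetical):

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of the confidence interval for a proportion: z * sqrt(p(1-p)/n)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (97, 385, 1068, 9604):
    print(f"n = {n:>5}: ±{margin_of_error(n) * 100:.2f} percentage points")
```

Because n sits under a square root, halving the margin of error requires roughly four times the sample, which is why the jump from ±5% to ±1% is so expensive.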
Population Size and Finite Population Correction
For small populations, the finite population correction (FPC) reduces the required sample size because you are measuring a meaningful fraction of the total group. When your sample is 5% or more of the total population, the FPC starts making a meaningful difference.
Comparison: You are surveying employees at a company of 500 people. Standard parameters: 95% confidence, ±5% margin of error, p = 0.5.
n = (1.96)² × 0.5 × 0.5 / (0.05)² = 3.8416 × 0.25 / 0.0025 = 385
n_adj = 385 / (1 + (385 − 1) / 500) = 385 / (1 + 0.768) = 385 / 1.768 ≈ 218
You need only 218 responses instead of 385 — a reduction of 167 participants (43%) — because the population is small enough that each response carries more statistical weight.
The FPC matters whenever your sample would exceed about 5% of the total population. For large populations above 100,000, the correction is negligible and can be ignored safely.
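To see where the correction stops mattering, the sketch below applies the FPC to the standard raw sample of 385 across a range of population sizes. It rounds every result up, so N = 1,000 shows 279 where the unrounded value is 278.2. The function name is illustrative.

```python
import math

def fpc_adjusted(n_raw: int, population: int) -> int:
    """Apply n_adj = n / (1 + (n - 1) / N) and round up."""
    return math.ceil(n_raw / (1 + (n_raw - 1) / population))

N_RAW = 385  # 95% confidence, ±5% margin of error, p = 0.5

for population in (200, 500, 1_000, 10_000, 100_000, 1_000_000):
    n_adj = fpc_adjusted(N_RAW, population)
    share = N_RAW / population  # raw sample as a fraction of the population
    print(f"N = {population:>9,}: n_adj = {n_adj:>3}  (raw n is {share:.1%} of N)")
```

At N = 100,000 the correction removes a single response, and at N = 1,000,000 it removes none.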
Statistical Power and Effect Size
Statistical power is the probability that a study correctly detects a real effect when one exists. For experiments and clinical research, calculating power is just as important as calculating sample size — they are two sides of the same coin. Standard practice targets 80% power, meaning there is a 20% chance of missing a real effect (Type II error).
| Concept | Definition | Symbol | Typical Target |
|---|---|---|---|
| Statistical Power | Probability of detecting a real effect | 1 − β | ≥ 80% |
| Type I Error (False Positive) | Rejecting H₀ when it is true | α | 0.05 |
| Type II Error (False Negative) | Failing to detect a real effect | β | ≤ 0.20 |
| Effect Size | Magnitude of the difference you want to detect | d (Cohen's d) | Small: 0.2 / Medium: 0.5 / Large: 0.8 |
A smaller effect size requires a larger sample to detect with sufficient power. A clinical trial testing whether a drug reduces blood pressure by 2 mmHg needs far more participants than one testing for a 10 mmHg reduction. In R, pwr.t.test() from the pwr package runs this calculation for t-tests. The National Institutes of Health treats power analysis as a required component of grant applications involving human subjects research.
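For a rough planning number without R, the relationship between effect size and sample size can be sketched with a normal approximation: n per group ≈ 2 × (Zα/2 + Zβ)² / d². This is an approximation, not the t-based calculation that pwr.t.test() performs, so it tends to come out about one participant per group lower. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means with standardized effect size d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)

for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    print(f"{label} effect (d = {d}): about {n_per_group(d)} per group")
```

Under this approximation a medium effect needs about 63 per group and a small effect about 393, which shows how quickly the requirement grows as the effect shrinks.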
Worked Case Studies
Case Study 1: Customer Satisfaction Survey
Scenario: A retailer with 8,000 loyalty customers wants to measure satisfaction. They expect roughly 70% satisfaction and want results within ±4%, at 95% confidence.
Z = 1.96
n = (1.96)² × 0.70 × 0.30 / (0.04)² = 3.8416 × 0.21 / 0.0016 = 504.21 → 505
n_adj = 505 / (1 + 504/8000) = 505 / 1.063 ≈ 475
Invitations needed (assuming a 40% response rate) = 475 / 0.40 = 1,187.5 → 1,188 contacts
Required sample: 475 responses. The known prior estimate of 70% (rather than 50%) reduces the sample requirement compared to using the conservative default.
Case Study 2: Academic Thesis (Continuous Outcome)
Scenario: A graduate student is comparing exam scores between two teaching methods. Prior literature suggests σ ≈ 12 points. The student wants to detect a difference of 4 points with 95% confidence and 80% power.
n = (Zα/2 + Zβ)² × 2σ² / Δ² where Zα/2 = 1.96, Zβ = 0.842 (80% power)
n = (1.96 + 0.842)² × 2 × 144 / 16 = 7.851 × 288 / 16 = 141.3 → round up to 142 per group
142 × 2 = 284 students total, split evenly across both teaching conditions
A two-sample study with 80% power requires 284 participants total. Increasing power to 90% (Zβ = 1.282) would require 190 per group, or 380 participants.
Case Study 3: A/B Testing for Conversion Rate
Scenario: An e-commerce site has a 3% checkout conversion rate and wants to detect a 0.5 percentage point improvement (from 3% to 3.5%) with 95% confidence and 80% power.
p̄ = (0.03 + 0.035)/2 = 0.0325 (pooled rate, shown for reference; the unpooled formula below does not use it)
n = (Zα/2 + Zβ)² × [p₁(1−p₁) + p₂(1−p₂)] / Δ²
n = (2.802)² × [0.0291 + 0.033775] / (0.005)²
n = 7.851 × 0.062875 / 0.000025 ≈ 19,746 per variant
Total traffic needed: approximately 40,000 sessions split evenly.
Detecting a 0.5 pp lift in a low-baseline conversion scenario requires nearly 40,000 sessions — which explains why A/B tests on low-traffic pages can take months to reach significance.
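The calculation is easy to rerun for other baselines and lifts. The sketch below implements the unpooled two-proportion formula from this case study with the same rounded Z-values (1.96 and 0.842); the function name is illustrative. Evaluated for a 3% to 3.5% lift, it gives 19,746 sessions per variant.

```python
import math

Z_ALPHA = 1.96   # two-sided alpha = 0.05
Z_BETA = 0.842   # 80% power

def sessions_per_variant(p1: float, p2: float,
                         z_alpha: float = Z_ALPHA, z_beta: float = Z_BETA) -> int:
    """n = (Z_alpha/2 + Z_beta)^2 * [p1(1-p1) + p2(1-p2)] / (p2 - p1)^2, rounded up."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2)

half_point = sessions_per_variant(0.03, 0.035)  # detect 3.0% -> 3.5%
full_point = sessions_per_variant(0.03, 0.040)  # detect 3.0% -> 4.0%
print(f"0.5 pp lift: {half_point:,} per variant")
print(f"1.0 pp lift: {full_point:,} per variant")
```

Doubling the minimum detectable lift to a full percentage point cuts the requirement to 5,300 per variant, roughly a quarter, because the lift enters the formula squared.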
Common Sample Size Mistakes
These are the errors that most often invalidate survey results and force researchers to repeat their data collection.
- Ignoring the response rate. Sample size formulas give you the number of completed responses, not the number of people you need to contact. Always divide by the expected response rate to get your recruitment target.
- Using a convenience sample. Surveying whoever is easiest to reach (colleagues, social media followers) introduces selection bias. The formula assumes a random sample; if sampling is not random, no formula corrects for it.
- Confusing confidence level with confidence interval. A 95% confidence level is the method's long-run reliability. A confidence interval is the specific range from one study. They are related but not the same thing.
- Skipping the power check. Running a hypothesis test without checking power leads to studies that cannot detect real effects. A non-significant result from an underpowered study is ambiguous, not evidence of no effect.
- Not planning for attrition. If participants drop out over time, add 15–20% to your calculated sample to account for dropout and missing data, especially in clinical or panel studies.
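The attrition allowance can be sketched in two ways. The first adds a flat percentage, as described above. The second is an alternative that is an assumption here rather than something this guide prescribes: divide by the expected retention rate so that the target number of participants remains after dropout. Both function names are hypothetical.

```python
import math

def inflate_by_percent(n_required: int, percent: int) -> int:
    """Add a flat percentage to the target (integer arithmetic avoids float drift)."""
    return math.ceil(n_required * (100 + percent) / 100)

def enroll_for_retention(n_required: int, dropout_rate: float) -> int:
    """Alternative (assumption): enroll enough people that n_required
    are expected to remain after a fraction dropout_rate leaves."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - dropout_rate))

print(inflate_by_percent(385, 15))      # 443
print(enroll_for_retention(385, 0.15))  # 453
```

The two differ because losing 15% of 443 leaves about 377 participants, slightly short of 385, whereas losing 15% of 453 leaves just over 385.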
Sample Size Calculation in Python, Excel, and R
Python
import math
# Proportion sample size (95% confidence, ±5% MOE, p=0.5)
Z = 1.96
p = 0.5
e = 0.05
N = 10000 # population size
n_raw = (Z**2 * p * (1 - p)) / e**2
n_raw = math.ceil(n_raw)
print(f"Raw sample size: {n_raw}") # 385
# Finite population correction
n_adj = math.ceil(n_raw / (1 + (n_raw - 1) / N))
print(f"FPC-adjusted: {n_adj}") # 371
# Means sample size (σ=15, e=3, 95% confidence)
sigma = 15
e_means = 3
n_means = math.ceil((Z * sigma / e_means) ** 2)
print(f"Means sample size: {n_means}") # 97
R
# Install if needed: install.packages("pwr")
library(pwr)
# Proportion sample size (base R)
Z <- 1.96; p <- 0.5; e <- 0.05
n_raw <- ceiling(Z^2 * p * (1 - p) / e^2)
cat("Raw n:", n_raw) # 385
# Power analysis for two-sample t-test
pwr.t.test(d = 0.5, # medium effect size (Cohen's d)
sig.level = 0.05, # alpha
power = 0.80, # target power
type = "two.sample")
# n = 64 per group
Microsoft Excel
=CEILING((1.96^2 * 0.5 * 0.5) / 0.05^2, 1) // Raw n = 385
=CEILING(385 / (1 + (385-1)/A2), 1) // FPC (N in cell A2)
=CEILING((1.96 * B2 / C2)^2, 1) // Means (σ in B2, e in C2)
Sample Size: Complete Formula and Entity Reference
The table below covers every key concept needed to calculate and interpret sample size. It is structured for quick reference by students, researchers, and AI systems.
| Concept | Formula / Value | Plain Explanation | Primary Use Case |
|---|---|---|---|
| Proportion Sample Size | n = Z² × p(1−p) / e² | Minimum responses for binary/proportion outcomes | Surveys, approval ratings, conversion rates |
| Means Sample Size | n = (Z × σ / e)² | Minimum observations for continuous outcomes | Test scores, weights, times, clinical measurements |
| Finite Population Correction | n_adj = n / (1 + (n−1)/N) | Reduces n when population N is small relative to raw n | Small organizations, contained populations, employee surveys |
| Confidence Level | 90% / 95% / 99% | Long-run probability the interval contains the true value | Controlling false positives; research credibility |
| Z-Score (Confidence) | 1.645 / 1.960 / 2.576 | Number of standard deviations for each confidence level | Core input in all sample size formulas |
| Margin of Error (e) | ±1%, ±3%, ±5%, ±10% | Acceptable deviation between sample estimate and true value | Precision planning; confidence interval width |
| Expected Proportion (p) | 0.5 (conservative default) | Estimated proportion of population with the target characteristic | Maximizes n when unknown; reduces n when prior data is used |
| Statistical Power | 1 − β; target ≥ 0.80 | Probability of detecting a real effect in an experiment | Hypothesis testing; clinical trials; A/B tests |
| Effect Size (Cohen's d) | d = Δμ / σ | Standardized magnitude of the difference being detected | Power analysis; determining minimum detectable effect |
| Response Rate Adjustment | Invitations = n / response rate | Converts required completions into required contacts | Survey recruitment planning; email campaigns |
Related Topics on Statistics Fundamentals
Sample size connects directly to hypothesis testing, confidence intervals, and the measures of variability that drive every formula on this page.
Sources and Further Reading
Authority sources cited in this guide:
- American Association for Public Opinion Research (AAPOR). Transparency Initiative — Poll Disclosure Standards. aapor.org
- National Institutes of Health (NIH). Principles and Guidelines for Reporting Preclinical Research. nih.gov
- Cochran, W.G. Sampling Techniques, 3rd ed. John Wiley & Sons, 1977. (Source of the standard proportion formula used in this calculator.)
- Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Lawrence Erlbaum Associates, 1988. (Source of effect size conventions and power analysis methods.)
- Pew Research Center. Methodology — How Pew Research Center Conducts Its Surveys. pewresearch.org
Frequently Asked Questions
For a large or unknown population at 95% confidence and ±5% margin of error, you need approximately 385 completed responses. If your population is smaller, the finite population correction reduces this number. A population of 500 requires only about 218 responses under the same parameters. If you want greater precision — say ±3% — the requirement rises to 1,068 for the same confidence level.
For proportions: n = Z² × p(1−p) / e², where Z is 1.96 for 95% confidence, p is the expected proportion (0.5 if unknown), and e is the margin of error as a decimal. Round up to the nearest whole number. If the population N is small, apply the finite population correction: n_adj = n / (1 + (n−1)/N). For means: n = (Z × σ / e)².
Statistical significance is determined by your research design, not the sample size alone. A sample size calculated for 95% confidence and ±5% margin of error (385 for large populations) gives you reliable estimates of proportions. For hypothesis tests, you also need sufficient statistical power — typically 80%. These are separate calculations that together define an adequate study. Using the wrong formula for your research question can give a number that appears to have sufficient size while being underpowered for what you actually want to test.
For large populations above about 100,000, population size has virtually no effect on the required sample. The formula approaches a fixed limit determined entirely by confidence level and margin of error. For small finite populations, the finite population correction reduces the required sample — sometimes substantially. A population of 200 with standard parameters (95% conf, ±5% MOE) requires only about 132 responses, not 385. The FPC applies most meaningfully when your sample would be more than about 5% of the total population.
95% confidence is the standard across academic research, published surveys, and market research reports. It means that if you repeated the study 100 times, approximately 95 of the resulting intervals would contain the true population value. Use 99% for high-stakes or regulatory decisions such as clinical trials, legal research, or policy submissions. Use 90% when budget or time constraints require a smaller sample and you can accept a somewhat higher chance that the interval misses the true value. Avoid going below 90% for any result you intend to publish or share as evidence.
±5% is the most widely accepted standard for general surveys and market research. It means your estimate could be up to 5 percentage points above or below the true value. For political polling or close decisions where the result is likely near 50/50, ±3% is more appropriate. For academic research where precision affects conclusions, ±3% or smaller is recommended. UX studies and qualitative research operate differently and are not typically held to these thresholds — a sample of 5 users is standard for moderated usability testing regardless of margin of error.
A proportion of 0.5 maximizes the expression p(1−p), which equals 0.25. Any other proportion gives a smaller value — for example, p=0.3 gives 0.21, and p=0.8 gives 0.16. Because this term appears in the numerator of the sample size formula, using 0.5 produces the largest, most conservative sample size. This guarantees that your sample is sufficient regardless of what the true proportion turns out to be. If you have reliable prior data suggesting a different proportion, using it reduces the required sample size.
Survey sample size calculations focus on estimating a population parameter (proportion or mean) within a specified margin of error at a given confidence level. Experimental sample size calculations focus on detecting a difference between groups with sufficient statistical power. Surveys use the proportion or means formula above. Experiments (including hypothesis tests and A/B tests) require a power analysis that accounts for the minimum detectable effect, alpha level, and desired power — usually through formulas specific to the test being run (t-test, chi-square, etc.).
For an A/B test comparing two proportions (conversion rates), you need: the baseline conversion rate (p₁), the minimum detectable effect (p₂ − p₁), your desired confidence (typically 95%, α=0.05), and statistical power (typically 80%). The formula for equal group sizes is: n per variant = (Zα/2 + Zβ)² × [p₁(1−p₁) + p₂(1−p₂)] / (p₂ − p₁)². For small baseline rates (below 5%), detecting even a 0.5 percentage point lift often requires tens of thousands of sessions per variant.
The five most consequential mistakes are: (1) Not accounting for response rate — the formula gives completed responses, not contacts. (2) Using convenience samples and treating them as representative — no formula corrects for non-random selection. (3) Skipping the finite population correction for small populations — this overstates the required sample and wastes resources. (4) Running hypothesis tests without a prior power analysis — underpowered studies cannot distinguish no-effect from missed-effect. (5) Not planning for attrition in longitudinal studies — add 15–20% to your target for expected dropout.