What Is a Confidence Interval for a Proportion?
Most questions in survey research, public health, and quality control come down to "what fraction of the group has this trait?" Maybe it's the fraction of voters backing a candidate, the fraction of patients who improve on a drug, or the fraction of visitors who click a button. You can rarely survey everyone, so you take a sample and calculate p̂, the sample proportion. On its own, p̂ is a single guess. The confidence interval tells you how far off that guess might plausibly be.
Point Estimate vs. Interval Estimate
A point estimate is a single number meant to stand in for the unknown population value — in this case, p̂ itself. It's the best single guess you have, but it carries no information about sampling variability: the fact that a different sample of the same size, drawn from the same population, would almost certainly produce a slightly different p̂.
An interval estimate wraps that point estimate in a range wide enough to account for that variability. Instead of saying "42.5% support the policy," an interval estimate says "we're 95% confident that between 39.7% and 45.3% support the policy." The second statement is more honest about what a sample can and can't tell you, and it's the version that survives statistical scrutiny.
The Population Parameter vs. the Sample Statistic
Two symbols matter here, and they're easy to mix up. The population proportion, written p, is the true, fixed value for the entire population — every voter, every patient, every visitor. It's almost always unknown, and that's precisely the value the confidence interval is trying to bracket.
The sample proportion, written p̂ (read "p-hat"), is the value you actually calculate from your data: the number of successes divided by the sample size. It's a known quantity, but it's also a random variable — if you repeated the sampling process, p̂ would land somewhere slightly different each time. The confidence interval uses what's known and variable (p̂) to make a statement about what's fixed and unknown (p). This estimation logic is the same one covered in the broader statistics and probability section, and it builds directly on the ideas in the sampling distributions guide on Statistics Fundamentals.
- p̂ (sample proportion): x ÷ n — the point estimate from your sample
- p (population proportion): The true, usually unknown value the interval estimates
- SE (standard error): √[p̂(1−p̂)/n] — measures variability of p̂ across samples
- z* (critical z-value): 1.645 (90%), 1.960 (95%), or 2.576 (99%)
- MOE (margin of error): z* × SE — the "give or take" added to p̂
- Success-failure condition: n·p̂ ≥ 10 and n·(1−p̂) ≥ 10 for the normal approximation to hold
The Confidence Interval for Proportion Formula
Every confidence interval for a proportion is built from the same equation. The sample proportion sits in the middle, and the margin of error is added and subtracted to create the lower and upper bounds.
CI = p̂ ± z* × √[p̂(1−p̂)/n]
where:
- p̂ = sample proportion (x/n)
- z* = critical z-value
- n = sample size
- x = number of successes
Reading the formula from the inside out: p̂(1 − p̂) measures how "spread out" a binary outcome is — it peaks at 0.25 when p̂ = 0.5 and shrinks toward 0 as p̂ approaches 0 or 1. Dividing by n and taking the square root produces the standard error, which shrinks as the sample grows. Multiplying that by z* scales the standard error up to match your chosen confidence level, producing the margin of error.
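To make that reading concrete, the short Python sketch below tabulates p̂(1 − p̂) for a few values of p̂, then shows the standard error at a fixed p̂ as n grows. The function name `standard_error` and the input values are illustrative choices, not part of any library.

```python
import math

def standard_error(p_hat: float, n: int) -> float:
    """Estimated standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# The spread term p_hat * (1 - p_hat) is largest at 0.5 and falls toward 0 at the edges.
spread = {p: p * (1 - p) for p in (0.05, 0.25, 0.5, 0.75, 0.95)}
for p, value in spread.items():
    print(f"p_hat = {p:.2f}  ->  p_hat(1 - p_hat) = {value:.4f}")

# For a fixed p_hat, the standard error shrinks as n grows.
for n in (100, 400, 1600):
    print(f"n = {n:>4}  ->  SE = {standard_error(0.5, n):.4f}")
```

Each fourfold increase in n halves the standard error, which is the square-root relationship in the formula.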
Breaking Down the Components
The two pieces worth isolating are the standard error and the margin of error, since every other quantity in the formula is built from them.
The Standard Error (SE): SE = √[p̂(1−p̂)/n]
The Margin of Error (MOE): MOE = z* × SE
Once SE and MOE are calculated, the interval is just p̂ − MOE for the lower bound and p̂ + MOE for the upper bound. The formula and methodology used here follow the conventions in NIST's Engineering Statistics Handbook and the Penn State STAT 415 course materials.
Critical Assumptions and Conditions
The formula above relies on the normal distribution to approximate what is, technically, a binomial distribution. That approximation only holds when three conditions are met. Skipping this check is the most common way this interval goes wrong in practice.
Independence, randomization, and the success-failure condition all need to hold. If the success-failure condition fails — usually when p̂ is close to 0 or 1, or n is small — the standard formula above can produce a misleading interval, including bounds outside [0, 1].
1. Independence Assumption
Each observation in the sample needs to be independent of the others — one person's response shouldn't influence or predict another's. This is usually satisfied by random sampling, but it can break down in clustered data (e.g., surveying multiple members of the same household) or in small populations where sampling without replacement removes a meaningful fraction of the group. A common rule of thumb is that the sample should be no more than 10% of the population when sampling without replacement.
2. Randomization Condition
The data should come from a random sample or a randomized experiment. Convenience samples — surveying whoever happens to be nearby, or using self-selected online responses — can introduce bias that no formula can correct. The confidence interval describes sampling variability, not the systematic error introduced by a non-random selection process.
3. Success-Failure (Normal Approximation) Condition
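This is the condition most often checked explicitly, because it has a clean numeric test. The sample needs to contain enough of both outcomes, successes and failures, for the binomial distribution to be reasonably approximated by a normal curve. In symbols, the check is n·p̂ ≥ 10 and n·(1−p̂) ≥ 10, which amounts to at least 10 observed successes and at least 10 observed failures.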
For example, if p̂ = 0.04 (a 4% defect rate), the condition n × 0.04 ≥ 10 requires n ≥ 250 just to clear the success side. If your sample is smaller, or if p̂ is very close to 0 or 1, the standard interval can extend below 0% or above 100% — a clear sign the approximation has broken down. The Wald vs. Wilson comparison later on this page covers what to do in that situation.
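The numeric check is easy to automate. The sketch below is one possible way to do it in Python; the helper names `success_failure_ok` and `min_n_for_condition` are hypothetical, and the threshold of 10 is the rule of thumb used on this page.

```python
import math

def success_failure_ok(x: int, n: int, threshold: int = 10) -> bool:
    """True when both the success count (x) and failure count (n - x) reach the threshold."""
    return x >= threshold and (n - x) >= threshold

def min_n_for_condition(p_hat: float, threshold: int = 10) -> int:
    """Smallest n for which n*p_hat and n*(1 - p_hat) both reach the threshold."""
    rarer = min(p_hat, 1 - p_hat)  # the rarer outcome is the binding constraint
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(threshold / rarer, 9))

print(min_n_for_condition(0.04))      # sample size needed for a 4% defect rate
print(success_failure_ok(80, 2000))   # 80 successes, 1,920 failures
print(success_failure_ok(4, 100))     # only 4 successes
```

Because n·p̂ is just the success count x, the check can be done on the raw counts without computing p̂ first.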
How to Calculate a Confidence Interval for a Proportion, Step by Step
Step 1: Compute p̂ = x/n. Step 2: Choose a confidence level and find z*. Step 3: Calculate SE = √[p̂(1−p̂)/n]. Step 4: Find MOE = z* × SE. Step 5: Construct the interval as p̂ − MOE to p̂ + MOE.
Compute the Sample Proportion (p̂)
Divide the number of successes (x) by the total sample size (n): p̂ = x / n. This is the point estimate that the rest of the calculation builds around.
Choose Your Confidence Level and Find z*
The three most common confidence levels are 90%, 95%, and 99%, corresponding to z* values of 1.645, 1.960, and 2.576. A higher confidence level uses a larger z* and produces a wider interval — see the critical value table below.
Calculate the Standard Error (SE)
Plug p̂ and n into SE = √[p̂(1−p̂)/n]. This single number captures how much p̂ would be expected to bounce around across repeated samples of the same size.
Determine the Margin of Error (MOE)
Multiply the critical z-value by the standard error: MOE = z* × SE. This is the "plus or minus" figure that gets added to and subtracted from p̂.
Construct and Interpret the Bounds
Subtract MOE from p̂ for the lower bound, and add MOE to p̂ for the upper bound: [p̂ − MOE, p̂ + MOE]. State the result as "we are X% confident the true proportion lies in this range."
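The five steps map almost line for line onto code. The following Python sketch returns every intermediate value. The function name `wald_interval_steps` is illustrative, and the counts (510 of 1,200) are hypothetical, chosen to give p̂ = 0.425 and a 95% interval of roughly 39.7% to 45.3%.

```python
import math

def wald_interval_steps(x: int, n: int, z_star: float) -> dict:
    """Walk through the five steps and return every intermediate value."""
    p_hat = x / n                                  # Step 1: sample proportion
    # Step 2: z_star is supplied by the caller (1.645, 1.960, or 2.576 on this page)
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # Step 3: standard error
    moe = z_star * se                              # Step 4: margin of error
    lower, upper = p_hat - moe, p_hat + moe        # Step 5: bounds
    return {"p_hat": p_hat, "se": se, "moe": moe, "lower": lower, "upper": upper}

# Hypothetical counts: 510 successes in a sample of 1,200.
result = wald_interval_steps(x=510, n=1200, z_star=1.960)
print(f"p_hat = {result['p_hat']:.3f}, SE = {result['se']:.4f}, MOE = {result['moe']:.4f}")
print(f"95% CI: [{result['lower']:.3f}, {result['upper']:.3f}]")
```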
Real-World Worked Examples
The three examples below apply the 5-step process to three different fields and three different confidence levels — 95%, 99%, and 90% — so you can see how the same formula adapts to different inputs. Each example shows the full arithmetic and a plain-language conclusion.
Scenario A — Political Polling (95% Confidence)
Problem: A poll surveys 1,200 registered voters and finds that 540 support Candidate A. Construct a 95% confidence interval for the true proportion of voters who support Candidate A.
Sample proportion: p̂ = x/n = 540/1200 = 0.45
Confidence level: 95% → z* = 1.960
Standard error:
SE = √[0.45 × (1 − 0.45) / 1200] = √[0.45 × 0.55 / 1200] = √[0.2475/1200] = √0.00020625 ≈ 0.01436
Margin of error:
MOE = 1.960 × 0.01436 ≈ 0.0281 (about 2.8 percentage points)
Interval:
0.45 − 0.0281 = 0.4219 to 0.45 + 0.0281 = 0.4781
95% CI ≈ [0.422, 0.478], or 42.2% to 47.8%
✅ Conclusion: We are 95% confident that the true proportion of voters supporting Candidate A is between 42.2% and 47.8%. Because the entire interval lies below 50%, the data suggest that Candidate A does not hold majority support, though the race appears close. Check the condition: n·p̂ = 540 ≥ 10 and n·(1−p̂) = 660 ≥ 10, so the normal approximation is reasonable.
Scenario B — Clinical Trial Success Rate (99% Confidence)
Problem: A medical trial tests a new drug on 450 patients, and 390 show a positive response. Construct a 99% confidence interval for the true proportion of patients who respond positively to the drug.
Sample proportion: p̂ = x/n = 390/450 ≈ 0.8667
Confidence level: 99% → z* = 2.576
Standard error:
SE = √[0.8667 × (1 − 0.8667) / 450] = √[0.8667 × 0.1333 / 450] = √[0.11553/450] = √0.0002567 ≈ 0.01602
Margin of error:
MOE = 2.576 × 0.01602 ≈ 0.0413 (about 4.1 percentage points)
Interval:
0.8667 − 0.0413 = 0.8254 to 0.8667 + 0.0413 = 0.9080
99% CI ≈ [0.825, 0.908], or 82.5% to 90.8%
✅ Conclusion: We are 99% confident that the true response rate to the drug lies between 82.5% and 90.8%. Note the wider interval compared to Scenario A: the margin of error is about 4.1 points versus 2.8. Two things drive this. The higher confidence level (99% vs. 95%) raises z* from 1.960 to 2.576, and the smaller sample (450 vs. 1,200) raises the standard error. The p̂ closer to 1 actually works in the opposite direction, because p̂(1 − p̂) shrinks as p̂ moves away from 0.5. Check the condition: n·p̂ = 390 ≥ 10 and n·(1−p̂) = 60 ≥ 10, so the approximation still holds, though it's worth comparing against a Wilson interval given how close p̂ is to 1 (see the comparison section).
Scenario C — Website Conversion Rate (90% Confidence)
Problem: An A/B test on a checkout page records 2,000 unique visitors, of whom 80 complete a purchase. Construct a 90% confidence interval for the true conversion rate of this page.
Sample proportion: p̂ = x/n = 80/2000 = 0.04
Confidence level: 90% → z* = 1.645
Standard error:
SE = √[0.04 × (1 − 0.04) / 2000] = √[0.04 × 0.96 / 2000] = √[0.0384/2000] = √0.0000192 ≈ 0.00438
Margin of error:
MOE = 1.645 × 0.00438 ≈ 0.0072 (about 0.72 percentage points)
Interval:
0.04 − 0.0072 = 0.0328 to 0.04 + 0.0072 = 0.0472
90% CI ≈ [0.0328, 0.0472], or 3.28% to 4.72%
✅ Conclusion: We are 90% confident that the true conversion rate for this checkout page is between 3.28% and 4.72%. Check the condition: n·p̂ = 80 ≥ 10 and n·(1−p̂) = 1920 ≥ 10, so the normal approximation applies cleanly here despite p̂ being small, because n is large.
Interactive Proportion Confidence Interval Calculator
Enter the number of successes (x), the total sample size (n), and a confidence level to see p̂, the standard error, the margin of error, and the resulting interval. The calculator also checks the success-failure condition and flags results where the normal approximation may not be reliable.
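A rough Python equivalent of such a calculator is sketched below. It is an illustration under stated assumptions rather than the page's own implementation: the function name `proportion_ci`, the `Z_STAR` lookup, and the returned field names are all hypothetical, and only the three confidence levels quoted on this page are supported.

```python
import math

# Critical values quoted on this page, keyed by confidence level in percent.
Z_STAR = {90: 1.645, 95: 1.960, 99: 2.576}

def proportion_ci(x: int, n: int, confidence: int = 95) -> dict:
    """Wald interval plus a flag for the success-failure condition."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    if confidence not in Z_STAR:
        raise ValueError(f"confidence must be one of {sorted(Z_STAR)}")
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    moe = Z_STAR[confidence] * se
    return {
        "p_hat": p_hat,
        "se": se,
        "moe": moe,
        "lower": p_hat - moe,
        "upper": p_hat + moe,
        # False means the normal approximation may not be reliable.
        "approximation_ok": x >= 10 and (n - x) >= 10,
    }

# Re-run the three worked scenarios from this page.
scenarios = {"A": (540, 1200, 95), "B": (390, 450, 99), "C": (80, 2000, 90)}
results = {name: proportion_ci(*args) for name, args in scenarios.items()}
for name, r in results.items():
    print(f"Scenario {name}: [{r['lower']:.4f}, {r['upper']:.4f}], "
          f"approximation_ok = {r['approximation_ok']}")
```

Running it on the inputs from Scenarios A, B, and C is a quick way to confirm the hand arithmetic in the worked examples.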
Statistical Comparisons: Key Differences
Confidence Interval vs. Hypothesis Test
A confidence interval and a hypothesis test are two views of the same underlying calculation, but they answer different questions. A confidence interval asks "what range of values is plausible for p?" A hypothesis test asks "is a specific claimed value of p consistent with this data?" The full mechanics of the second question are covered in the proportion hypothesis testing guide.
| Factor | Confidence Interval | Hypothesis Test |
|---|---|---|
| Question answered | What range likely contains p? | Is a specific p₀ consistent with the data? |
| Output | A range [lower, upper] | A test statistic and p-value |
| Decision required? | No — purely descriptive | Yes — reject or fail to reject H₀ |
| Relationship | If p₀ falls outside the CI... | ...a two-tailed test at matching α would reject H₀: p = p₀ |
| Typical use | Reporting an estimate with precision | Testing a specific claim or benchmark |
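The relationship row in the table can be checked numerically. The Python sketch below uses the Scenario A counts and a hypothetical function `wald_ci_and_test`. One assumption to flag: the test here uses the same p̂-based standard error as the interval, which makes the two agree exactly. A score test would use p₀ in the standard error instead, so its agreement with the Wald interval is only approximate.

```python
import math
from statistics import NormalDist

def wald_ci_and_test(x: int, n: int, p0: float, alpha: float = 0.05):
    """Return the Wald interval and a two-sided z-test of H0: p = p0.

    Assumption: the test reuses the p_hat-based standard error of the
    interval, so interval and test always agree.
    """
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = p_hat - z_star * se, p_hat + z_star * se
    z_stat = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return (lower, upper), z_stat, p_value

(lower, upper), z_stat, p_value = wald_ci_and_test(540, 1200, p0=0.50)
outside_ci = not (lower <= 0.50 <= upper)
rejects = p_value < 0.05
print(f"CI = [{lower:.3f}, {upper:.3f}], z = {z_stat:.2f}, p-value = {p_value:.4f}")
print(f"p0 outside CI: {outside_ci}; test rejects H0: {rejects}")
```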
Proportion CI vs. Mean CI
The proportion interval is a close cousin of the confidence interval for a mean, but the underlying data is different in an important way: a proportion comes from binary data (success/failure, yes/no, click/no-click), while a mean comes from continuous data (heights, times, scores). That difference changes how the standard error is calculated. The mean-based version is covered separately in the confidence interval for the mean guide.
| Factor | Proportion CI | Mean CI |
|---|---|---|
| Data type | Binary (success/failure) | Continuous (numeric) |
| Point estimate | p̂ = x/n | x̄ (sample mean) |
| Standard error formula | √[p̂(1−p̂)/n] | s/√n |
| Distribution used | Normal approximation (z*) | Normal (z*) or t-distribution (t*) |
| Key condition | Success-failure: n·p̂ ≥ 10, n·(1−p̂) ≥ 10 | Normality of data or n large enough for CLT |
| Tracks | Survey/poll percentages, defect rates, click rates | Averages — heights, times, scores, costs |
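One way to see how closely the two intervals are related is to code binary outcomes as 1 and 0 and apply both standard error formulas to the same data. The sample below (45 successes in 100 observations) is hypothetical. The sketch uses `statistics.stdev` from the Python standard library, which computes the sample standard deviation with an n − 1 divisor.

```python
import math
import statistics

# Hypothetical binary sample: 1 = success, 0 = failure (45 successes out of 100).
data = [1] * 45 + [0] * 55
n = len(data)

# Proportion view: p_hat and its standard error.
p_hat = sum(data) / n
se_proportion = math.sqrt(p_hat * (1 - p_hat) / n)

# Mean view: x_bar and s / sqrt(n), with s the sample standard deviation.
x_bar = statistics.mean(data)
se_mean = statistics.stdev(data) / math.sqrt(n)

print(f"p_hat = {p_hat:.2f}, x_bar = {x_bar:.2f}")
print(f"SE (proportion formula) = {se_proportion:.5f}")
print(f"SE (mean formula)       = {se_mean:.5f}")
```

For 0/1 data the sample mean equals p̂, and the two standard errors differ only by the factor √[n/(n−1)] that comes from the n − 1 divisor in s.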
Wald Interval vs. Wilson Score Interval
The formula on this page — p̂ ± z*√[p̂(1−p̂)/n] — is known as the Wald interval, and it's the version taught in most introductory courses because it's simple and matches the same logic used for means. It works well when the success-failure condition is comfortably met and p̂ is not too close to 0 or 1.
The trouble starts near the edges. When p̂ is close to 0 or 1, or n is small, the Wald interval can produce bounds below 0% or above 100% — clearly impossible for a proportion — and its actual coverage (the percentage of intervals that truly contain p) can fall noticeably short of the stated confidence level. The Wilson score interval fixes this by inverting the hypothesis-test formula rather than using a direct normal approximation, which keeps the bounds within [0, 1] and gives more accurate coverage in exactly the cases where Wald struggles.
If p̂ is below roughly 0.1 or above 0.9, or if n·p̂ or n·(1−p̂) is close to the threshold of 10, consider a Wilson score interval instead of the Wald formula on this page. Some statistical software reports a Wilson-type interval for this reason: R's prop.test, for instance, returns a Wilson score interval with a continuity correction by default.
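For comparison, here is a minimal Python sketch of both intervals side by side. The Wilson version follows the standard score-interval formula without a continuity correction. The function names and the small-sample input (1 success in 20 trials) are illustrative assumptions.

```python
import math

def wald_interval(x: int, n: int, z: float = 1.960):
    """Wald interval: p_hat plus or minus z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

def wilson_interval(x: int, n: int, z: float = 1.960):
    """Wilson score interval (no continuity correction)."""
    p_hat = x / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width

# Hypothetical small sample that fails the success-failure check: 1 success in 20.
wald = wald_interval(1, 20)
wilson = wilson_interval(1, 20)
print(f"Wald:   [{wald[0]:.4f}, {wald[1]:.4f}]")
print(f"Wilson: [{wilson[0]:.4f}, {wilson[1]:.4f}]")
```

With these inputs the Wald lower bound is negative, while the Wilson bounds stay inside [0, 1] and the interval is no longer centered on p̂.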
Quick-Reference Tables
Metric & Formula Quick-View
| Metric / Component | Formula | Practical Interpretation |
|---|---|---|
| Sample Proportion (p̂) | p̂ = x/n | The baseline point estimate from your sample data |
| Standard Error (SE) | SE = √[p̂(1−p̂)/n] | Estimated standard deviation of p̂ across repeated samples |
| Margin of Error (MOE) | MOE = z* × SE | Largest distance between p̂ and p expected at the chosen confidence level |
| Confidence Interval Bounds | p̂ ± MOE | The full range of plausible values for p |
Common Z-Critical Values
| Confidence Level | Alpha (α) | Tail Area (each side) | Critical z-value (z*) |
|---|---|---|---|
| 90% | 0.10 | 0.050 | 1.645 |
| 95% | 0.05 | 0.025 | 1.960 |
| 99% | 0.01 | 0.005 | 2.576 |
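The table values come from the inverse of the standard normal CDF evaluated at 1 − α/2. A short sketch using Python's standard-library `statistics.NormalDist` reproduces them; the helper name `critical_z` is an illustrative choice.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Two-sided critical value z* for a confidence level given as a fraction (e.g. 0.95)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> z* = {critical_z(level):.3f}")
```

The same helper gives z* for levels not listed in the table, such as 0.80 or 0.98.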
Confidence Interval Behavior Controls
| Factor | Change | Effect on Width | Effect on Precision |
|---|---|---|---|
| Sample size (n) | Increase | Narrower | Higher precision |
| Sample size (n) | Decrease | Wider | Lower precision |
| Confidence level | Increase | Wider | Lower precision |
| Confidence level | Decrease | Narrower | Higher precision |
| p̂ | Moves toward 0.5 | Wider (widest at 0.5) | Lower precision (for fixed n and confidence level) |
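Each row of the table can be checked by changing one input at a time. The sketch below does that in Python with a hypothetical helper `ci_width` and illustrative baseline values (p̂ = 0.5, n = 1,000, 95% confidence).

```python
import math

def ci_width(p_hat: float, n: int, z_star: float) -> float:
    """Full width of the Wald interval: 2 * z_star * sqrt(p_hat * (1 - p_hat) / n)."""
    return 2 * z_star * math.sqrt(p_hat * (1 - p_hat) / n)

baseline = ci_width(p_hat=0.5, n=1000, z_star=1.960)
larger_n = ci_width(p_hat=0.5, n=4000, z_star=1.960)      # four times the sample
higher_conf = ci_width(p_hat=0.5, n=1000, z_star=2.576)   # 99% instead of 95%
off_center = ci_width(p_hat=0.1, n=1000, z_star=1.960)    # p_hat away from 0.5

print(f"baseline width:      {baseline:.4f}")
print(f"n x 4:               {larger_n:.4f}")
print(f"99% confidence:      {higher_conf:.4f}")
print(f"p_hat = 0.1:         {off_center:.4f}")
```

Quadrupling n halves the width, raising the confidence level widens it, and moving p̂ away from 0.5 narrows it.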
Symbols Glossary
| Symbol | Name | Meaning |
|---|---|---|
| p̂ | Sample proportion | x/n — point estimate from the sample |
| p | Population proportion | True, unknown value for the whole population |
| x | Number of successes | Count of the outcome of interest in the sample |
| n | Sample size | Total number of observations |
| SE | Standard error | Estimated standard deviation of the sampling distribution of p̂ |
| z* | Critical z-value | Multiplier from the standard normal distribution for a given confidence level |
| MOE | Margin of error | z* × SE — the "plus or minus" added to p̂ |
| 1−α | Confidence level | Long-run proportion of intervals expected to contain p |
| CI | Confidence interval | [p̂ − MOE, p̂ + MOE] |
Common Misconceptions About Confidence Intervals
Confidence intervals are widely reported but often misread, even in professional contexts. The table below pairs the most common misreadings with the correct interpretation.
| What People Say | Why It's Wrong | What's Correct |
|---|---|---|
| "There's a 95% probability that p is in this interval" | p is fixed, not random — it either is or isn't in the interval | 95% of intervals built this way, across repeated sampling, would contain p |
| "A wider interval means a less accurate sample" | Width depends on confidence level too, not just sample quality | A wider interval may simply reflect a higher confidence level or smaller n |
| "If two intervals overlap, the two proportions aren't different" | Overlapping CIs don't directly translate to a hypothesis test result | Use a dedicated two-proportion test to compare two groups directly |
| "The interval contains p̂ at the very center, always" | True only for the symmetric Wald interval | Methods like Wilson produce asymmetric intervals where p̂ isn't centered |
| "99% confidence is always better than 95%" | Higher confidence trades away precision | Choose the confidence level based on how costly an error would be, not by default |
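The repeated-sampling reading in the first row can be illustrated with a small simulation. The Python sketch below draws many samples from a population whose true proportion is known (something that is never available in practice), builds a 95% Wald interval from each, and counts how often the interval contains the true value. The values of `p_true`, `n`, `trials`, and the seed are arbitrary assumptions.

```python
import math
import random

def wald_covers(p_true: float, n: int, z_star: float, rng: random.Random) -> bool:
    """Draw one sample of size n and report whether its Wald interval contains p_true."""
    x = sum(rng.random() < p_true for _ in range(n))
    p_hat = x / n
    moe = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe <= p_true <= p_hat + moe

rng = random.Random(0)  # fixed seed so the run is repeatable
p_true, n, trials = 0.45, 400, 2000
hits = sum(wald_covers(p_true, n, 1.960, rng) for _ in range(trials))
coverage = hits / trials
print(f"{coverage:.1%} of {trials} intervals contained p = {p_true}")
```

The printed share should land near 95%, with some run-to-run variation if the seed is changed. Any single interval either contains p or it does not; the 95% describes the procedure.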
Real-World Applications
The confidence interval for a proportion shows up anywhere an outcome is binary and a sample stands in for a larger group.
Polling Estimates
The "margin of error" published with a poll is typically based on the MOE from this formula, sometimes adjusted for the survey's design. A poll reporting "45% ± 3%" at the conventional 95% level is reporting a confidence interval of roughly [42%, 48%].
Public Health Tracking
Disease prevalence, vaccination rates, and treatment response rates are all proportions estimated from samples, with confidence intervals reported alongside point estimates in public health bulletins.
Quality Control
Manufacturers estimate defect rates from sampled batches. A confidence interval shows whether the true defect rate plausibly exceeds an acceptable threshold.
A/B Testing & CRO
Conversion rates, click-through rates, and bounce rates are proportions. Confidence intervals show the plausible range for a metric before a test reaches full significance.
Market Research
Brand awareness, purchase intent, and customer satisfaction percentages reported by market research firms are sample proportions with an implied or stated confidence interval.
Education Research
Pass rates, graduation rates, and survey-based attitude measures (agree/disagree) are analyzed the same way, often compared across cohorts using the methods in the hypothesis testing examples guide.
Frequently Asked Questions
You interpret a confidence interval for a proportion by stating: "We are X% confident that the true population proportion lies between the lower and upper bounds." This means that if you repeated the sampling process many times and built an interval each time, about X% of those intervals would contain the true population parameter. It does not mean there is an X% probability that p falls within this one specific interval once the data has already been collected: p is fixed, and the interval is what varies from sample to sample.
Sample size has an inverse relationship with the width of a confidence interval. As n increases, the standard error SE = √[p̂(1−p̂)/n] decreases, which produces a smaller margin of error and a narrower, more precise confidence interval. Because n appears under a square root, you need to quadruple the sample size to roughly cut the margin of error in half.
A sample proportion (p̂) is an empirical value calculated from a subset of data — it is known, but it varies from sample to sample. A population proportion (p) is a fixed, typically unknown theoretical parameter representing the entire group. The confidence interval uses the known, variable p̂ to estimate a range for the fixed, unknown p.
You can use the normal approximation (the Wald interval covered on this page) when the success-failure condition is satisfied: both n·p̂ ≥ 10 and n·(1−p̂) ≥ 10. This ensures the underlying binomial distribution is close enough to a normal distribution for the formula to behave well. When either count falls below 10 — common when p̂ is near 0 or near 1 — a Wilson score interval is generally more reliable.
A 95% confidence interval is wider than a 90% confidence interval built from the same data because it requires a larger critical z-value (1.960 vs. 1.645). To be more confident that your interval contains the true population proportion, the interval has to cover more ground, so there is a direct trade-off between confidence level and precision.
If n·p̂ < 10 or n·(1−p̂) < 10, the normal approximation underlying the Wald interval becomes unreliable — the resulting interval can have poor coverage or even extend below 0% or above 100%. In that case, an alternative such as the Wilson score interval or an exact binomial (Clopper-Pearson) interval is preferred, as discussed in the Wald vs. Wilson comparison above.
The "margin of error" reported alongside a poll result is, in the simplest case, the MOE from this page's formula: MOE = z* × SE. A poll reporting a result of "45%, with a margin of error of ±3 percentage points" at 95% confidence is reporting a confidence interval of approximately [42%, 48%]. The MOE alone doesn't communicate the confidence level, so reputable polls typically state both.
Sources and References
This guide draws on the following primary and secondary sources for the formulas, critical values, and conventions used above.
- NIST Engineering Statistics Handbook — Confidence Limits for a Proportion. National Institute of Standards and Technology. itl.nist.gov
- Penn State STAT 415 — Introduction to Mathematical Statistics, Confidence Intervals for Proportions. Penn State Eberly College of Science. online.stat.psu.edu
- OpenStax Introductory Statistics — Ch. 8: Confidence Intervals. Rice University. openstax.org
- Pew Research Center — U.S. Survey Methodology. pewresearch.org
- Wilson, E.B. (1927) — "Probable Inference, the Law of Succession, and Statistical Inference." Journal of the American Statistical Association, 22(158), 209–212.
- Brown, L.D., Cai, T.T. & DasGupta, A. (2001) — "Interval Estimation for a Binomial Proportion." Statistical Science, 16(2), 101–133.