What Is the Law of Large Numbers?
In concrete terms: repeat a random experiment independently enough times (rolling a die, sampling household incomes, measuring machine output) and the average of your results will, with probability approaching 1, get arbitrarily close to the theoretical average that probability assigns to that process. The longer you run the experiment, the closer the match tends to be.
This idea was first proved rigorously by Jacob Bernoulli for binary outcomes in his posthumous work Ars Conjectandi (1713). Siméon Denis Poisson later extended the result and coined the name Loi des grands nombres in 1837. Pafnuty Chebyshev provided a clean algebraic proof via his inequality in 1867, and Andrey Kolmogorov established the modern measure-theoretic foundation in 1933.
The theorem is the mathematical justification for one of the most intuitive beliefs in science: that collecting more data gives you a more accurate picture of reality. It also sits at the foundation of hypothesis testing, confidence intervals, and virtually every form of statistical inference.
- Subject: Probability theory and statistical convergence
- Core claim: Sample mean X̄ₙ converges to population mean μ as n → ∞
- Two forms: Weak law (convergence in probability) and strong law (almost sure convergence)
- Requires: Independent, identically distributed random variables with finite expected value
- Variance formula: Var(X̄ₙ) = σ²/n when σ² is finite; the variance of the sample mean shrinks as n grows
- Does not mean: Individual outcomes are predicted or corrected by the law
The Law of Large Numbers Formula
Let X₁, X₂, …, Xₙ be a sequence of independent and identically distributed random variables, each with a finite expected value E[Xᵢ] = μ. The sample mean is defined as:
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
where:
X̄ₙ = sample mean of n observations
n = number of trials
Xᵢ = i-th observation
μ = true population mean E[X]
The LLN asserts that as n → ∞, this quantity X̄ₙ approaches μ. The exact sense in which it "approaches" μ differs between the weak and strong forms, covered in detail in the weak law vs. strong law section below.
The Mathematics of Variance Reduction
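The structural reason the law works is visible in how variance behaves. If the observations are independent and the population distribution has finite variance σ², then the variances of the Xᵢ add, and the variance of the sample mean is:
$$\operatorname{Var}(\bar{X}_n) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\sigma^2}{n}$$
where: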
σ² = population variance
n = sample size
Because n sits in the denominator, the variance of the sample mean decreases toward zero as sample size grows. At the limit:
$$\lim_{n\to\infty}\operatorname{Var}(\bar{X}_n) = \lim_{n\to\infty}\frac{\sigma^2}{n} = 0$$
The probability distribution of the sample mean compresses into a sharp spike centered exactly at μ. This is why large samples are more reliable than small ones: not because of luck, but because of mathematics.
This relationship connects directly to the sampling distribution of the sample mean and explains why the Central Limit Theorem uses σ/√n as the standard error.
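A short simulation makes σ²/n concrete. The sketch below is illustrative, not part of the original material: it assumes a fair six-sided die (population variance 35/12 ≈ 2.917), draws many samples of size n with Python's standard `random` module, and compares the spread of the resulting sample means with σ²/n. The helper name `var_of_sample_mean`, the seed, and the repetition count are hypothetical choices.

```python
import random
import statistics

SIGMA2 = 35 / 12  # population variance of a single fair-die roll (about 2.917)

def var_of_sample_mean(n, reps=5000, seed=0):
    """Estimate Var(X̄ₙ) by drawing `reps` samples of n die rolls and taking
    the population variance of their means."""
    rng = random.Random(seed)
    faces = range(1, 7)
    means = [statistics.fmean(rng.choices(faces, k=n)) for _ in range(reps)]
    return statistics.pvariance(means)

for n in (1, 10, 100):
    print(f"n={n:>3}  simulated Var={var_of_sample_mean(n):.4f}  sigma^2/n={SIGMA2 / n:.4f}")
```

Each tenfold increase in n should cut the simulated variance by roughly a factor of ten, tracking σ²/n.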
Weak Law vs. Strong Law of Large Numbers
The Law of Large Numbers comes in two mathematically distinct forms. Both reach the same conclusion — that X̄ₙ approaches μ — but they differ in what "approaches" means precisely. Understanding the distinction is important for probability courses and statistical theory.
The Weak Law of Large Numbers (WLLN)
For any arbitrarily small ε > 0:
$$\lim_{n\to\infty} P\left(|\bar{X}_n - \mu| \ge \varepsilon\right) = 0$$
Interpretation: As n grows, the probability that the sample mean deviates from the true mean by more than any fixed amount ε goes to zero. It guarantees convergence in probability.
The weak law says that once n is large enough, it is overwhelmingly likely (though not certain) that the sample mean lies within ε of μ. When σ² is finite, Chebyshev's Inequality provides a direct algebraic proof: P(|X̄ₙ − μ| ≥ ε) ≤ σ²/(nε²), and as n → ∞ the right side goes to zero.
The "weak" label does not imply the result is unimportant; it refers to the weaker mode of convergence. On its own, the weak law does not rule out a single realization of the running average straying beyond ε infinitely often; it only says that at each large n, such a deviation is improbable.
The Strong Law of Large Numbers (SLLN)
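$$P\left(\lim_{n\to\infty}\bar{X}_n = \mu\right) = 1$$
Interpretation: The sample mean converges to the true mean almost surely: the set of infinite outcome sequences on which X̄ₙ fails to converge to μ has probability 0. (An endless run of heads is possible in principle, but it has probability 0.) This is a statement about whole sequences, not just about a given n. It guarantees almost sure convergence.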
Kolmogorov proved this stronger result in 1933 using measure theory. The strong law upgrades the WLLN's guarantee: for almost every sequence of outcomes and every ε > 0, there is some N (which depends on the sequence) beyond which the running average stays within ε of μ for good. This is the form most used in mathematical statistics and econometrics.
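A finite simulation cannot prove an almost-sure statement, but it can illustrate what the strong law describes: a single path of running averages that eventually stays inside the ε-band. The sketch below assumes a fair coin, ε = 0.01, and a horizon of 200,000 flips (all hypothetical choices) and reports the last n at which that one path was still outside the band.

```python
import random

def last_exit(n_max=200_000, eps=0.01, p=0.5, seed=1):
    """Follow one simulated coin-flip path.

    Returns the running mean after n_max flips and the last n <= n_max at which
    |X̄ₙ − p| >= eps (0 if the path never left the eps-band).
    """
    rng = random.Random(seed)
    heads = 0
    last = 0
    for n in range(1, n_max + 1):
        if rng.random() < p:
            heads += 1
        if abs(heads / n - p) >= eps:
            last = n  # this path is still outside the eps-band at step n
    return heads / n_max, last

final_mean, last_n = last_exit()
print(f"running mean after 200,000 flips: {final_mean:.4f}")
print(f"last n with |X̄ₙ − 0.5| >= 0.01 on this path: {last_n}")
```

Rerunning with other seeds moves the last exit around, consistent with the strong law's N depending on the particular sequence.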
| Feature | Weak Law (WLLN) | Strong Law (SLLN) |
|---|---|---|
| Type of convergence | Convergence in probability | Almost sure convergence |
| Probability statement | P(\|X̄ₙ − μ\| ≥ ε) → 0 | P(lim X̄ₙ = μ) = 1 |
| Deviations beyond ε infinitely often on one path | Not ruled out by the statement (each becomes improbable) | Ruled out (probability exactly 0) |
| Who proved it | Chebyshev / Khinchin (1867/1929) | Kolmogorov (1933) |
| Tools required | Chebyshev's Inequality | Measure theory / Borel-Cantelli |
| Practical difference | Almost none in applications | Stronger theoretical guarantee |
Law of Large Numbers vs. Central Limit Theorem
The Law of Large Numbers and the Central Limit Theorem are frequently confused because both describe what happens to sample means as n grows. They answer different questions, though, and understanding each one properly requires keeping them separate.
The LLN says the sample mean converges to μ. The CLT says that the distribution of the sample mean, once centered at μ and scaled by the standard error σ/√n, approaches a standard normal curve. These are complementary facts about different aspects of sampling behavior. See the Central Limit Theorem guide for a full treatment.
| Comparison Point | Law of Large Numbers | Central Limit Theorem |
|---|---|---|
| Primary question | What value does X̄ₙ approach? | What shape is the distribution of X̄ₙ? |
| End result | A single point: X̄ₙ → μ | A distribution: X̄ₙ ≈ N(μ, σ²/n) |
| What it tracks | Convergence of the mean value | Convergence of the error distribution |
| Role of σ²/n | Var(X̄ₙ) = σ²/n → 0, so X̄ₙ collapses onto μ | SE = σ/√n sets the scale; √n(X̄ₙ − μ)/σ → N(0, 1) |
| Requires normality? | No (requires a finite mean) | No (requires a finite variance; the limiting shape is normal) |
| Practical use | Justifies using X̄ as estimate of μ | Builds confidence intervals and z-tests |
The two theorems work together in practice. The LLN guarantees that, for large n, your sample mean will very likely land near μ. The CLT tells you how to approximate the probability that it lands within any specific distance of μ, which is exactly what confidence intervals for the mean and hypothesis tests compute.
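To see the CLT's role numerically, the sketch below (an illustration under assumed values, not from the original text) takes samples of n = 1,000 fair-coin flips and compares the normal approximation for P(|X̄ₙ − 0.5| ≤ 0.03) with a simulated frequency. The constants `N`, `EPS`, and `REPS` and the seed are hypothetical. Part of the small gap between the two numbers reflects that the head count is discrete while the normal curve is continuous.

```python
import math
import random
from statistics import NormalDist

N = 1000       # flips per sample (hypothetical)
EPS = 0.03     # tolerance around mu = 0.5 (hypothetical)
REPS = 20_000  # number of simulated samples (hypothetical)

# CLT: X̄ₙ is approximately Normal(0.5, 0.25 / N), so
# P(|X̄ₙ − 0.5| <= EPS) is approximately 2Φ(EPS / SE) − 1
se = math.sqrt(0.25 / N)
clt_prob = 2 * NormalDist().cdf(EPS / se) - 1

# Simulation: a sample's head count is the number of 1-bits in N random bits.
# Compare whole head counts (|heads − 500| <= 30) to avoid floating-point round-off.
rng = random.Random(42)
tol = round(EPS * N)
hits = sum(
    abs(bin(rng.getrandbits(N)).count("1") - N // 2) <= tol
    for _ in range(REPS)
)
sim_prob = hits / REPS

print(f"CLT approximation: {clt_prob:.4f}   simulated: {sim_prob:.4f}")
```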
How the Law of Large Numbers Works: Three Phases
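Watching convergence happen across increasing sample sizes is the most direct way to develop intuition for the law. The progression follows three recognizable phases. The sample-size thresholds below are illustrative, based on a fair coin; distributions with a larger variance σ² need proportionally larger n to reach each phase.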
Volatile Micro-Sample (n ≤ 10)
Individual outcomes dominate the average. A fair coin flipped 4 times landing on heads 3 times (75%) is common. The sample mean is highly sensitive to the randomness of each trial. Do not draw conclusions from small samples.
Stabilizing Meso-Sample (10 < n < 1,000)
Streaks carry less and less weight in the cumulative average. With 100 trials, consecutive runs stop dominating the running average and the trajectory visibly moves toward the expected value line. The variance of the sample mean is now σ²/100, one hundredth of the variance of a single trial.
Converged Macro-Sample (n ≥ 1,000)
The sample mean settles close to μ. For a fair coin, the standard deviation of the heads proportion is 0.5/√n: about 1.6 percentage points at n = 1,000 and 0.5 percentage points at n = 10,000, so at 10,000 flips the proportion of heads is usually within 1 percentage point of 50%. This is the regime in which insurance pricing, casino mathematics, and large-scale polling operate with confidence.
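The standard deviations quoted for the coin follow from σ/√n with σ = 0.5. A minimal sketch of that arithmetic (the constant name `SIGMA` is an illustrative choice):

```python
import math

SIGMA = 0.5  # standard deviation of one fair-coin flip coded 1 (heads) / 0 (tails)

for n in (10, 100, 1_000, 10_000):
    sd = SIGMA / math.sqrt(n)  # standard deviation of the heads proportion X̄ₙ
    print(f"n={n:>6}: SD = {sd:.4f}  ({100 * sd:.2f} percentage points)")
```

Roughly 95% of runs land within two of these standard deviations of 50%.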
Worked Examples — Law of Large Numbers in Action
Example 1: The Coin Toss (Binary Uniform Distribution)
A fair coin assigns X = 1 for heads and X = 0 for tails. The theoretical expected value is E[X] = 0.5. What happens to the sample mean across growing trial counts?
Setup: E[X] = (1 × 0.5) + (0 × 0.5) = 0.5. The variance is σ² = 0.25. The variance of the sample mean is Var(X̄ₙ) = 0.25/n.
Small sample (n = 10): Results — 7 heads, 3 tails. X̄₁₀ = 0.70. Deviation from μ: +0.20. Var(X̄₁₀) = 0.25/10 = 0.025.
Medium sample (n = 100): Results — 53 heads, 47 tails. X̄₁₀₀ = 0.53. Deviation from μ: +0.03. Var(X̄₁₀₀) = 0.25/100 = 0.0025.
Large sample (n = 10,000): Results — 5,004 heads, 4,996 tails. X̄₁₀,₀₀₀ = 0.5004. Deviation from μ: +0.0004. Var(X̄₁₀,₀₀₀) = 0.25/10,000 = 0.000025.
| Trials (n) | Heads | Sample Mean X̄ₙ | Deviation from 0.5 | Var(X̄ₙ) |
|---|---|---|---|---|
| 10 | 7 | 0.7000 | +0.2000 | 0.02500 |
| 100 | 53 | 0.5300 | +0.0300 | 0.00250 |
| 1,000 | 507 | 0.5070 | +0.0070 | 0.00025 |
| 10,000 | 5,004 | 0.5004 | +0.0004 | 0.000025 |
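✅ Each tenfold increase in sample size shrinks the standard deviation of X̄ₙ by a factor of √10 ≈ 3.16, since σ/√n falls to 1/√10 ≈ 0.316 of its previous value. The realized deviations in this table scatter around that trend, as any single run will, but the sample mean converges to 0.5 as the Law of Large Numbers predicts.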
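The table's pattern can be reproduced with a simulated coin. The sketch below is a hedged illustration: it uses Python's `random` module with an arbitrary seed, so its numbers will differ from the table above, and the function name `running_means` is hypothetical.

```python
import random

def running_means(checkpoints, p=0.5, seed=7):
    """Flip a simulated coin and record the running mean X̄ₙ at each checkpoint n."""
    rng = random.Random(seed)
    heads, n, out = 0, 0, {}
    for target in sorted(checkpoints):
        while n < target:
            if rng.random() < p:
                heads += 1
            n += 1
        out[target] = heads / n
    return out

means = running_means([10, 100, 1_000, 10_000])
for n, mean in means.items():
    print(f"n={n:>6}  X̄ₙ={mean:.4f}  deviation={mean - 0.5:+.4f}")
```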
Example 2: The Six-Sided Die (Discrete Uniform Distribution)
Rolling a fair six-sided die yields outcomes X ∈ {1, 2, 3, 4, 5, 6}, each with probability 1/6. What is the expected value and how does the sample mean converge?
Expected value: μ = (1+2+3+4+5+6)/6 = 21/6 = 3.5. Variance: σ² = E[X²] − μ² = 91/6 − 12.25 = 2.917.
Small sample (n = 6): Rolls: {1, 1, 4, 6, 2, 5}. Sum = 19. X̄₆ = 3.167. Deviation: −0.333.
Medium sample (n = 60): Face frequencies begin to even out. Cumulative total ≈ 219. X̄₆₀ = 3.650. Deviation: +0.150.
Large sample (n = 60,000): Frequencies approach 1/6 per face. Total ≈ 210,042. X̄₆₀,₀₀₀ = 3.5007. Deviation: +0.0007.
✅ The sample mean converges to 3.5, the true expected value, as n grows. The key mechanism: Var(X̄ₙ) = 2.917/n → 0 as n → ∞.
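Both the exact moments and the convergence can be checked in a few lines. This sketch (an illustration, not part of the original example) computes μ and σ² exactly with `fractions.Fraction`, then averages 60,000 simulated rolls; the seed is arbitrary, so the simulated mean will not match the totals quoted above.

```python
import random
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # probability of each face

mu = sum(x * p for x in faces)                 # E[X] = 7/2
second_moment = sum(x * x * p for x in faces)  # E[X²] = 91/6
var = second_moment - mu ** 2                  # σ² = 35/12

rng = random.Random(2024)
n = 60_000
sample_mean = sum(rng.choices(faces, k=n)) / n

print(mu, var, float(var))
print(f"X̄ₙ for n={n}: {sample_mean:.4f}")
```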
Example 3: Insurance Risk Pools
An insurer covers policyholders where each has a 1% annual probability of a $100,000 claim. Expected payout per policyholder: E[X] = $1,000. How does pool size affect pricing stability?
Small pool (n = 100): Expected claims: 1 at $100,000. But if 3 claims occur by chance, actual cost = $300,000, a 200% overrun. The per-policyholder cost spikes to $3,000 against a $1,000 premium. With an average of only one claim per year, three or more claims happen about 8% of the time, so the total payout swings widely relative to the pool.
Large pool (n = 1,000,000): Expected claims: 10,000 at $100,000 each = $1,000,000,000 total. The claim count has standard deviation √(1,000,000 × 0.01 × 0.99) ≈ 99.5, so actual claims fall between roughly 9,800 and 10,200 about 95% of the time. The per-policyholder cost then stays within about $20 of $1,000. Premiums can be set precisely.
Why it works (variance shrinks): σ² per policyholder = $100,000² × 0.01 × 0.99 = 99,000,000 (in dollars squared). Var(X̄ₙ) = σ²/n. At n = 1,000,000: Var(X̄ₙ) ≈ 99. Standard deviation of the average payout ≈ $9.95, about 1% of the $1,000 expected payout.
✅ The Law of Large Numbers enables insurers to set stable premiums. With millions of policyholders, the actual per-person payout converges so closely to the expected value that the company can price risk accurately and remain solvent.
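The insurance figures can be recomputed directly. The sketch below assumes the example's numbers (1% claim probability, $100,000 per claim, one million policyholders), treats claims as independent as the example does, and uses a normal approximation to the binomial claim count, via `statistics.NormalDist`, for the 9,800 to 10,200 band. Constant names are illustrative.

```python
import math
from statistics import NormalDist

CLAIM = 100_000   # payout per claim, dollars
P_CLAIM = 0.01    # annual claim probability per policyholder
N = 1_000_000     # policyholders in the pool

mean_payout = CLAIM * P_CLAIM                      # E[X] = $1,000
var_payout = CLAIM ** 2 * P_CLAIM * (1 - P_CLAIM)  # σ² = 99,000,000 dollars²
sd_avg = math.sqrt(var_payout / N)                 # SD of the average payout, about $9.95

# Claim count K ~ Binomial(N, P_CLAIM); normal approximation for P(9,800 <= K <= 10,200)
k_mean = N * P_CLAIM
k_sd = math.sqrt(N * P_CLAIM * (1 - P_CLAIM))      # about 99.5 claims
k_dist = NormalDist(k_mean, k_sd)
p_band = k_dist.cdf(10_200) - k_dist.cdf(9_800)

print(f"E[X] = ${mean_payout:,.0f}, sigma^2 = {var_payout:,.0f}, SD of average = ${sd_avg:.2f}")
print(f"P(9,800 <= claims <= 10,200) is approximately {p_band:.3f}")
```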
LLN vs. the Gambler's Fallacy
The Gambler's Fallacy is the most widespread misreading of the Law of Large Numbers. It holds that if random outcomes have deviated from their expected pattern in the short term, they must "correct" themselves in the near future.
"A coin has come up heads 8 times in a row. Tails is now much more likely on the next flip." — This reasoning is false. The coin has no memory. The probability of heads on flip 9 remains exactly 0.5, regardless of past outcomes.
The LLN does not work by forcing future outcomes to compensate for past deviations. It works by diluting past anomalies with a large volume of new, ordinary results.
| Scenario | Gambler's Fallacy (Wrong) | Law of Large Numbers (Correct) |
|---|---|---|
| 8 heads in a row | Tails is "due." Next flip is biased toward tails. | Next flip is still 50/50. Independence is absolute. |
| Mechanism | Outcomes must self-correct to restore the average. | Anomaly is diluted by thousands of subsequent ordinary flips. |
| 10,008 total flips (8 + 10,000 new) | Those 10,000 new flips should favor tails to compensate. | 10,000 new flips land ~5,000 each. Heads ratio: 5,008/10,008 = 50.04%. Converged. |
| Roulette: red 10 times running | Black is overdue and a better bet now. | Each spin is independent. The house edge applies equally to every spin. |
The "Law of Averages" you hear about in casual conversation is usually the Gambler's Fallacy in disguise. The actual law operates over long runs and says nothing about the short-term sequence of outcomes. See the basic probability page for more on independence and the probability rules that govern individual events.
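The claim that the coin has no memory can be checked empirically. The sketch below (illustrative; the seed, flip count, and function name are hypothetical choices) simulates fair flips and looks only at the flips that come right after a run of 8 or more heads. If tails were "due," heads would appear noticeably less than half the time in that subset; instead the fraction stays close to 0.5, up to sampling noise.

```python
import random

def next_flip_after_streak(n_flips=500_000, streak=8, seed=3):
    """Simulate fair coin flips. Among flips that come right after `streak` or more
    consecutive heads, return (how many there were, fraction that were heads)."""
    rng = random.Random(seed)
    run = 0                 # length of the current run of heads
    count = heads_after = 0
    for _ in range(n_flips):
        heads = rng.random() < 0.5
        if run >= streak:   # this flip follows a qualifying streak
            count += 1
            if heads:
                heads_after += 1
        run = run + 1 if heads else 0
    return count, heads_after / count

count, frac = next_flip_after_streak()
print(f"{count} flips followed 8+ heads in a row; fraction heads = {frac:.3f}")
```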
Real-World Applications of the Law of Large Numbers
The LLN is not an abstract mathematical curiosity. It is the operational backbone of several industries and fields of research.
Insurance & Actuarial Science
Actuaries pool millions of policyholders so that actual claims closely match statistical projections. The LLN makes premium pricing feasible and financially stable. Read more in the expected value guide.
Casino Operations
A casino need not win every hand. An American roulette wheel has a 5.26% house edge. Across millions of spins, the casino's realized margin converges toward that 5.26% of the total amount wagered. Short-term player wins do not threaten profitability in the long run.
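The 5.26% figure comes from the wheel's 38 pockets. The sketch below (an illustration with a hypothetical seed and spin count) computes the exact expected result of a 1-unit red bet and a 1-unit single-number bet with `fractions.Fraction`, then simulates a long series of red bets to show the player's average result approaching −2/38 per spin.

```python
import random
from fractions import Fraction

POCKETS = 38  # American wheel: 1-36 plus 0 and 00

# Even-money bet on red: 18 winning pockets pay +1, the other 20 lose the 1-unit stake
ev_red = Fraction(18, POCKETS) * 1 + Fraction(20, POCKETS) * -1
# Single-number bet: 1 winning pocket pays 35 to 1, the other 37 lose the stake
ev_single = Fraction(1, POCKETS) * 35 + Fraction(37, POCKETS) * -1
print(ev_red, ev_single, f"house edge = {float(-ev_red):.2%}")

# Simulate 1-unit red bets; pockets 0-17 stand in for the 18 red pockets (labeling is arbitrary)
rng = random.Random(11)
spins = 500_000
total = sum(1 if rng.randrange(POCKETS) < 18 else -1 for _ in range(spins))
print(f"average player result per spin: {total / spins:+.4f}")
```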
Polling & Survey Research
Public opinion polls rely on the LLN to claim that sample percentages reflect population percentages within a stated margin. Larger samples produce smaller margins — a direct consequence of Var(X̄ₙ) = σ²/n. Related: confidence intervals.
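A minimal sketch of how poll margins shrink with n, assuming simple random sampling and the usual normal-approximation formula z·√(p(1 − p)/n) with the worst case p = 0.5. The function name `margin_of_error` is hypothetical, and the z value relies on the Central Limit Theorem as well as on σ²/n.

```python
import math
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Approximate margin of error for a sample proportion: z * sqrt(p(1 − p) / n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)

for n in (250, 1_000, 4_000):
    print(f"n={n:>5}: ±{100 * margin_of_error(n):.1f} percentage points")
```

Quadrupling the sample size halves the margin, because the margin scales with 1/√n.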
Quantitative Finance
Portfolio diversification reduces variance by combining uncorrelated assets. Monte Carlo pricing models run millions of simulated asset paths; the LLN ensures the average discounted payoff converges to the option's theoretical value under the pricing model.
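As an illustration of Monte Carlo pricing, the sketch below prices a European call option two ways: with the closed-form Black-Scholes formula and by averaging discounted payoffs over simulated terminal prices under the same model (geometric Brownian motion under the risk-neutral measure). The contract and market parameters are hypothetical. Because a European call depends only on the terminal price, each "path" here is a single draw.

```python
import math
import random
from statistics import NormalDist

# Hypothetical contract and market parameters
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0

def black_scholes_call(S0, K, r, sigma, T):
    """Closed-form Black-Scholes price of a European call."""
    N = NormalDist().cdf
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * N(d1) - K * math.exp(-r * T) * N(d2)

def monte_carlo_call(S0, K, r, sigma, T, paths=200_000, seed=5):
    """Average discounted payoff over simulated risk-neutral terminal prices."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    total = 0.0
    for _ in range(paths):
        ST = S0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))  # terminal price under GBM
        total += max(ST - K, 0.0)                               # call payoff at expiry
    return math.exp(-r * T) * total / paths

print(f"closed form: {black_scholes_call(S0, K, r, sigma, T):.4f}")
print(f"Monte Carlo: {monte_carlo_call(S0, K, r, sigma, T):.4f}")
```

With 200,000 draws the Monte Carlo standard error is on the order of a few cents; quadrupling the draws roughly halves it, the same 1/√n scaling as σ/√n.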
Machine Learning
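Empirical Risk Minimization trains models to minimize error on a finite dataset. For a fixed model, the LLN guarantees that its average error on a large i.i.d. sample approximates its true generalization error. For the model selected by training on that same data, the guarantee needs stronger uniform-convergence results, which is one reason large datasets matter so much in deep learning.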
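The fixed-model case is easy to simulate. The sketch below invents a toy data model (labels follow a threshold at 0.5 and are flipped with probability 0.1) and a classifier fixed in advance, so its true risk is exactly 0.1; the empirical risk on n fresh samples then approaches 0.1 as n grows. All names and parameters are hypothetical, and the sketch does not cover a model fitted to the same data.

```python
import random

NOISE = 0.1  # probability that a label is flipped (hypothetical data model)

def predict(x):
    """A classifier fixed in advance, before any data is seen: threshold at 0.5."""
    return 1 if x > 0.5 else 0

def empirical_risk(n, seed=9):
    """Average 0-1 loss of `predict` on n fresh i.i.d. examples from the toy model."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0   # clean label
        if rng.random() < NOISE:  # flip it with probability NOISE
            y = 1 - y
        if predict(x) != y:
            errors += 1
    return errors / n

for n in (100, 10_000, 200_000):
    print(f"n={n:>7}: empirical risk = {empirical_risk(n):.4f}  (true risk = {NOISE})")
```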
Clinical Trials
Drug trials use large sample sizes so that the observed treatment effect converges to the true population effect. Power analysis — deciding how large a trial needs to be — is a direct application of the LLN and the CLT together. Related: power of a test.
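As an illustration of power analysis, the sketch below uses the standard normal-approximation sample-size formula for comparing two group means, n per group = 2σ²(z₁₋α/₂ + z₁₋β)²/δ², where δ is the smallest difference worth detecting. The effect size, standard deviation, and defaults (α = 0.05, 80% power) are hypothetical, and this is the simplified known-variance z-test version of the calculation.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample z-test of a mean difference
    delta, assuming a known common standard deviation sigma."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # about 1.96
    z_beta = z(power)           # about 0.84
    n = 2 * (sigma * (z_alpha + z_beta) / delta) ** 2
    return math.ceil(n)         # round up to whole participants

print(n_per_group(delta=0.5, sigma=1.0))
print(n_per_group(delta=0.25, sigma=1.0))
```

Halving δ roughly quadruples the required n, reflecting the same 1/n scaling as Var(X̄ₙ) = σ²/n.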
Why the Law Works: Proof via Chebyshev's Inequality
A full measure-theoretic proof of the strong law is beyond a first course. But when the variance σ² is finite, the weak law has a clean, accessible proof using Chebyshev's Inequality that requires only knowledge of expected value and variance.
Chebyshev's Inequality states that for any random variable Y with mean μ_Y and variance σ²_Y, and for any ε > 0:
$$P\left(|Y - \mu_Y| \ge \varepsilon\right) \le \frac{\sigma_Y^2}{\varepsilon^2}$$
Apply to the Sample Mean
Set Y = X̄ₙ. We know E[X̄ₙ] = μ and Var(X̄ₙ) = σ²/n. Substitute into Chebyshev's Inequality.
Obtain the Probability Bound
$$P\left(|\bar{X}_n - \mu| \ge \varepsilon\right) \le \frac{\sigma^2/n}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2}$$
For any fixed ε > 0 and σ², n appears only in the denominator of this bound.
Take the Limit
As n → ∞: σ² / (nε²) → 0. Therefore P( |X̄ₙ − μ| ≥ ε ) → 0 for every ε > 0. This is exactly the statement of the Weak Law of Large Numbers.
Three steps, one inequality. The weak law follows directly from the fact that variance of the sample mean is σ²/n, which vanishes as n grows. The mathematical machinery is minimal; the conclusion is powerful. See also: variance and standard deviation.
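The bound can be compared with simulation. The sketch below (illustrative; the seed, repetition count, and function names are hypothetical) uses a fair coin, so σ² = 1/4, with ε = 0.05, computes the Chebyshev bound σ²/(nε²) exactly with `fractions.Fraction`, and estimates the true deviation probability by simulation. The bound holds but is loose: at n = 100 it equals 1 and says nothing, while the simulated probability is already far below the bound by n = 1,000.

```python
import random
from fractions import Fraction

SIGMA2 = Fraction(1, 4)  # variance of one fair-coin flip coded 0/1
EPS = Fraction(1, 20)    # ε = 0.05

def chebyshev_bound(n, eps=EPS):
    """Right-hand side of P(|X̄ₙ − μ| ≥ ε) ≤ σ²/(nε²), capped at 1."""
    return min(Fraction(1), SIGMA2 / (n * eps ** 2))

def deviation_prob(n, eps=EPS, reps=5000, seed=13):
    """Monte Carlo estimate of P(|X̄ₙ − 1/2| ≥ ε) for a fair coin."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        heads = bin(rng.getrandbits(n)).count("1")  # heads among n fair flips
        if abs(Fraction(heads, n) - Fraction(1, 2)) >= eps:  # exact rational comparison
            hits += 1
    return hits / reps

for n in (100, 1_000, 10_000):
    print(f"n={n:>6}: simulated {deviation_prob(n):.4f} <= bound {float(chebyshev_bound(n)):.4f}")
```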
Related Concepts and Internal Resources
The Law of Large Numbers is one node in a larger web of probability and statistics concepts. The following resources on Statistics Fundamentals extend the ideas covered here.
| Concept | Connection to LLN | Resource |
|---|---|---|
| Expected Value | The quantity X̄ₙ converges toward — the target of the LLN | Expected Value Guide |
| Central Limit Theorem | Describes the shape of the distribution of X̄ₙ as n grows | CLT Guide |
| Sampling Distributions | The sampling distribution of X̄ₙ compresses toward μ, as the LLN predicts | Sample Mean Distribution |
| Confidence Intervals | Built on the CLT and LLN; require X̄ₙ to estimate μ reliably | CI for Mean |
| Hypothesis Testing | Uses X̄ₙ as an estimator — valid because of the LLN | Hypothesis Testing |
| Basic Probability | Independence of trials is the core requirement of the LLN | Basic Probability |
| Normal Distribution | Shape the distribution of X̄ₙ converges to (via CLT) | Normal Distribution |
| Sample Size Calculator | Determines n needed for desired precision — driven by LLN logic | Sample Size Calculator |