Introduction
The Law of Large Numbers (LLN) is the anchor of statistics. It states a simple but powerful truth: as you collect more data, the sample average converges to the true expected value.
The Core Promise
Sample mean approaches population mean as sample size grows
Without LLN, machine learning would be impossible. We assume that training loss approximates true generalization error. LLN is the mathematical license for that assumption.
The Casino Intuition
Why Casinos Always Win
Bet on Red (Roulette)
Win: 18/38 = 47.4%
Lose: 20/38 = 52.6%
Expected Value (per $1 bet)
EV = (18/38)(+$1) + (20/38)(−$1) = −$2/38 ≈ −$0.053 per bet
The casino is not gambling. It is running a business based on LLN.
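Here is a minimal simulation sketch of that business model, written against the Red bet above using only the Python standard library. The payout rule and the 38-pocket wheel are taken from the example; everything else (function name, seed) is illustrative. The average payout per bet drifts toward the theoretical −$0.053 as the number of bets grows:

```python
# Hypothetical sketch: simulate many $1 bets on Red in American roulette
# (18 red, 18 black, 2 green pockets) and watch the average payout per bet
# drift toward the theoretical expected value of -2/38 ≈ -$0.053.
import random

def simulate_red_bets(n_bets: int, seed: int = 0) -> float:
    """Return the average payout per $1 bet on Red over n_bets spins."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_bets):
        pocket = rng.randrange(38)         # 0..17 red, 18..35 black, 36..37 green
        total += 1 if pocket < 18 else -1  # win $1 on red, lose $1 otherwise
    return total / n_bets

for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}  average payout per bet = {simulate_red_bets(n):+.4f}")
# Theoretical expected value: -2/38 ≈ -0.0526
```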
Mathematical Statement
Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with mean $\mu$. The sample mean is:
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Law of Large Numbers
As $n \to \infty$, the sample mean converges to the true mean:
$$\bar{X}_n \;\longrightarrow\; \mu \quad \text{as } n \to \infty$$
Interactive: Watch Convergence
Try different distributions and sample sizes. Notice how the running average stabilizes around the true mean as n grows. Small n = noisy. Large n = stable.
Notice how the "swings" (variance) are huge at the start (small n), but the line inevitably tightens around the true mean as n grows.
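If you want to reproduce this outside the widget, here is a minimal sketch (assuming NumPy and, as an arbitrary choice, an Exponential distribution with true mean 2.0) that tracks the running average as n grows:

```python
# Sketch of the convergence demo: draw samples from a chosen distribution
# and track the running average as n grows.
import numpy as np

rng = np.random.default_rng(42)

# True mean of an Exponential(scale=2.0) distribution is 2.0.
samples = rng.exponential(scale=2.0, size=100_000)
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n={n:>7,}  running average = {running_mean[n - 1]:.4f}  (true mean = 2.0)")
```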
Why It Works: Variance Reduction
The intuition comes from looking at the variance of the sample mean.
Start with the variance of the sample mean:
$$\mathrm{Var}(\bar{X}_n) = \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\,\mathrm{Var}\!\left(\sum_{i=1}^{n} X_i\right)$$
For independent variables, the variance of a sum is the sum of the variances:
$$\frac{1}{n^2}\,\mathrm{Var}\!\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{n\sigma^2}{n^2}$$
Result:
$$\mathrm{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$$
The Key Insight
As $n \to \infty$, the variance $\sigma^2 / n \to 0$. A random variable with zero variance is a constant. Therefore, the sample mean becomes the constant $\mu$.
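A quick empirical check of the $\sigma^2/n$ scaling, as a sketch assuming NumPy and a Normal population with $\sigma^2 = 4$ (the population and sample counts are arbitrary choices):

```python
# Empirically check that Var(sample mean) ≈ σ²/n by computing many independent
# sample means for each n and measuring their spread.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0          # variance of a Normal(0, 2) population
n_repeats = 10_000    # number of independent sample means per n

for n in (1, 10, 100, 1_000):
    means = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(n_repeats, n)).mean(axis=1)
    print(f"n={n:>5}  empirical Var(mean) = {means.var():.4f}   theory σ²/n = {sigma2 / n:.4f}")
```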
Weak vs Strong LLN
There are two versions with different mathematical guarantees.
Weak LLN
Convergence in Probability
For any margin $\varepsilon > 0$, the probability that $\bar{X}_n$ is more than $\varepsilon$ away from $\mu$ goes to zero:
$$\lim_{n \to \infty} P\big(|\bar{X}_n - \mu| > \varepsilon\big) = 0$$
Strong LLN
Almost Sure Convergence
The sample average converges with probability 1:
$$P\!\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$$
For most ML applications, the distinction does not matter. Both guarantee convergence.
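To see the weak LLN's statement concretely, here is a sketch (assuming NumPy, a Bernoulli(0.5) population, and an arbitrary margin of 0.05) that estimates $P(|\bar{X}_n - \mu| > \varepsilon)$ for growing n:

```python
# Illustrate the weak LLN: estimate the probability that the sample mean lands
# more than ε away from μ, for growing n.
import numpy as np

rng = np.random.default_rng(1)
mu, epsilon, trials = 0.5, 0.05, 50_000   # Bernoulli(0.5) population, margin 0.05

for n in (10, 100, 1_000, 10_000):
    # Each sample mean of n Bernoulli(0.5) draws is Binomial(n, 0.5) / n.
    means = rng.binomial(n, mu, size=trials) / n
    p_far = np.mean(np.abs(means - mu) > epsilon)
    print(f"n={n:>6}  P(|mean - μ| > {epsilon}) ≈ {p_far:.4f}")
```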
LLN vs CLT
These are often confused. They describe different aspects of the same process.
| Theorem | What it says | Analogy |
|---|---|---|
| LLN | Sample mean converges to true mean | Where the arrow lands (the target) |
| CLT | Distribution of sample means is Normal | The shape of the arrow pattern |
See Central Limit Theorem for the distribution story.
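Both theorems can be seen in one simulation. The sketch below (assuming NumPy and, as an arbitrary example, fair die rolls with true mean 3.5) shows LLN as a single running mean settling on 3.5, and CLT as the tight, Normal-looking spread of many sample means around 3.5:

```python
# One experiment, two theorems.
# LLN: a single long sequence's mean settles on μ = 3.5.
# CLT: across many repetitions, the sample means cluster Normally around μ.
import numpy as np

rng = np.random.default_rng(7)
# Fair six-sided die: true mean μ = 3.5, true std σ ≈ 1.708.

rolls = rng.integers(1, 7, size=100_000)
print("LLN: mean of one long sequence ->", rolls.mean())          # ≈ 3.5

n, repeats = 1_000, 5_000
sample_means = rng.integers(1, 7, size=(repeats, n)).mean(axis=1)
print("CLT: mean of sample means ->", round(sample_means.mean(), 3),
      "| std ->", round(sample_means.std(), 4), "(theory σ/√n ≈ 0.054)")
```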
The Gambler's Fallacy
The Mistake
"I got 10 heads in a row. LLN says it balances to 50%, so tails is 'due' next."
Why It's Wrong
The coin has no memory. LLN works by dilution, not compensation.
Example
Start with 10 heads in a row (100% heads). Flip 990 more fair coins: you expect about 495 more heads and 495 tails, so the overall ratio is roughly 505/1000 = 50.5%.
The streak did not disappear. It just became statistically insignificant.
Interactive: Gambler's Fallacy Demo
Start with a streak of heads, then flip more. Watch how the ratio approaches 50% through dilution, not correction.
Starting state: 10 Heads in a row (100% Heads).
Observed: the percentage drops toward 50%, but the absolute count of Heads still exceeds Tails by about 10.
The universe didn't generate extra tails to "fix" the streak. It just buried the streak under a mountain of new, normal data. That is Dilution.
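Here is a sketch of that dilution effect (assuming NumPy), averaging over many trials so the expected behavior is visible: the heads percentage approaches 50% while the average heads-minus-tails gap stays near +10.

```python
# Dilution, not compensation: start every trial with a 10-heads streak, add fair
# flips, and average over many trials. The heads ratio approaches 50% while the
# average heads - tails gap stays near +10.
import numpy as np

rng = np.random.default_rng(3)
streak, trials = 10, 20_000

for extra_flips in (0, 90, 990, 9_990):
    extra_heads = rng.binomial(extra_flips, 0.5, size=trials)
    heads = streak + extra_heads
    total = streak + extra_flips
    avg_ratio = heads.mean() / total
    avg_gap = (heads - (total - heads)).mean()
    print(f"total flips={total:>6,}  avg heads% = {100 * avg_ratio:5.1f}%  "
          f"avg heads - tails = {avg_gap:+.1f}")
```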
ML Applications
Monte Carlo Methods
Replace intractable integrals with sample averages:
$$\mathbb{E}_{x \sim p}[f(x)] = \int f(x)\,p(x)\,dx \;\approx\; \frac{1}{N}\sum_{i=1}^{N} f(x_i), \quad x_i \sim p$$
Used in MCMC, Reinforcement Learning (value estimation), Bayesian inference.
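A minimal Monte Carlo sketch (assuming NumPy, with the arbitrary choice $f(x) = x^2$ and $X \sim N(0,1)$, so the exact answer is 1 and convergence can be checked):

```python
# Monte Carlo estimation of an expectation. The integral E[X²] for X ~ N(0, 1)
# is known exactly (it equals 1), so we can watch the sample average converge.
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(n_samples: int) -> float:
    """Approximate E[f(X)] with a sample average, where f(x) = x**2 and X ~ N(0, 1)."""
    x = rng.standard_normal(n_samples)
    return np.mean(x ** 2)

for n in (100, 10_000, 1_000_000):
    print(f"N={n:>9,}  estimate of E[X²] = {mc_estimate(n):.4f}  (exact = 1)")
```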
Empirical Risk Minimization
Training loss approximates true generalization loss:
$$\hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i), y_i\big) \;\approx\; R(f) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\big[\ell(f(x), y)\big]$$
The entire justification for training on finite datasets.
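A toy sketch of that claim (assuming NumPy, a fixed predictor $f(x) = 2x$, squared loss, and Gaussian label noise with variance 0.25, so the true risk is known to be 0.25): the empirical risk converges to the true risk as the dataset grows.

```python
# Empirical risk vs. true risk for a fixed predictor.
# Data: y = 2x + ε with ε ~ N(0, 0.25); predictor f(x) = 2x; squared loss.
# True risk E[(y - f(x))²] = Var(ε) = 0.25; training loss converges to it.
import numpy as np

rng = np.random.default_rng(5)
noise_var = 0.25

def empirical_risk(n: int) -> float:
    x = rng.uniform(-1, 1, size=n)
    y = 2 * x + rng.normal(0, np.sqrt(noise_var), size=n)
    return np.mean((y - 2 * x) ** 2)   # squared loss of the fixed predictor f(x) = 2x

for n in (10, 1_000, 100_000):
    print(f"n={n:>7,}  empirical risk = {empirical_risk(n):.4f}  (true risk = {noise_var})")
```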
Stochastic Gradient Descent
Mini-batch gradient is an unbiased estimate of full gradient. Over many steps, the noise averages out. SGD converges because of LLN.
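A sketch of the unbiasedness claim (assuming NumPy, a least-squares loss on synthetic data, and a batch size of 32, all illustrative choices): averaging many mini-batch gradients recovers the full-batch gradient.

```python
# Mini-batch gradients are noisy but unbiased: averaging many of them for a
# least-squares loss recovers the full-batch gradient at the same point.
import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(10_000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=10_000)
w = np.zeros(3)   # gradients evaluated at an arbitrary fixed point

def grad(Xb, yb, w):
    """Gradient of the mean squared error 0.5 * mean((Xb @ w - yb)**2)."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

full = grad(X, y, w)
batches = [grad(X[idx], y[idx], w)
           for idx in (rng.choice(len(X), size=32, replace=False) for _ in range(5_000))]

print("full-batch gradient      :", np.round(full, 3))
print("avg of 5,000 mini-batches:", np.round(np.mean(batches, axis=0), 3))
```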
AlphaGo & MCTS
The exact game tree value of a position cannot be computed. Instead, play thousands of random games (rollouts) from that position; the average outcome converges to its true value.
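A toy rollout sketch, not AlphaGo: the "position" here is a hypothetical coin-flip game whose exact value is a binomial tail sum, so convergence of the rollout average can be verified.

```python
# Monte Carlo evaluation of a toy "position". The game: 12 fair coin flips remain
# and the player wins if at least 7 come up heads. The exact win probability is a
# binomial tail sum; random playouts converge to it, just as MCTS rollout
# averages converge to a position's true value.
import math
import random

FLIPS_LEFT, HEADS_NEEDED = 12, 7
exact = sum(math.comb(FLIPS_LEFT, k)
            for k in range(HEADS_NEEDED, FLIPS_LEFT + 1)) / 2 ** FLIPS_LEFT

def playout(rng: random.Random) -> int:
    """Play the rest of the game randomly; return 1 on a win, 0 on a loss."""
    heads = sum(rng.random() < 0.5 for _ in range(FLIPS_LEFT))
    return int(heads >= HEADS_NEEDED)

rng = random.Random(42)
for n in (100, 10_000, 1_000_000):
    estimate = sum(playout(rng) for _ in range(n)) / n
    print(f"playouts={n:>9,}  estimated value = {estimate:.4f}  (exact = {exact:.4f})")
```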