Introduction
Many problems are theoretically solvable but computationally intractable. Calculating portfolio risk means integrating over thousands of correlated assets. Finding the optimal Go move means searching a tree with more states than there are atoms in the universe.
The Monte Carlo Philosophy
If you cannot calculate the answer, simulate it millions of times and average.
"If you cannot calculate the area of a lake, throw a million rocks at it. Count how many splash vs. hit ground. That ratio gives you the area."
History: The Manhattan Project
The Problem
1940s Los Alamos. Stanislaw Ulam needed to calculate the probability of neutron chain reactions. The differential equations were too complex.
The Solution
Instead of solving the equations, simulate individual neutrons moving randomly and tally the outcomes. The method is named after the Monte Carlo casino in Monaco because, like gambling, it relies on repeated chance events; the Law of Large Numbers (LLN) guarantees that the averages converge.
The Dartboard Method (Estimating Pi)
How do you calculate pi without geometry? Throw darts randomly.
Setup
- Square with side $2r$. Area $= 4r^2$
- Circle inscribed with radius $r$. Area $= \pi r^2$
- Ratio $= \dfrac{\pi r^2}{4r^2} = \dfrac{\pi}{4}$
Simulation
- Throw $N$ darts randomly at the square
- Count how many land inside the circle: $N_{\text{in}}$
- Ratio: $\dfrac{N_{\text{in}}}{N} \approx \dfrac{\pi}{4}$, so $\pi \approx 4 \cdot \dfrac{N_{\text{in}}}{N}$
More darts = better estimate. This is the Law of Large Numbers in action.
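A minimal NumPy sketch of the dartboard estimator (the sample sizes and the unit-square setup are illustrative choices):

```python
import numpy as np

def estimate_pi(n: int, seed: int = 0) -> float:
    """Estimate pi by throwing n random darts at the square [-1, 1] x [-1, 1]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = rng.uniform(-1.0, 1.0, size=n)
    inside = (x**2 + y**2) <= 1.0     # darts landing inside the inscribed circle
    return 4.0 * inside.mean()        # (inside / total) ~ pi / 4

for n in (100, 10_000, 1_000_000):
    print(n, estimate_pi(n))          # the estimate tightens as n grows
```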
Interactive: Estimate Pi
Throw virtual darts at a square. Watch the estimate converge to 3.14159... as you add more darts.
Monte Carlo Integration
For complex functions in high dimensions, standard numerical integration fails. Monte Carlo converts integrals into expectations.
Replace the integral with a sample average:
$$\int_{\Omega} f(x)\,dx \;\approx\; \frac{V}{N}\sum_{i=1}^{N} f(x_i)$$
where $V$ is the volume of the domain $\Omega$ and $x_1, \dots, x_N$ are random samples drawn uniformly from $\Omega$.
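A short NumPy sketch of this estimator over a box domain; the example integrand and bounds are illustrative assumptions:

```python
import numpy as np

def mc_integrate(f, lower, upper, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the integral of f over the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    volume = np.prod(upper - lower)                    # V = volume of the domain
    samples = rng.uniform(lower, upper, size=(n, lower.size))
    return volume * np.mean([f(x) for x in samples])   # (V / N) * sum f(x_i)

# Example: integrate f(x, y) = x^2 + y^2 over the unit square (exact value 2/3).
print(mc_integrate(lambda x: x[0]**2 + x[1]**2, [0, 0], [1, 1]))
```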
Convergence Rate
The error scales as $O(1/\sqrt{N})$: to halve the error you need $4\times$ as many samples. Slow convergence, but independent of dimension!
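A quick empirical check of this rate, reusing the dartboard estimator (sample sizes and repeat counts are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def pi_error(n: int, repeats: int = 200) -> float:
    """Average absolute error of the dartboard pi estimate with n samples."""
    errs = []
    for _ in range(repeats):
        pts = rng.uniform(-1, 1, size=(n, 2))
        est = 4.0 * np.mean(np.sum(pts**2, axis=1) <= 1.0)
        errs.append(abs(est - np.pi))
    return float(np.mean(errs))

for n in (1_000, 4_000, 16_000):
    print(n, pi_error(n))   # quadrupling n roughly halves the error
```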
Beating the Curse of Dimensionality
The cost of grid-based methods grows exponentially as dimensions increase. Monte Carlo's does not.
| Dimension | Grid Points (10 per axis) | Monte Carlo Samples (typical) |
|---|---|---|
| 1D (Line) | 10 | ~100 |
| 3D (Cube) | 1,000 | ~100 |
| 10D | 10 billion | ~1,000 |
| 100D | 10^100 (impossible) | ~10,000 |
This is why Monte Carlo is often the only practical way to solve high-dimensional integration problems in physics and Bayesian inference.
Markov Chain Monte Carlo (MCMC)
What if uniform sampling is not enough, because we need samples from a complex distribution we only know up to a constant? MCMC generates such samples by running a Markov chain that spends most of its time in high-probability regions.
Metropolis-Hastings Algorithm
- Start at a random point $x_0$
- Propose a new point $x'$ near the current point $x$ (a small random step: $x' \sim q(x' \mid x)$)
- Calculate the acceptance ratio: $\alpha = \dfrac{p(x')}{p(x)}$ (for a symmetric proposal)
- If $\alpha \geq 1$: always move to $x'$
- If $\alpha < 1$: move to $x'$ with probability $\alpha$, else stay at $x$
- Repeat thousands of times (a minimal sketch follows below)
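A minimal random-walk Metropolis sketch in NumPy. The bimodal target, Gaussian proposal, step size, and sample count are illustrative assumptions, not taken from the interactive demo:

```python
import numpy as np

def metropolis(log_p, x0: float, step_size: float, n_samples: int, seed: int = 0):
    """Random-walk Metropolis: sample a density known only up to a constant."""
    rng = np.random.default_rng(seed)
    x = x0
    samples, accepted = [], 0
    for _ in range(n_samples):
        x_new = x + rng.normal(0.0, step_size)      # symmetric proposal
        log_alpha = log_p(x_new) - log_p(x)         # log acceptance ratio
        if np.log(rng.uniform()) < log_alpha:       # accept with prob min(1, alpha)
            x, accepted = x_new, accepted + 1
        samples.append(x)                           # on rejection, record old x again
    return np.array(samples), accepted / n_samples

# Illustrative bimodal target: mixture of two Gaussians (unnormalized log-density).
def log_p(x):
    return np.logaddexp(-0.5 * (x + 3.0)**2, -0.5 * (x - 3.0)**2)

samples, accept_rate = metropolis(log_p, x0=0.0, step_size=1.0, n_samples=50_000)
print(f"acceptance rate: {accept_rate:.2f}")
```

Tracking the acceptance rate this way is what lets you tune `step_size` toward the 20-50% range discussed below.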
Why it works
The chain spends more time in high-probability regions. After an initial "burn-in" period, the samples are approximately distributed according to the target distribution.
Key parameter
Step size matters! Too small: slow exploration. Too large: many rejections. Target 20-50% acceptance.
Interactive: MCMC Visualization
Watch Metropolis-Hastings explore a bimodal distribution (two peaks). Adjust step size to see how it affects exploration.
Goldilocks Principle:
Too high (>80%): Steps are too small, exploring slowly.
Too low (<20%): Steps are too big, constantly rejected.
Target: ~20-50% for optimal mixing.
Why Bimodal?
Notice how the chain gets "stuck" in one peak for a while, then eventually makes a lucky jump across the valley to the other peak. This "mode hopping" is one of the hardest challenges in MCMC.
ML Applications
Reinforcement Learning (MC Prediction)
Estimate the value of a state by playing many episodes and averaging the observed returns:
$$V(s) \approx \frac{1}{N}\sum_{i=1}^{N} G_i(s)$$
where $G_i(s)$ is the return observed after visiting $s$ in episode $i$.
Used in AlphaGo (Monte Carlo Tree Search) and in policy gradient methods.
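A minimal sketch of every-visit Monte Carlo prediction, assuming episodes arrive as lists of `(state, reward)` pairs; the function name, data format, and discount factor are illustrative assumptions:

```python
from collections import defaultdict

def mc_prediction(episodes, gamma: float = 0.99):
    """Every-visit MC: V(s) = average of returns observed after visiting s."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:                      # episode = [(state, reward), ...]
        G = 0.0
        for state, reward in reversed(episode):   # accumulate discounted return backwards
            G = reward + gamma * G
            returns_sum[state] += G
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

# Tiny illustrative example: two episodes over states "A" and "B".
episodes = [[("A", 0.0), ("B", 1.0)], [("A", 0.0), ("B", 0.0)]]
print(mc_prediction(episodes, gamma=1.0))   # {"B": 0.5, "A": 0.5}
```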
Bayesian Inference
Sample from the posterior when the normalizing integral is intractable:
$$p(\theta \mid D) = \frac{p(D \mid \theta)\,p(\theta)}{\int p(D \mid \theta')\,p(\theta')\,d\theta'}$$
Common tools: MCMC, Hamiltonian Monte Carlo, and variational inference.
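As a concrete sketch, random-walk Metropolis applied to a toy posterior over a coin's bias; the flat prior, data, step size, and burn-in length are illustrative assumptions:

```python
import numpy as np

# Toy data: 7 heads out of 10 flips; flat prior on theta in (0, 1).
heads, flips = 7, 10

def log_posterior(theta: float) -> float:
    """Unnormalized log posterior: log likelihood + log prior (flat)."""
    if not 0.0 < theta < 1.0:
        return -np.inf                          # zero posterior outside (0, 1)
    return heads * np.log(theta) + (flips - heads) * np.log(1.0 - theta)

rng = np.random.default_rng(0)
theta, samples = 0.5, []
for _ in range(20_000):
    proposal = theta + rng.normal(0.0, 0.1)     # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)

posterior = np.array(samples[2_000:])           # discard burn-in
print(posterior.mean())                         # ~0.67, close to (7+1)/(10+2)
```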
Dropout as Monte Carlo
Running inference with dropout multiple times and averaging is equivalent to approximate Bayesian inference. Each forward pass samples a different sub-network.
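A minimal sketch assuming PyTorch; the toy network, dropout rate, and number of forward passes are illustrative choices:

```python
import torch
import torch.nn as nn

# Toy regressor with dropout; the architecture is illustrative.
model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Dropout(p=0.2), nn.Linear(64, 1))

def mc_dropout_predict(model, x, n_passes: int = 50):
    """Keep dropout active at inference; each pass samples a different sub-network."""
    model.train()                               # train mode keeps dropout on
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_passes)])
    return preds.mean(dim=0), preds.std(dim=0)  # predictive mean and uncertainty

x = torch.randn(8, 4)                           # a batch of 8 dummy inputs
mean, std = mc_dropout_predict(model, x)
print(mean.shape, std.shape)                    # torch.Size([8, 1]) twice
```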
Finance: Value at Risk (VaR)
Simulate thousands of market scenarios to estimate worst-case losses at a given confidence level.
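A minimal sketch of Monte Carlo VaR assuming normally distributed daily returns; the portfolio weights, covariance matrix, and confidence level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative portfolio: value, weights, and daily return distribution of 3 assets.
portfolio_value = 1_000_000.0
weights = np.array([0.5, 0.3, 0.2])
mean_returns = np.array([0.0005, 0.0003, 0.0004])
cov = np.array([[0.0004, 0.0002, 0.0001],
                [0.0002, 0.0009, 0.0003],
                [0.0001, 0.0003, 0.0016]])

# Simulate many one-day market scenarios and compute portfolio profit/loss.
n_scenarios = 100_000
asset_returns = rng.multivariate_normal(mean_returns, cov, size=n_scenarios)
pnl = portfolio_value * (asset_returns @ weights)

# 95% VaR: the loss exceeded in only 5% of scenarios.
var_95 = -np.percentile(pnl, 5)
print(f"1-day 95% VaR: ${var_95:,.0f}")
```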