Memoryless Systems
Imagine a drunk person walking on a number line. Their next step (left or right) depends only on where they are right now, not on how they got there. They do not remember their path. This is a Random Walk, and it is a special case of a Markov Chain.
Markov Chains model systems that transition from one state to another probabilistically, where the future depends only on the present, not the past. This "memoryless" property makes them mathematically tractable and surprisingly powerful.
The Markov Property
A Markov Chain is fully defined by:
- State Space: The set of all possible states (e.g., {Sunny, Rainy} or {0, 1, 2, ..., N}).
- Transition Probabilities: The probability of moving from state $i$ to state $j$, denoted $P_{ij}$.
- Initial Distribution: The probability of starting in each state.
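To make these concrete, here is a minimal Python sketch of the two-state weather chain used in the simulator below. The 0.8 Sunny-to-Sunny probability comes from that section; the Rainy row is an assumed value, chosen only to be consistent with the 67/33 long-run split quoted later.

```python
import numpy as np

# The three ingredients, written out for the Sunny/Rainy weather model used below.
# P(Sunny -> Sunny) = 0.8 comes from the simulator section; the Rainy row (0.4, 0.6)
# is an assumption, chosen to match the 67%/33% long-run split quoted later.
states = ["Sunny", "Rainy"]                  # 1. state space
P = np.array([[0.8, 0.2],                    # 2. transition probabilities
              [0.4, 0.6]])
pi0 = np.array([1.0, 0.0])                   # 3. initial distribution: start on a Sunny day

assert np.allclose(P.sum(axis=1), 1.0)       # every row is a probability distribution

rng = np.random.default_rng(0)

def step(state: int) -> int:
    """Sample tomorrow's state given only today's state (the Markov property)."""
    return int(rng.choice(len(states), p=P[state]))
```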
Interactive Simulator
A simple weather model. If it's Sunny today, there's an 80% chance it's Sunny tomorrow. Adjust the transition probabilities and run the simulation to see long-term behavior.
[Interactive simulator: transition matrix controls, day-by-day simulation, and long-run convergence display]
Transition Matrix
We can represent all transition probabilities in a Stochastic Matrix $P$. Each row must sum to $1$ (you have to go somewhere).
Computing Future States
If $\pi_0$ is the initial state distribution (a row vector), then after $n$ steps:

$$\pi_n = \pi_0 P^n$$
Matrix exponentiation lets us "fast-forward" into the future without simulating each step.
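A minimal NumPy sketch of that fast-forwarding, using the same weather matrix as before (the Rainy row is still an assumed value):

```python
import numpy as np
from numpy.linalg import matrix_power

P = np.array([[0.8, 0.2],          # weather transition matrix (Rainy row assumed)
              [0.4, 0.6]])
pi0 = np.array([1.0, 0.0])         # start on a Sunny day

# Distribution after n steps: pi_n = pi_0 @ P^n
for n in (1, 5, 30):
    print(n, pi0 @ matrix_power(P, n))   # drifts toward roughly [0.667, 0.333]
```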
Stationary Distribution
If you run the simulation long enough, the probability of being in each state settles down to a constant vector $\pi$. This is the Stationary Distribution (or Equilibrium Distribution), and it satisfies

$$\pi P = \pi$$

This equation says $\pi$ is a Left Eigenvector of the transition matrix $P$ with eigenvalue $1$. Finding $\pi$ is an eigenvalue problem!
Weather Example
Solving $\pi P = \pi$ for our weather chain gives $\pi \approx (0.67, 0.33)$. In the long run, it is Sunny 67% of the time, Rainy 33%, regardless of whether we started on a Sunny or Rainy day.
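Numerically, one way to solve that eigenvalue problem is to take the eigenvectors of the transposed matrix and pick the one whose eigenvalue is 1. A sketch for the weather chain (Rainy row assumed, as before):

```python
import numpy as np

P = np.array([[0.8, 0.2],          # Rainy row assumed, consistent with the 67/33 split
              [0.4, 0.6]])

# Left eigenvectors of P are right eigenvectors of P.T
eigvals, eigvecs = np.linalg.eig(P.T)
# Take the eigenvector for the eigenvalue closest to 1 and normalize it to sum to 1
v = eigvecs[:, np.argmin(np.abs(eigvals - 1.0))].real
pi = v / v.sum()
print(pi)   # approximately [0.667, 0.333]
```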
Case Study: Bulb Inventory Management
The Scenario
A warehouse stocks light bulbs. Each week, demand is random: 0, 1, 2, or 3 units with probabilities 0.1, 0.4, 0.3, 0.2. If inventory hits 0, an order of 5 units arrives the next week. State = inventory level (0-5).
The Model
This is a Markov Chain! The state is the current inventory level, and transitions depend only on the current level and the random demand.
We can build a 6x6 transition matrix and compute the stationary distribution to answer: "What's the long-run probability of being out of stock?"
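Here is a sketch of that calculation. The matrix depends on exactly how the restock timing and lost demand are modeled (this version assumes the 5-unit order arrives before demand is drawn in a week that starts at zero), so the printed number is illustrative rather than a reproduction of the figure below; the recipe of building the matrix and taking its stationary distribution is the point.

```python
import numpy as np

# Weekly demand distribution from the scenario: 0..3 units
demand_probs = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}
REORDER_QTY = 5
N_STATES = 6          # inventory levels 0..5

# Build the 6x6 transition matrix. Assumption: when a week starts at 0, the 5-unit
# order arrives before that week's demand is drawn, and unmet demand is lost.
P = np.zeros((N_STATES, N_STATES))
for s in range(N_STATES):
    on_hand = REORDER_QTY if s == 0 else s
    for d, prob in demand_probs.items():
        P[s, max(on_hand - d, 0)] += prob

# Stationary distribution: left eigenvector of P with eigenvalue 1
eigvals, eigvecs = np.linalg.eig(P.T)
pi = eigvecs[:, np.argmin(np.abs(eigvals - 1.0))].real
pi /= pi.sum()
print("Long-run P(inventory = 0):", round(pi[0], 3))
```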
The Answer
Solving for $\pi$, we find P(Inventory = 0) = 12%. This means the warehouse runs out of bulbs 12% of the time. If that's too high, we can adjust the reorder point or order quantity and re-run the Markov analysis.
Absorbing States
Some states are like black holes: once you enter, you can never leave ($P_{ii} = 1$). These are Absorbing States.
Gambler's Ruin
You start with $50. Each round, you bet $1 on a fair coin flip. Win = +$1, Lose = -$1. You stop when you reach $100 (Win) or $0 (Bust).
Both $0 and $100 are absorbing states. Markov Chain analysis shows: P(eventually going broke) = 50%. The expected number of rounds until the game ends is 2,500.
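For a fair coin, these numbers also follow from the standard closed-form gambler's ruin results: starting at $k$ with target $N$,

$$P(\text{bust}) = 1 - \frac{k}{N} = 1 - \frac{50}{100} = 0.5, \qquad \mathbb{E}[\text{rounds}] = k(N - k) = 50 \cdot 50 = 2500.$$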
Key Questions for Absorbing Chains:
- What's the probability of being absorbed by state A vs state B?
- What's the expected number of steps before absorption?
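Both questions can be answered numerically with the fundamental matrix $F = (I - Q)^{-1}$, where $Q$ is the transition matrix restricted to the non-absorbing (transient) states. A sketch for the gambler's ruin chain above:

```python
import numpy as np

TARGET, START, p_win = 100, 50, 0.5      # target bankroll, starting bankroll, fair coin

# Transient states are $1..$99; $0 and $100 are absorbing.
n = TARGET - 1
Q = np.zeros((n, n))       # transient -> transient transitions
R = np.zeros((n, 2))       # transient -> absorbing transitions (columns: $0, $100)
for i in range(n):
    bankroll = i + 1
    if bankroll == 1:
        R[i, 0] = 1 - p_win            # losing the last dollar means going bust
    else:
        Q[i, i - 1] = 1 - p_win
    if bankroll == TARGET - 1:
        R[i, 1] = p_win                # winning from $99 reaches the $100 target
    else:
        Q[i, i + 1] = p_win

F = np.linalg.inv(np.eye(n) - Q)       # fundamental matrix F = (I - Q)^-1
absorb_probs = F @ R                   # row i: P(absorbed at $0), P(absorbed at $100)
expected_steps = F.sum(axis=1)         # row i: expected rounds until absorption

i = START - 1
print("P(bust from $50):", round(absorb_probs[i, 0], 3))    # 0.5
print("Expected rounds:", round(expected_steps[i]))         # 2500
```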
Hidden Markov Models (HMMs)
What if you cannot observe the Markov chain directly? You only see noisy outputs that depend on the hidden state.
Standard Markov Chain
States are directly observable. Weather = {Sunny, Rainy}. You see the weather.
Hidden Markov Model
States are hidden. You observe emissions. You see people carrying umbrellas and infer the weather.
Bulb Quality Example
A bulb factory machine has hidden states: {Good Calibration, Drifted Calibration}. We cannot directly see the calibration state. But we observe the output: defect rate. High defect rate suggests "Drifted." The Viterbi algorithm finds the most likely sequence of hidden states.
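For a feel of how Viterbi works, here is a minimal sketch for the two-state calibration example. All of the probabilities (initial, transition, emission) are made-up illustrative values, not real factory data.

```python
import numpy as np

states = ["Good", "Drifted"]
obs_symbols = ["low_defects", "high_defects"]

# Illustrative (assumed) parameters
start_p = np.array([0.9, 0.1])                 # P(initial hidden state)
trans_p = np.array([[0.95, 0.05],              # P(next state | current state)
                    [0.10, 0.90]])
emit_p = np.array([[0.8, 0.2],                 # P(observation | hidden state)
                   [0.3, 0.7]])

def viterbi(observations):
    """Most likely hidden-state sequence, computed in log space for stability."""
    obs = [obs_symbols.index(o) for o in observations]
    n, k = len(obs), len(states)
    logv = np.full((n, k), -np.inf)      # best log-prob of a path ending in state j at time t
    back = np.zeros((n, k), dtype=int)   # argmax back-pointers
    logv[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, n):
        for j in range(k):
            scores = logv[t - 1] + np.log(trans_p[:, j]) + np.log(emit_p[j, obs[t]])
            back[t, j] = np.argmax(scores)
            logv[t, j] = scores[back[t, j]]
    # Trace the best path backwards
    path = [int(np.argmax(logv[-1]))]
    for t in range(n - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [states[s] for s in reversed(path)]

print(viterbi(["low_defects", "low_defects", "high_defects", "high_defects"]))
```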
ML Applications
Language Models (N-Grams)
"The cat sat on the..." → Next word depends on previous words. A bigram is a Markov chain where state = previous word. GPT and modern LLMs go beyond Markov (they have memory via attention), but the intuition starts here.
Reinforcement Learning (MDPs)
Modeled as Markov Decision Processes. The agent's next state depends only on current state + action (Markov property). Q-Learning, Policy Gradient, and all RL algorithms build on this.
PageRank (Google Search)
The web is a Markov chain. States = web pages. Transitions = hyperlinks. PageRank is the stationary distribution of this chain (with some damping). High PageRank = important page.
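A sketch of PageRank as power iteration on a damped random-surfer chain, over a made-up four-page link graph:

```python
import numpy as np

# Made-up link graph: links[i] = pages that page i links to
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n, damping = 4, 0.85

# Column-stochastic transition matrix of the random surfer
M = np.zeros((n, n))
for src, targets in links.items():
    for dst in targets:
        M[dst, src] = 1.0 / len(targets)

# Power iteration on the damped chain: rank = d * M @ rank + (1 - d) / n
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = damping * M @ rank + (1 - damping) / n
print(rank / rank.sum())   # stationary distribution = PageRank scores
```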
MCMC (Markov Chain Monte Carlo)
Used for Bayesian inference when we cannot compute posteriors analytically. Construct a Markov chain whose stationary distribution IS the posterior. Sample from the chain = sample from the posterior.
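A minimal Metropolis-Hastings sketch. The target here is a standard normal, chosen purely for illustration; in real Bayesian inference you would plug in your unnormalized log-posterior.

```python
import numpy as np

def log_target(x):
    """Unnormalized log-density we want to sample from (standard normal here)."""
    return -0.5 * x**2

def metropolis_hastings(n_samples=10_000, step_size=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = 0.0, []
    for _ in range(n_samples):
        proposal = x + rng.normal(scale=step_size)     # symmetric random-walk proposal
        # Accept with probability min(1, target(proposal) / target(x))
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return np.array(samples)

draws = metropolis_hastings()
print(draws.mean(), draws.std())   # roughly 0 and 1 for a standard normal target
```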