Markov Chains

The math of "What happens next?"

Memoryless Systems

Imagine a drunk person walking on a number line. Their next step (left or right) depends only on where they are right now, not on how they got there. They do not remember their path. This is a Random Walk, and it is a special case of a Markov Chain.

Markov Chains model systems that transition from one state to another probabilistically, where the future depends only on the present, not the past. This "memoryless" property makes them mathematically tractable and surprisingly powerful.

The Markov Property

P(X_{t+1} \mid X_t, X_{t-1}, \ldots, X_0) = P(X_{t+1} \mid X_t)
"The future is independent of the past, given the present."

A Markov Chain is fully defined by:

  • State Space: The set of all possible states (e.g., {Sunny, Rainy} or {0, 1, 2, ..., N}).
  • Transition Probabilities: The probability of moving from state $i$ to state $j$, denoted $P_{ij}$.
  • Initial Distribution: The probability of starting in each state.

Interactive Simulator

A simple weather model. If it's Sunny today, there's an 80% chance it's Sunny tomorrow. Adjust the transition probabilities and run the simulation to see long-term behavior.

Transition Matrix T

              To Sunny    To Rainy
From Sunny      0.80        0.20
From Rainy      0.40        0.60

Simulation

[Interactive widget: visualizes the 2-state Markov chain (Sunny ⇄ Rainy, with the transition probabilities 0.80/0.20 and 0.40/0.60 above) and tracks the simulated fraction of Sunny vs. Rainy days against the theoretical stationary values π (target: 66.7% Sunny).]
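
If you want to reproduce the simulator's behavior offline, here is a minimal sketch (Python with NumPy, a tooling choice of this example) that simulates the weather chain day by day and compares the empirical fraction of Sunny days with the 2/3 long-run target:

```python
import numpy as np

# State 0 = Sunny, state 1 = Rainy; rows of T are "from", columns are "to".
T = np.array([[0.8, 0.2],
              [0.4, 0.6]])

rng = np.random.default_rng(0)
state = 0                 # start on a Sunny day
days = 10_000
sunny_days = 0

for _ in range(days):
    sunny_days += (state == 0)
    state = rng.choice(2, p=T[state])   # draw tomorrow's weather from row T[state]

print(f"Empirical fraction of Sunny days: {sunny_days / days:.3f}")
print("Long-run target for this chain:   0.667")
```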

Transition Matrix

We can represent all transition probabilities in a Stochastic Matrix $T$. Each row must sum to $1$ (you have to go somewhere).

T = \begin{bmatrix} P(S \to S) & P(S \to R) \\ P(R \to S) & P(R \to R) \end{bmatrix} = \begin{bmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{bmatrix}

Computing Future States

If $\pi_0$ is the initial state distribution (row vector), then after $k$ steps:

\pi_k = \pi_0 T^k

Matrix exponentiation lets us "fast-forward" into the future without simulating each step.
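
As a quick check, here is a small NumPy sketch (the starting distribution is an assumption for illustration) that fast-forwards the weather chain with a single matrix power:

```python
import numpy as np

T = np.array([[0.8, 0.2],     # Sunny -> (Sunny, Rainy)
              [0.4, 0.6]])    # Rainy -> (Sunny, Rainy)

pi_0 = np.array([1.0, 0.0])   # assume we start on a certain Sunny day

# Distribution after k steps: pi_k = pi_0 @ T^k
for k in (1, 5, 30):
    pi_k = pi_0 @ np.linalg.matrix_power(T, k)
    print(f"k={k:2d}: P(Sunny)={pi_k[0]:.4f}, P(Rainy)={pi_k[1]:.4f}")
```

Even starting from a certain Sunny day, the distribution is essentially [2/3, 1/3] within a few dozen steps, which previews the stationary distribution discussed next.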

Stationary Distribution

If you run the simulation long enough, the probability of being in each state settles down to a constant value $\pi$. This is the Stationary Distribution (or Equilibrium Distribution).

\pi T = \pi
with the constraint $\sum_i \pi_i = 1$

This equation says $\pi$ is a Left Eigenvector of the transition matrix with eigenvalue $1$. Finding $\pi$ is an eigenvalue problem!
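
Here is a minimal sketch (again NumPy, a tooling assumption) that finds the stationary distribution numerically as the left eigenvector of $T$ with eigenvalue 1:

```python
import numpy as np

T = np.array([[0.8, 0.2],
              [0.4, 0.6]])

# Left eigenvectors of T are right eigenvectors of T transpose.
eigenvalues, eigenvectors = np.linalg.eig(T.T)

# Pick the eigenvector whose eigenvalue is (numerically) 1.
idx = np.argmin(np.abs(eigenvalues - 1.0))
pi = np.real(eigenvectors[:, idx])

# Normalize so the probabilities sum to 1.
pi = pi / pi.sum()
print(pi)   # approximately [0.6667, 0.3333]
```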

Weather Example

Solving $\pi T = \pi$ for our weather chain gives $\pi = [2/3, 1/3]$. In the long run, it is Sunny 67% of the time and Rainy 33% of the time, regardless of whether we started on a Sunny or Rainy day.
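
For completeness, the two-state system can be solved by hand. Writing $\pi = [\pi_S, \pi_R]$, the balance equation for the Sunny state is:

\pi_S = 0.8\,\pi_S + 0.4\,\pi_R \;\Longrightarrow\; 0.2\,\pi_S = 0.4\,\pi_R \;\Longrightarrow\; \pi_S = 2\,\pi_R

Combined with $\pi_S + \pi_R = 1$, this gives $\pi_R = 1/3$ and $\pi_S = 2/3$.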

Case Study: Bulb Inventory Management

The Scenario

A warehouse stocks light bulbs. Each week, demand is random: 0, 1, 2, or 3 units with probabilities 0.1, 0.4, 0.3, 0.2. If inventory hits 0, an order of 5 units arrives the next week. State = inventory level (0-5).

The Model

This is a Markov Chain! The state is the current inventory level, and transitions depend only on the current level and the random demand.

We can build a 6x6 transition matrix and compute the stationary distribution to answer: "What's the long-run probability of being out of stock?"
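
A sketch of that computation follows (NumPy). One caveat: the scenario leaves a couple of policy details open, such as whether the 5 replenishment units are on the shelf before the following week's demand and whether unmet demand is lost; the sketch below assumes both, so its output is illustrative and may differ from the exact figure quoted next.

```python
import numpy as np

demand_probs = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}   # weekly demand distribution
states = range(6)                                  # inventory levels 0..5

T = np.zeros((6, 6))
for s in states:
    # Assumption: if last week ended at 0, the order of 5 units arrives before
    # this week's demand; demand in excess of stock is lost (no backorders).
    on_hand = 5 if s == 0 else s
    for d, p in demand_probs.items():
        T[s, max(on_hand - d, 0)] += p

# Stationary distribution: left eigenvector of T with eigenvalue 1.
eigenvalues, eigenvectors = np.linalg.eig(T.T)
pi = np.real(eigenvectors[:, np.argmin(np.abs(eigenvalues - 1.0))])
pi /= pi.sum()

print("Long-run inventory distribution:", np.round(pi, 3))
print("P(out of stock) =", round(pi[0], 3))
```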

The Answer

Solving for $\pi$, we find P(Inventory = 0) = 12%. This means the warehouse runs out of bulbs 12% of the time. If that's too high, we can adjust the reorder point or order quantity and re-run the Markov analysis.

Absorbing States

Some states are like black holes: once you enter, you can never leave, i.e. $P(i \to i) = 1$. These are Absorbing States.

Gambler's Ruin

You start with $50. Each round, you bet $1 on a fair coin flip. Win = +$1, Lose = -$1. You stop when you reach $100 (Win) or $0 (Bust).

Both $0 and $100 are absorbing states. Markov Chain analysis shows: P(eventually going broke) = 50%. The expected number of rounds until the game ends is 2,500.

Key Questions for Absorbing Chains:

  • What's the probability of being absorbed by state A vs state B?
  • What's the expected number of steps before absorption?
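
Both questions can be answered with the fundamental matrix $N = (I - Q)^{-1}$, where $Q$ collects the transitions among transient states: absorption probabilities are $N R$ (with $R$ the transient-to-absorbing block) and expected steps to absorption are the row sums of $N$. A NumPy sketch for the Gambler's Ruin above:

```python
import numpy as np

N_DOLLARS = 100          # absorbing barriers at $0 and $100
start = 50               # starting bankroll

# Transient states are bankrolls 1..99.
n = N_DOLLARS - 1
Q = np.zeros((n, n))     # transitions among transient states
R = np.zeros((n, 2))     # columns: absorbed at $0, absorbed at $100
for i in range(1, N_DOLLARS):        # current bankroll i
    for nxt in (i - 1, i + 1):       # lose or win $1, each with probability 0.5
        if nxt == 0:
            R[i - 1, 0] += 0.5       # absorbed: broke
        elif nxt == N_DOLLARS:
            R[i - 1, 1] += 0.5       # absorbed: reached the target
        else:
            Q[i - 1, nxt - 1] += 0.5

N = np.linalg.inv(np.eye(n) - Q)     # fundamental matrix
absorption_probs = N @ R             # row i-1: P(ruin), P(reach $100) from $i
expected_steps = N @ np.ones(n)      # expected rounds until absorption

print("P(broke from $50):", round(absorption_probs[start - 1, 0], 3))
print("E[rounds from $50]:", round(expected_steps[start - 1], 1))
```

For a fair game starting at $50 this reproduces the figures quoted above: probability 0.5 of going broke and 2,500 expected rounds.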

Hidden Markov Models (HMMs)

What if you cannot observe the Markov chain directly? You only see noisy outputs that depend on the hidden state.

Standard Markov Chain

States are directly observable. Weather = {Sunny, Rainy}. You see the weather.

Hidden Markov Model

States are hidden. You observe emissions. You see people carrying umbrellas and infer the weather.

Bulb Quality Example

A bulb factory machine has hidden states: {Good Calibration, Drifted Calibration}. We cannot directly see the calibration state. But we observe the output: defect rate. High defect rate suggests "Drifted." The Viterbi algorithm finds the most likely sequence of hidden states.
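
Here is a compact Viterbi sketch for that setting. All transition and emission probabilities below are made-up numbers for illustration, not values from the case study:

```python
import numpy as np

states = ["Good", "Drifted"]
obs_symbols = ["low_defects", "high_defects"]

# Hypothetical parameters (assumptions for illustration only).
start_p = np.array([0.9, 0.1])            # machine usually starts calibrated
trans_p = np.array([[0.95, 0.05],         # Good -> Good, Good -> Drifted
                    [0.10, 0.90]])        # Drifted -> Good, Drifted -> Drifted
emit_p = np.array([[0.8, 0.2],            # Good emits low/high defect rates
                   [0.3, 0.7]])           # Drifted emits low/high defect rates

def viterbi(observations):
    """Most likely hidden-state path, using log-probabilities for stability."""
    obs = [obs_symbols.index(o) for o in observations]
    n_states, n_steps = len(states), len(obs)
    log_delta = np.full((n_steps, n_states), -np.inf)
    backpointer = np.zeros((n_steps, n_states), dtype=int)

    log_delta[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, n_steps):
        for j in range(n_states):
            scores = log_delta[t - 1] + np.log(trans_p[:, j])
            backpointer[t, j] = np.argmax(scores)
            log_delta[t, j] = scores.max() + np.log(emit_p[j, obs[t]])

    # Backtrack from the best final state.
    path = [int(np.argmax(log_delta[-1]))]
    for t in range(n_steps - 1, 0, -1):
        path.append(backpointer[t, path[-1]])
    return [states[s] for s in reversed(path)]

print(viterbi(["low_defects", "low_defects", "high_defects", "high_defects"]))
```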

ML Applications

Language Models (N-Grams)

"The cat sat on the..." → Next word depends on previous words. A bigram is a Markov chain where state = previous word. GPT and modern LLMs go beyond Markov (they have memory via attention), but the intuition starts here.

Reinforcement Learning (MDPs)

Reinforcement learning problems are modeled as Markov Decision Processes (MDPs). The agent's next state depends only on the current state and action (the Markov property). Q-Learning, Policy Gradient methods, and most RL algorithms build on this.

PageRank (Google Search)

The web is a Markov chain. States = web pages. Transitions = hyperlinks. PageRank is the stationary distribution of this chain (with some damping). High PageRank = important page.
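
A minimal power-iteration sketch on a toy four-page web (the link structure is made up; 0.85 is the commonly quoted damping factor):

```python
import numpy as np

# Toy link structure: links[i] = pages that page i links to (assumed example).
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = len(links)

# Row-stochastic transition matrix of the "random surfer" chain.
T = np.zeros((n, n))
for page, outlinks in links.items():
    for target in outlinks:
        T[page, target] = 1.0 / len(outlinks)

d = 0.85                                   # damping factor
G = d * T + (1 - d) / n                    # teleport anywhere with probability 1 - d

# Power iteration: repeatedly apply the chain until the distribution converges.
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = rank @ G

print(np.round(rank / rank.sum(), 3))      # stationary distribution = PageRank
```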

MCMC (Markov Chain Monte Carlo)

Used for Bayesian inference when we cannot compute posteriors analytically. Construct a Markov chain whose stationary distribution IS the posterior. Sample from the chain = sample from the posterior.
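
A bare-bones Metropolis sketch, using a standard normal as a stand-in for an unnormalized posterior (the target and the proposal width are arbitrary choices for illustration):

```python
import numpy as np

def unnormalized_target(x):
    """Stand-in for an unnormalized posterior density (here: standard normal)."""
    return np.exp(-0.5 * x**2)

rng = np.random.default_rng(42)
x = 0.0
samples = []
for _ in range(50_000):
    proposal = x + rng.normal(scale=1.0)   # symmetric random-walk proposal
    accept_prob = min(1.0, unnormalized_target(proposal) / unnormalized_target(x))
    if rng.random() < accept_prob:
        x = proposal                       # accept the move; otherwise stay put
    samples.append(x)

samples = np.array(samples[5_000:])        # discard burn-in
print("sample mean:", round(samples.mean(), 2), " sample std:", round(samples.std(), 2))
```

The chain's stationary distribution is the target, so the retained samples behave like draws from the posterior (here, mean near 0 and standard deviation near 1).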