Random Walks & Markov Chains

From Google's PageRank to modern graph embeddings: how random exploration reveals graph structure.

Introduction

Imagine placing a particle on a node in a graph and letting it wander randomly from node to node, following edges. Where will it end up after many steps? Which nodes will it visit most frequently?

This simple idea, random walks on graphs, turns out to be extraordinarily powerful. It is the foundation of Google's PageRank algorithm, modern graph embedding methods like node2vec and DeepWalk, sampling algorithms for large graphs, and the theory of Markov chains.

The Deep Connection: Random walks provide a probabilistic view of graphs. Instead of asking "what is the shortest path?", we ask "how likely is the walker to reach this node?" This perspective reveals global structure that local analysis misses.

What is a Random Walk?

A random walk on a graph is a sequence of nodes where each step moves to a randomly chosen neighbor of the current node.

Formal Definition

Given a graph G = (V, E) and a starting node v₀:

  1. At step t, the walker is at node vₜ
  2. Choose a neighbor u of vₜ uniformly at random
  3. Move to u: vₜ₊₁ = u
  4. Repeat
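
A minimal sketch of this procedure in Python (the adjacency-list graph below is a made-up example, not one from the text):

```python
import random

# Small example graph as an adjacency list (illustrative only).
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

def random_walk(graph, start, steps):
    """Simple random walk: each step moves to a uniformly random neighbor."""
    path = [start]
    for _ in range(steps):
        path.append(random.choice(graph[path[-1]]))
    return path

print(random_walk(graph, start=0, steps=10))
# e.g. [0, 2, 1, 3, 1, 0, 1, 2, 0, 2, 1]
```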

Types of Random Walks

Simple Random Walk

Moves to each neighbor with uniform probability: P(u|v) = 1/degree(v). The most common variant.

Lazy Random Walk

50% chance to stay, 50% to move. Ensures aperiodicity for convergence.

Weighted Random Walk

Transition probability proportional to edge weight. Models strength of connections.

Biased Random Walk

Non-uniform neighbor selection. Used in node2vec (BFS vs DFS bias).

The Transition Matrix

Random walks are elegantly captured by a transition matrix P, where P[i][j] is the probability of moving from node i to node j in one step.

P = D^{-1} A

where D⁻¹ is the inverse degree matrix (performing row normalization) and A is the adjacency matrix.

Example: Triangle graph (0-1-2-0)

Adjacency A:        Degree D:           Transition P = D⁻¹A:
[0, 1, 1]           [2, 0, 0]           [0,   0.5, 0.5]
[1, 0, 1]           [0, 2, 0]           [0.5, 0,   0.5]
[1, 1, 0]           [0, 0, 2]           [0.5, 0.5, 0  ]

Each row sums to 1 (probability distribution over next nodes)
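
A quick numpy check of this construction for the triangle graph (a minimal sketch; dividing each row of A by its degree is exactly D⁻¹A):

```python
import numpy as np

# Triangle graph 0-1-2-0 from the example above.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

d = A.sum(axis=1)        # degrees: [2, 2, 2]
P = A / d[:, None]       # row-normalize: P = D^{-1} A

print(P)                 # rows [0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]
print(P.sum(axis=1))     # each row sums to 1
```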

Multi-Step Transitions

The probability of being at node j after k steps starting from node i is given by the (i,j) entry of P^k:

P(X_k = j \mid X_0 = i) = [P^k]_{ij}

Matrix powers encode multi-hop reachability probabilities.
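
Continuing with the triangle graph, matrix powers make the convergence visible (a sketch using numpy):

```python
import numpy as np
from numpy.linalg import matrix_power

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
P = A / A.sum(axis=1)[:, None]     # transition matrix of the triangle graph

for k in (1, 2, 5, 20):
    print(k, matrix_power(P, k)[0])  # distribution of X_k given X_0 = 0
# Each row of P^k approaches the stationary distribution [1/3, 1/3, 1/3].
```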

Interactive: Random Walk

Watch a random walker traverse the graph. Notice how visit frequencies converge to the stationary distribution over time.

Uniform walk: At each step, pick a random neighbor with equal probability. Stationary distribution π(v) ∝ degree(v).

Markov Chains Connection

A random walk on a graph is a specific instance of a Markov chain: a stochastic process where the future depends only on the present, not the past.

The Markov Property

P(X_{t+1} = j \mid X_t = i, X_{t-1}, \ldots, X_0) = P(X_{t+1} = j \mid X_t = i)

"Where you go next depends only on where you are, not how you got there."

Key Properties of Markov Chains

These four properties determine whether a random walk converges to a well-defined stationary distribution:

1. Irreducibility

\forall i, j: \exists n \text{ such that } P^n(i \to j) > 0

A Markov chain is irreducible if you can eventually reach any state from any other state. In graph terms: the graph must be connected.

Why it matters: Without irreducibility, the chain might get "stuck" in a subset of states. Different starting points would lead to different limiting distributions, meaning there is no unique stationary distribution.

Example: A graph with two disconnected components has a reducible random walk. The walker is forever trapped in whichever component it started in.

2. Aperiodicity

\gcd\{n : P^n(i \to i) > 0\} = 1

A state is aperiodic if the GCD of all possible return times is 1. The chain is aperiodic if all states are aperiodic.

The problem with periodicity: In a bipartite graph (see Spectral Graph Theory), you can only return to your starting node in an even number of steps (period = 2). The distribution oscillates forever instead of converging.

The fix - Lazy random walks: With probability 0.5, stay at the current node instead of moving. This adds self-loops, making return in 1 step possible (GCD becomes 1).

P_{\text{lazy}} = \frac{1}{2}I + \frac{1}{2}P

3. Reversibility (Detailed Balance)

\pi(i) \cdot P(i \to j) = \pi(j) \cdot P(j \to i)

A chain is reversible if, at equilibrium, the flow of probability from i to j equals the flow from j to i. This is called the detailed balance condition.

Intuition: If you filmed the random walk at equilibrium and played it backwards, it would look statistically identical. You cannot tell which direction is "forward."

Undirected graphs: Random walks on undirected graphs are always reversible. For a simple random walk, \pi(i) = \frac{d_i}{2|E|} (proportional to degree).
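
A quick numerical check of detailed balance on a small undirected graph (a minimal sketch; the path graph is an illustrative choice):

```python
import numpy as np

# Path graph 0-1-2 (undirected, non-regular).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
d = A.sum(axis=1)
P = A / d[:, None]
pi = d / d.sum()                  # pi(i) = d_i / (2|E|) = [0.25, 0.5, 0.25]

flow = pi[:, None] * P            # flow[i, j] = pi(i) * P(i -> j)
print(np.allclose(flow, flow.T))  # True: detailed balance holds
```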

ML connection: MCMC methods like Metropolis-Hastings construct reversible chains to sample from complex distributions.

4. Ergodicity

\text{Ergodic} = \text{Irreducible} + \text{Aperiodic}

An ergodic Markov chain is both irreducible and aperiodic. This is the magic combination that guarantees convergence.

Fundamental Theorem of Markov Chains:

If a finite Markov chain is ergodic, then:

  • A unique stationary distribution π exists
  • From any starting distribution, the chain converges to π
  • Time averages equal ensemble averages (ergodic theorem)

PageRank insight: The teleportation factor in PageRank ensures ergodicity. Even if the web graph has dangling nodes or disconnected components, the random surfer can always teleport, making the chain irreducible and aperiodic.

Stationary Distribution

After many steps, a random walk "forgets" where it started and converges to a stationary distribution π: a probability distribution over nodes that doesn't change under the random walk.

\pi P = \pi

π is a left eigenvector of P with eigenvalue 1.

For Undirected Graphs

The stationary distribution has a beautiful closed form:

\pi(v) = \frac{\text{degree}(v)}{2|E|}

High-degree nodes are visited more often: visit frequency is proportional to degree.

Intuition: A node with degree 10 has 10 "doors" leading to it, while a node with degree 2 has only 2. The walker is 5× more likely to enter the high-degree node.
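
The closed form is easy to verify against the eigenvector definition (a sketch; the star graph is an illustrative choice):

```python
import numpy as np

# Star graph: hub 0 connected to leaves 1-4 (illustrative).
n = 5
A = np.zeros((n, n))
A[0, 1:] = 1
A[1:, 0] = 1
d = A.sum(axis=1)
P = A / d[:, None]

# pi is the left eigenvector of P with eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()

print(pi)            # [0.5, 0.125, 0.125, 0.125, 0.125]
print(d / d.sum())   # degree / 2|E| gives the same distribution
```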

Convergence & Mixing Time

How quickly does the random walk converge to the stationary distribution? This is measured by the mixing time.

Mixing Time Definition

t_{\text{mix}}(\epsilon) = \min\{t : \|P^t(x, \cdot) - \pi\|_{TV} \leq \epsilon, \forall x\}

Time until the distribution is ε-close to stationary, regardless of starting point.

The Spectral Gap

Mixing time is controlled by the spectral gap: the difference between the two largest eigenvalues of P.

\gamma = 1 - \lambda_2 \quad \text{(spectral gap)}
  • Large gap: Fast mixing, walk "forgets" quickly
  • Small gap: Slow mixing, walk gets "stuck" in clusters

This connects directly to spectral graph theory: the Laplacian eigenvalue λ₂ controls both graph connectivity and random walk mixing.
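
A small comparison makes the effect of bottlenecks concrete (a sketch; both graphs are illustrative, and the lazy walk is used so the eigenvalues behave well):

```python
import numpy as np

def lazy_gap(A):
    """Spectral gap 1 - lambda_2 of the lazy walk 0.5*I + 0.5*D^{-1}A."""
    P = A / A.sum(axis=1)[:, None]
    P_lazy = 0.5 * np.eye(len(A)) + 0.5 * P
    lam = np.sort(np.real(np.linalg.eigvals(P_lazy)))[::-1]
    return 1.0 - lam[1]

# Complete graph K6: no bottleneck.
K6 = np.ones((6, 6)) - np.eye(6)

# Barbell: two triangles joined by one bridge edge (a severe bottleneck).
B = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    B[i, j] = B[j, i] = 1

print(lazy_gap(K6), lazy_gap(B))  # large gap vs. small gap (slow mixing)
```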

Interactive: Mixing Time

Compare how different graph structures affect convergence speed. Bottlenecks dramatically slow mixing!

Mixing Time Explorer

A standard bound relates mixing time to the spectral gap:

t_{\text{mix}} \leq \frac{1}{\gamma} \ln(n/\epsilon)

Well-connected expander graphs have high conductance: probability mass spreads rapidly, and the total variation distance \|\mu_t - \pi\|_{TV} falls below ε quickly. Graphs with bottlenecks have a small spectral gap and mix far more slowly.

PageRank

Google's PageRank (Brin & Page, 1998) is a random walk with a twist: at each step, with probability α (typically 0.85), follow a random link; with probability 1-α, "teleport" to a random page.

PR = \alpha \cdot P^T \cdot PR + \frac{1-\alpha}{n} \cdot \mathbf{1}

The PageRank vector is the stationary distribution of this modified random walk.
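
A power-iteration sketch of this fixed-point equation (the tiny link graph is made up, and dangling nodes are handled here by teleporting uniformly, a simplification):

```python
import numpy as np

def pagerank(A, alpha=0.85, tol=1e-10):
    """PageRank via power iteration on the random-surfer chain."""
    n = len(A)
    P = A.astype(float)
    for i in range(n):
        s = P[i].sum()
        P[i] = P[i] / s if s > 0 else np.full(n, 1.0 / n)  # dangling node fix
    pr = np.full(n, 1.0 / n)
    while True:
        new = alpha * (P.T @ pr) + (1 - alpha) / n
        if np.abs(new - pr).sum() < tol:
            return new
        pr = new

# Tiny link graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0 (illustrative).
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
print(pagerank(A))  # node 2 accumulates the most rank
```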

Why Teleportation?

Handles Dead Ends

Pages with no outlinks ("dangling nodes") would trap the walker. Teleportation provides escape.

Ensures Irreducibility

Even if graph is disconnected, teleportation connects all nodes. Guarantees unique solution.

Models "Bored User"

A real user sometimes gives up clicking links and types a new URL. The 1-α term models this.

Fast Convergence

Teleportation increases spectral gap, speeding up power iteration convergence.

Interactive: PageRank

Watch PageRank converge through iterations. Node size reflects importance!

node2vec & Graph Embeddings

node2vec (Grover & Leskovec, 2016) uses biased random walks to generate node embeddings. The key insight: by controlling the walk's exploration behavior, we can capture different notions of node similarity.

The node2vec Walk

After moving from t to v, the probability of moving to x depends on x's relationship to t:

  • 1/p if x = t (return to the previous node)
  • 1 if x ~ t (x is a neighbor of t; BFS-like)
  • 1/q if x ≁ t (x is not adjacent to t; DFS-like)

Low p (return often)

Encourages local exploration. Captures structural equivalence (similar roles).

Low q (go far)

Encourages outward exploration. Captures homophily (community membership).
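
A sketch of a single biased step (the helper and example graph are hypothetical; random.choices normalizes the unnormalized 1/p, 1, 1/q weights):

```python
import random

def node2vec_step(graph, prev, curr, p, q):
    """One biased step: weight each neighbor of curr by its relation to prev."""
    neighbors = graph[curr]
    weights = []
    for x in neighbors:
        if x == prev:               # x = t: return to the previous node
            weights.append(1.0 / p)
        elif x in graph[prev]:      # x ~ t: stays near prev (BFS-like)
            weights.append(1.0)
        else:                       # x not adjacent to t: moves away (DFS-like)
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights)[0]

graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
walk = [0, 1]                       # seed with one unbiased step
for _ in range(8):
    walk.append(node2vec_step(graph, walk[-2], walk[-1], p=0.5, q=2.0))
print(walk)
```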

Interactive: Adjust p and q to control the walker's bias

The Algorithm

  1. For each node, run multiple biased random walks of fixed length
  2. Treat walks as "sentences" and nodes as "words"
  3. Train word2vec (Skip-gram) on the walk corpus
  4. Node embeddings emerge from the word2vec training
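
A compact sketch of this pipeline, assuming gensim (4.x) is available for the skip-gram step. For brevity the walks here are unbiased, which is DeepWalk's setting; node2vec proper would use the biased step sketched earlier:

```python
import random
from gensim.models import Word2Vec  # assumes gensim >= 4.0 is installed

graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

def simple_walk(graph, start, length):
    path = [start]
    for _ in range(length - 1):
        path.append(random.choice(graph[path[-1]]))
    return [str(v) for v in path]   # word2vec expects string tokens

# Steps 1-2: a corpus of walks ("sentences" whose "words" are nodes).
corpus = [simple_walk(graph, v, 10) for v in graph for _ in range(20)]

# Steps 3-4: train skip-gram; each node's vector is its embedding.
model = Word2Vec(corpus, vector_size=16, window=3, sg=1, min_count=1)
print(model.wv["0"])                # 16-dimensional embedding of node 0
```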

ML Applications

Graph Sampling

When graphs are too large to fit in memory, random walks provide unbiased samples. Metropolis-Hastings walks give uniform node samples.

Used by: Facebook, Twitter for large-scale graph analytics

Semi-Supervised Learning

Label propagation can be viewed as running random walks from labeled nodes and spreading labels along the way.

Personalized PageRank is a popular choice for label propagation.

GNN Message Passing

GNNs can be viewed as differentiable random walks. The transition matrix P corresponds to the normalized adjacency used in GCN.

Random walk positional encodings are used in Graph Transformers.

Recommendation Systems

Pixie (Pinterest) uses random walks with restarts on the user-item graph for real-time recommendations.

Scales to billions of nodes, serves 300M users.