Introduction
Imagine placing a particle on a node in a graph and letting it wander randomly from node to node, following edges. Where will it end up after many steps? Which nodes will it visit most frequently?
This simple idea, random walks on graphs, turns out to be extraordinarily powerful. It underlies Google's PageRank algorithm, modern graph embedding methods like node2vec and DeepWalk, and sampling algorithms for large graphs, and it connects directly to the theory of Markov chains.
The Deep Connection: Random walks provide a probabilistic view of graphs. Instead of asking "what is the shortest path?", we ask "how likely is the walker to reach this node?" This perspective reveals global structure that local analysis misses.
What is a Random Walk?
A random walk on a graph is a sequence of nodes where each step moves to a randomly chosen neighbor of the current node.
Formal Definition
Given a graph G = (V, E) and a starting node v₀:
- At step t, the walker is at node vₜ
- Choose a neighbor u of vₜ uniformly at random
- Move to u: vₜ₊₁ = u
- Repeat
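To make the definition concrete, here is a minimal sketch of a simple random walk in Python (the adjacency-list graph and function names are illustrative, not from any particular library):

```python
import random

# Toy graph as an adjacency-list dict (illustrative).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def random_walk(graph, start, steps):
    """Simulate a simple random walk; return the sequence of visited nodes."""
    path = [start]
    current = start
    for _ in range(steps):
        current = random.choice(graph[current])  # uniform over neighbors
        path.append(current)
    return path

print(random_walk(graph, start=0, steps=10))
```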
Types of Random Walks
Simple Random Walk
Uniform probability to all neighbors: P(u|v) = 1/degree(v). Most common.
Lazy Random Walk
50% chance to stay, 50% to move. Ensures aperiodicity for convergence.
Weighted Random Walk
Transition probability proportional to edge weight. Models strength of connections.
Biased Random Walk
Non-uniform neighbor selection. Used in node2vec (BFS vs DFS bias).
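As a rough sketch, the first three variants differ only in the next-step distribution (function names here are ours; the biased node2vec rule appears later):

```python
def simple_step(neighbors):
    # Uniform over neighbors: P(u|v) = 1/degree(v).
    return {u: 1.0 / len(neighbors) for u in neighbors}

def lazy_step(v, neighbors):
    # Stay put with probability 1/2, otherwise move uniformly.
    probs = {u: 0.5 / len(neighbors) for u in neighbors}
    probs[v] = 0.5
    return probs

def weighted_step(weights):
    # weights: {neighbor: edge weight}; probability proportional to weight.
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}
```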
The Transition Matrix
Random walks are elegantly captured by a transition matrix P, where P[i][j] is the probability of moving from node i to node j in one step.
P = D⁻¹A, where A is the adjacency matrix and D is the diagonal degree matrix (multiplying by D⁻¹ row-normalizes A).

Example: Triangle graph (0-1-2-0)

Adjacency A:    Degree D:    Transition P = D⁻¹A:
[0, 1, 1]       [2, 0, 0]    [0,   0.5, 0.5]
[1, 0, 1]       [0, 2, 0]    [0.5, 0,   0.5]
[1, 1, 0]       [0, 0, 2]    [0.5, 0.5, 0  ]

Each row of P sums to 1 (a probability distribution over next nodes).
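A quick numpy check of this example (a sketch; any linear algebra library works the same way):

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
D_inv = np.diag(1.0 / A.sum(axis=1))  # inverse degree matrix
P = D_inv @ A

print(P)              # matches the matrix above
print(P.sum(axis=1))  # each row sums to 1
```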
Multi-Step Transitions
The probability of being at node j after k steps, starting from node i, is given by the (i, j) entry of Pᵏ: P(vₖ = j | v₀ = i) = (Pᵏ)ᵢⱼ.
Matrix powers encode multi-hop reachability probabilities.
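Continuing the triangle example, a sketch of the k-step distributions via matrix powers:

```python
import numpy as np

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)

for k in (1, 2, 10):
    Pk = np.linalg.matrix_power(P, k)
    print(k, Pk[0])  # distribution over nodes after k steps from node 0
```

By k = 10 the rows are already close to the uniform stationary distribution (1/3, 1/3, 1/3).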
Interactive: Random Walk
Watch a random walker traverse the graph. Notice how visit frequencies converge to the stationary distribution over time.
Uniform walk: At each step, pick a random neighbor with equal probability. Stationary distribution π(v) ∝ degree(v).
Markov Chains Connection
A random walk on a graph is a specific instance of a Markov chain: a stochastic process where the future depends only on the present, not the past.
The Markov Property
"Where you go next depends only on where you are, not how you got there."
Key Properties of Markov Chains
These four properties govern whether a random walk converges to a well-defined stationary distribution, and what that distribution looks like:
1. Irreducibility
A Markov chain is irreducible if you can eventually reach any state from any other state. In graph terms: the graph must be connected.
Why it matters: Without irreducibility, the chain might get "stuck" in a subset of states. Different starting points would lead to different limiting distributions, meaning there is no unique stationary distribution.
Example: A graph with two disconnected components has a reducible random walk. The walker is forever trapped in whichever component it starts in.
2. Aperiodicity
A state is aperiodic if the GCD of all possible return times is 1. The chain is aperiodic if all states are aperiodic.
The problem with periodicity: In a bipartite graph (see Spectral Graph Theory), you can only return to your starting node in an even number of steps (period = 2). The distribution oscillates forever instead of converging.
The fix - Lazy random walks: With probability 0.5, stay at the current node instead of moving. This adds self-loops, making return in 1 step possible (GCD becomes 1).
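A sketch of the fix on the smallest bipartite graph, a single edge between two nodes:

```python
import numpy as np

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])          # simple walk: oscillates forever
P_lazy = 0.5 * np.eye(2) + 0.5 * P  # lazy walk: stay with probability 1/2

dist = np.array([1.0, 0.0])  # start at node 0
for _ in range(20):
    dist = dist @ P_lazy
print(dist)  # converges to [0.5, 0.5]; with plain P it would bounce forever
```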
3. Reversibility (Detailed Balance)
A chain is reversible if, at equilibrium, the flow of probability from i to j equals the flow from j to i: π(i)·P(i, j) = π(j)·P(j, i). This is called the detailed balance condition.
Intuition: If you filmed the random walk at equilibrium and played it backwards, it would look statistically identical. You cannot tell which direction is "forward."
Undirected graphs: Random walks on undirected graphs are always reversible. For a simple random walk, π(v) = deg(v)/2|E| (proportional to degree).
ML connection: MCMC methods like Metropolis-Hastings construct reversible chains to sample from complex distributions.
4. Ergodicity
An ergodic Markov chain is both irreducible and aperiodic. This is the magic combination that guarantees convergence.
Fundamental Theorem of Markov Chains:
If a finite Markov chain is ergodic, then:
- A unique stationary distribution π exists
- From any starting distribution, the chain converges to π
- Time averages equal ensemble averages (ergodic theorem)
PageRank insight: The teleportation factor in PageRank ensures ergodicity. Even if the web graph has dangling nodes or disconnected components, the random surfer can always teleport, making the chain irreducible and aperiodic.
Stationary Distribution
After many steps, a random walk "forgets" where it started and converges to a stationary distribution π: a probability distribution over nodes that doesn't change under the random walk.
π is a left eigenvector of P with eigenvalue 1: πP = π.
For Undirected Graphs
The stationary distribution has a beautiful closed form: π(v) = deg(v)/2|E|, where |E| is the number of edges.
High-degree nodes are visited more often, in direct proportion to their degree.
Intuition: A node with degree 10 has 10 "doors" leading to it, while a node with degree 2 has only 2. The walker is 5× more likely to enter the high-degree node.
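A quick numerical check of the closed form (the small example graph below is ours):

```python
import numpy as np

# Triangle 0-1-2 plus a pendant node 3 attached to node 0.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
deg = A.sum(axis=1)
P = A / deg[:, None]

dist = np.full(4, 0.25)   # any starting distribution
for _ in range(500):      # iterate the walk to (near) stationarity
    dist = dist @ P
print(dist)               # empirical stationary distribution
print(deg / deg.sum())    # closed form deg(v)/2|E| — the two match
```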
Convergence & Mixing Time
How quickly does the random walk converge to the stationary distribution? This is measured by the mixing time.
Mixing Time Definition
Time until the walk's distribution is ε-close to stationary in total variation distance, regardless of starting point: t_mix(ε) = min{t : maxᵥ ‖Pᵗ(v, ·) − π‖_TV ≤ ε}.
The Spectral Gap
Mixing time is controlled by the spectral gap: the difference 1 − λ₂ between the largest eigenvalue of P (always 1) and its second-largest eigenvalue λ₂.
- Large gap: Fast mixing, walk "forgets" quickly
- Small gap: Slow mixing, walk gets "stuck" in clusters
This connects directly to spectral graph theory: the Laplacian eigenvalue λ₂ controls both graph connectivity and random walk mixing.
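A sketch comparing the gap on a well-connected graph and a bottlenecked one (assuming networkx is available; the graph choices are ours):

```python
import numpy as np
import networkx as nx

def spectral_gap(G):
    A = nx.to_numpy_array(G)
    P = A / A.sum(axis=1, keepdims=True)
    eig = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return eig[0] - eig[1]  # 1 - |lambda_2|, a standard proxy for mixing speed

print(spectral_gap(nx.complete_graph(10)))   # large gap: fast mixing
print(spectral_gap(nx.barbell_graph(5, 1)))  # small gap: the bridge is a bottleneck
```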
Interactive: Mixing Time
Compare how different graph structures affect convergence speed. Bottlenecks dramatically slow mixing!
Well-Connected: Expander graph with high conductance. All nodes are densely connected, leading to rapid information spread.
PageRank
Google's PageRank (Brin & Page, 1998) is a random walk with a twist: at each step, with probability α (typically 0.85), follow a random link; with probability 1-α, "teleport" to a random page.
The PageRank vector is the stationary distribution of this modified random walk.
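Here is a minimal power-iteration sketch of that walk (real implementations use sparse matrices and handle dangling nodes by redistributing their mass; this toy graph is ours):

```python
import numpy as np

def pagerank(A, alpha=0.85, iters=100):
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    out[out == 0] = 1                # crude guard against dead-end rows
    P = A / out
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi = alpha * (pi @ P) + (1 - alpha) / n  # follow a link or teleport
    return pi / pi.sum()

A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)  # tiny directed web graph
print(pagerank(A))
```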
Why Teleportation?
Handles Dead Ends
Pages with no outlinks ("dangling nodes") would trap the walker. Teleportation provides escape.
Ensures Irreducibility
Even if the graph is disconnected, teleportation connects all nodes. Guarantees a unique solution.
Models "Bored User"
A real user sometimes gives up clicking links and types a new URL. The 1-α term models this.
Fast Convergence
Teleportation increases spectral gap, speeding up power iteration convergence.
Interactive: PageRank
Watch PageRank converge through iterations. Node size reflects importance!
node2vec & Graph Embeddings
node2vec (Grover & Leskovec, 2016) uses biased random walks to generate node embeddings. The key insight: by controlling the walk's exploration behavior, we can capture different notions of node similarity.
The node2vec Walk
After moving from t to v, the unnormalized weight of stepping to a neighbor x of v depends on the distance d(t, x) between x and the previous node t:
- 1/p if d(t, x) = 0 (x = t: return to the previous node)
- 1 if d(t, x) = 1 (x is also a neighbor of t: BFS-like)
- 1/q if d(t, x) = 2 (x is farther from t: DFS-like)
Low p (return often)
Encourages local exploration. Captures structural equivalence (similar roles).
Low q (go far)
Encourages outward exploration. Captures homophily (community membership).
Interactive: Adjust p and q to control the walker's bias
Low p = High return prob (stay local). High p = Explore new nodes.
Low q = Go outward (DFS). High q = Stay close (BFS).
The Algorithm
- For each node, run multiple biased random walks of fixed length
- Treat walks as "sentences" and nodes as "words"
- Train word2vec (Skip-gram) on the walk corpus
- Node embeddings emerge from the word2vec training
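A minimal sketch of steps 1-2, the biased walk generation (helper names are ours; the alias-sampling optimizations from the paper are omitted, and training would hand the walks to a word2vec implementation such as gensim's):

```python
import random

def biased_step(graph, prev, curr, p, q):
    # graph: adjacency-list dict. Weights follow the node2vec rule above.
    weights = []
    for x in graph[curr]:
        if x == prev:              # d(prev, x) = 0: return
            weights.append(1.0 / p)
        elif x in graph[prev]:     # d(prev, x) = 1: BFS-like
            weights.append(1.0)
        else:                      # d(prev, x) = 2: DFS-like
            weights.append(1.0 / q)
    return random.choices(graph[curr], weights=weights, k=1)[0]

def node2vec_walk(graph, start, length, p, q):
    walk = [start, random.choice(graph[start])]  # first step is uniform
    while len(walk) < length:
        walk.append(biased_step(graph, walk[-2], walk[-1], p, q))
    return walk
```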
ML Applications
Graph Sampling
When graphs are too large to fit in memory, random walks let you sample nodes by crawling edges. A simple walk oversamples high-degree nodes, but Metropolis-Hastings walks give uniform node samples.
Used by: Facebook, Twitter for large-scale graph analytics
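A sketch of the Metropolis-Hastings correction: propose a simple-random-walk step, then accept it with probability min(1, deg(v)/deg(u)), which makes the stationary distribution uniform instead of degree-proportional:

```python
import random

def mh_uniform_step(graph, v):
    # graph: adjacency-list dict; one MH step targeting the uniform distribution.
    u = random.choice(graph[v])
    if random.random() < min(1.0, len(graph[v]) / len(graph[u])):
        return u   # accept the move
    return v       # reject: stay at v
```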
Semi-Supervised Learning
Label propagation can be viewed as running random walks from labeled nodes and spreading labels along the way.
Personalized PageRank is a popular choice for label propagation.
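It is ordinary PageRank with the teleport aimed at a seed (labeled) node instead of a uniform restart; a sketch (names are ours):

```python
import numpy as np

def personalized_pagerank(A, seed, alpha=0.85, iters=100):
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)
    restart = np.zeros(n)
    restart[seed] = 1.0        # teleport back to the labeled seed node
    pi = restart.copy()
    for _ in range(iters):
        pi = alpha * (pi @ P) + (1 - alpha) * restart
    return pi  # high mass = strongly associated with the seed's label
```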
GNN Message Passing
GNNs can be viewed as differentiable random walks. The transition matrix P = D⁻¹A is a close relative of the symmetrically normalized adjacency D^(-1/2)AD^(-1/2) used in GCN.
Random walk positional encodings are used in Graph Transformers.
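One common construction, as a sketch: each node's positional encoding is its return probability diag(Pᵏ) for k = 1…K:

```python
import numpy as np

def rw_positional_encoding(A, K=4):
    P = A / A.sum(axis=1, keepdims=True)
    Pk = np.eye(A.shape[0])
    feats = []
    for _ in range(K):
        Pk = Pk @ P
        feats.append(np.diag(Pk))   # probability of returning to self in k steps
    return np.stack(feats, axis=1)  # shape: (num_nodes, K)
```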
Recommendation Systems
Pixie (Pinterest) uses random walks with restarts on the user-item graph for real-time recommendations.
Scales to billions of nodes, serves 300M users.