Random Walks & Markov Chains

From Google's PageRank to modern graph embeddings: how random exploration reveals graph structure.

Introduction

Imagine placing a particle on a node in a graph and letting it wander randomly from node to node, following edges. Where will it end up after many steps? Which nodes will it visit most frequently?

This simple idea, random walks on graphs, turns out to be extraordinarily powerful. It is the foundation of Google's PageRank algorithm, modern graph embedding methods like node2vec and DeepWalk, sampling algorithms for large graphs, and the theory of Markov chains.

The Deep Connection: Random walks provide a probabilistic view of graphs. Instead of asking "what is the shortest path?", we ask "how likely is the walker to reach this node?" This perspective reveals global structure that local analysis misses.

What is a Random Walk?

A random walk on a graph is a sequence of nodes where each step moves to a randomly chosen neighbor of the current node.

Formal Definition

Given a graph G = (V, E) and a starting node v₀:

  1. At step t, the walker is at node vₜ
  2. Choose a neighbor u of vₜ uniformly at random
  3. Move to u: vₜ₊₁ = u
  4. Repeat
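
A minimal sketch of this procedure in Python (the adjacency-list graph below is a made-up example, not one from the text):

```python
import random

# Small example graph as an adjacency list (illustrative only).
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

def random_walk(graph, start, steps):
    """Simple random walk: each step moves to a uniformly random neighbor."""
    path = [start]
    for _ in range(steps):
        path.append(random.choice(graph[path[-1]]))
    return path

print(random_walk(graph, start=0, steps=10))
# e.g. [0, 2, 1, 3, 1, 0, 1, 2, 0, 2, 1]
```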

Types of Random Walks

Simple Random Walk

Moves to each neighbor with uniform probability: P(u|v) = 1/degree(v). The most common variant.

Lazy Random Walk

50% chance to stay, 50% to move. Ensures aperiodicity for convergence.

Weighted Random Walk

Transition probability proportional to edge weight. Models strength of connections.

Biased Random Walk

Non-uniform neighbor selection. Used in node2vec (BFS vs DFS bias).

The Transition Matrix

Random walks are elegantly captured by a transition matrix P, where P[i][j] is the probability of moving from node i to node j in one step.

P = D^{-1} A

where D⁻¹ is the inverse degree matrix (performing row normalization) and A is the adjacency matrix.

Example: Triangle graph (0-1-2-0)

Adjacency A:        Degree D:           Transition P = D⁻¹A:
[0, 1, 1]           [2, 0, 0]           [0,   0.5, 0.5]
[1, 0, 1]           [0, 2, 0]           [0.5, 0,   0.5]
[1, 1, 0]           [0, 0, 2]           [0.5, 0.5, 0  ]

Each row sums to 1 (probability distribution over next nodes)
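
A quick numpy check of this construction for the triangle graph (a minimal sketch; dividing each row of A by its degree is exactly D⁻¹A):

```python
import numpy as np

# Triangle graph 0-1-2-0 from the example above.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

d = A.sum(axis=1)        # degrees: [2, 2, 2]
P = A / d[:, None]       # row-normalize: P = D^{-1} A

print(P)                 # rows [0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]
print(P.sum(axis=1))     # each row sums to 1
```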

Multi-Step Transitions

The probability of being at node j after k steps starting from node i is given by the (i,j) entry of P^k:

P(X_k = j \mid X_0 = i) = [P^k]_{ij}

Matrix powers encode multi-hop reachability probabilities.
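
Continuing with the triangle graph, matrix powers make the convergence visible (a sketch using numpy):

```python
import numpy as np
from numpy.linalg import matrix_power

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
P = A / A.sum(axis=1)[:, None]     # transition matrix of the triangle graph

for k in (1, 2, 5, 20):
    print(k, matrix_power(P, k)[0])  # distribution of X_k given X_0 = 0
# Each row of P^k approaches the stationary distribution [1/3, 1/3, 1/3].
```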

Interactive: Random Walk

Watch a random walker traverse the graph. Notice how visit frequencies converge to the stationary distribution over time.

Uniform walk: At each step, pick a random neighbor with equal probability. Stationary distribution π(v) ∝ degree(v).

Markov Chains Connection

A random walk on a graph is a specific instance of a Markov chain: a stochastic process where the future depends only on the present, not the past.

The Markov Property

P(X_{t+1} = j \mid X_t = i, X_{t-1}, \ldots, X_0) = P(X_{t+1} = j \mid X_t = i)

"Where you go next depends only on where you are, not how you got there."

Key Properties of Markov Chains

These four properties determine whether a random walk converges to a well-defined stationary distribution:

1. Irreducibility

\forall i, j: \exists n \text{ such that } P^n(i \to j) > 0

A Markov chain is irreducible if you can eventually reach any state from any other state. In graph terms: the graph must be connected.

Why it matters: Without irreducibility, the chain might get "stuck" in a subset of states. Different starting points would lead to different limiting distributions, meaning there is no unique stationary distribution.

Example: A graph with two disconnected components has a reducible random walk. The walker is forever trapped in whichever component it started in.

2. Aperiodicity

\gcd\{n : P^n(i \to i) > 0\} = 1

A state is aperiodic if the GCD of all possible return times is 1. The chain is aperiodic if all states are aperiodic.

The problem with periodicity: In a bipartite graph (see Spectral Graph Theory), you can only return to your starting node in an even number of steps (period = 2). The distribution oscillates forever instead of converging.

The fix - Lazy random walks: With probability 0.5, stay at the current node instead of moving. This adds self-loops, making return in 1 step possible (GCD becomes 1).

P_{\text{lazy}} = \frac{1}{2}I + \frac{1}{2}P

3. Reversibility (Detailed Balance)

\pi(i) \cdot P(i \to j) = \pi(j) \cdot P(j \to i)

A chain is reversible if, at equilibrium, the flow of probability from i to j equals the flow from j to i. This is called the detailed balance condition.

Intuition: If you filmed the random walk at equilibrium and played it backwards, it would look statistically identical. You cannot tell which direction is "forward."

Undirected graphs: Random walks on undirected graphs are always reversible. For a simple random walk, \pi(i) = \frac{d_i}{2|E|} (proportional to degree).
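
A quick numerical check of detailed balance on a small undirected graph (a minimal sketch; the path graph is an illustrative choice):

```python
import numpy as np

# Path graph 0-1-2 (undirected, non-regular).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
d = A.sum(axis=1)
P = A / d[:, None]
pi = d / d.sum()                  # pi(i) = d_i / (2|E|) = [0.25, 0.5, 0.25]

flow = pi[:, None] * P            # flow[i, j] = pi(i) * P(i -> j)
print(np.allclose(flow, flow.T))  # True: detailed balance holds
```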

ML connection: MCMC methods like Metropolis-Hastings construct reversible chains to sample from complex distributions.

4. Ergodicity

\text{Ergodic} = \text{Irreducible} + \text{Aperiodic}

An ergodic Markov chain is both irreducible and aperiodic. This is the magic combination that guarantees convergence.

Fundamental Theorem of Markov Chains:

If a finite Markov chain is ergodic, then:

  • A unique stationary distribution π exists
  • From any starting distribution, the chain converges to π
  • Time averages equal ensemble averages (ergodic theorem)

PageRank insight: The teleportation factor in PageRank ensures ergodicity. Even if the web graph has dangling nodes or disconnected components, the random surfer can always teleport, making the chain irreducible and aperiodic.

Stationary Distribution

After many steps, a random walk "forgets" where it started and converges to a stationary distribution π: a probability distribution over nodes that doesn't change under the random walk.

\pi P = \pi

π is a left eigenvector of P with eigenvalue 1.

For Undirected Graphs

The stationary distribution has a beautiful closed form:

\pi(v) = \frac{\text{degree}(v)}{2|E|}

High-degree nodes are visited more often: visit frequency is proportional to degree.

Intuition: A node with degree 10 has 10 "doors" leading to it, while a node with degree 2 has only 2. The walker is 5× more likely to enter the high-degree node.
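
The closed form is easy to verify against the eigenvector definition (a sketch; the star graph is an illustrative choice):

```python
import numpy as np

# Star graph: hub 0 connected to leaves 1-4 (illustrative).
n = 5
A = np.zeros((n, n))
A[0, 1:] = 1
A[1:, 0] = 1
d = A.sum(axis=1)
P = A / d[:, None]

# pi is the left eigenvector of P with eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()

print(pi)            # [0.5, 0.125, 0.125, 0.125, 0.125]
print(d / d.sum())   # degree / 2|E| gives the same distribution
```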

Convergence & Mixing Time

How quickly does the random walk converge to the stationary distribution? This is measured by the mixing time.

Mixing Time Definition

t_{\text{mix}}(\epsilon) = \min\{t : \|P^t(x, \cdot) - \pi\|_{TV} \leq \epsilon, \forall x\}

Time until the distribution is ε-close to stationary, regardless of starting point.

The Spectral Gap

Mixing time is controlled by the spectral gap: the difference between the two largest eigenvalues of P.

\gamma = 1 - \lambda_2 \quad \text{(spectral gap)}
  • Large gap: Fast mixing, walk "forgets" quickly
  • Small gap: Slow mixing, walk gets "stuck" in clusters

This connects directly to spectral graph theory: the Laplacian eigenvalue λ₂ controls both graph connectivity and random walk mixing.
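
A small comparison makes the effect of bottlenecks concrete (a sketch; both graphs are illustrative, and the lazy walk is used so the eigenvalues behave well):

```python
import numpy as np

def lazy_gap(A):
    """Spectral gap 1 - lambda_2 of the lazy walk 0.5*I + 0.5*D^{-1}A."""
    P = A / A.sum(axis=1)[:, None]
    P_lazy = 0.5 * np.eye(len(A)) + 0.5 * P
    lam = np.sort(np.real(np.linalg.eigvals(P_lazy)))[::-1]
    return 1.0 - lam[1]

# Complete graph K6: no bottleneck.
K6 = np.ones((6, 6)) - np.eye(6)

# Barbell: two triangles joined by one bridge edge (a severe bottleneck).
B = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    B[i, j] = B[j, i] = 1

print(lazy_gap(K6), lazy_gap(B))  # large gap vs. small gap (slow mixing)
```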

Interactive: Mixing Time

Compare how different graph structures affect convergence speed. Bottlenecks dramatically slow mixing!

Mixing Time Explorer

A standard bound relates mixing time to the spectral gap:

t_{\text{mix}} \leq \frac{1}{\gamma} \ln(n/\epsilon)

Well-connected expander graphs have high conductance: probability mass spreads rapidly, and the total variation distance \|\mu_t - \pi\|_{TV} falls below ε quickly. Graphs with bottlenecks have a small spectral gap and mix far more slowly.

PageRank

Google's PageRank (Brin & Page, 1998) is a random walk with a twist: at each step, with probability α (typically 0.85), follow a random link; with probability 1-α, "teleport" to a random page.

PR = \alpha \cdot P^T \cdot PR + \frac{1-\alpha}{n} \cdot \mathbf{1}

The PageRank vector is the stationary distribution of this modified random walk.
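
A power-iteration sketch of this fixed-point equation (the tiny link graph is made up, and dangling nodes are handled here by teleporting uniformly, a simplification):

```python
import numpy as np

def pagerank(A, alpha=0.85, tol=1e-10):
    """PageRank via power iteration on the random-surfer chain."""
    n = len(A)
    P = A.astype(float)
    for i in range(n):
        s = P[i].sum()
        P[i] = P[i] / s if s > 0 else np.full(n, 1.0 / n)  # dangling node fix
    pr = np.full(n, 1.0 / n)
    while True:
        new = alpha * (P.T @ pr) + (1 - alpha) / n
        if np.abs(new - pr).sum() < tol:
            return new
        pr = new

# Tiny link graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0 (illustrative).
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
print(pagerank(A))  # node 2 accumulates the most rank
```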

Why Teleportation?

Handles Dead Ends

Pages with no outlinks ("dangling nodes") would trap the walker. Teleportation provides escape.

Ensures Irreducibility

Even if graph is disconnected, teleportation connects all nodes. Guarantees unique solution.

Models "Bored User"

A real user sometimes gives up clicking links and types a new URL. The 1-α term models this.

Fast Convergence

Teleportation increases spectral gap, speeding up power iteration convergence.

Interactive: PageRank

Watch PageRank converge through iterations. Node size reflects importance!

node2vec & Graph Embeddings

node2vec (Grover & Leskovec, 2016) uses biased random walks to generate node embeddings. The key insight: by controlling the walk's exploration behavior, we can capture different notions of node similarity.

The node2vec Walk

After moving from t to v, the probability of moving to x depends on x's relationship to t:

  • 1/p if x = t (return to the previous node)
  • 1 if x ~ t (x is a neighbor of t; BFS-like)
  • 1/q if x ≁ t (x is not adjacent to t; DFS-like)

Low p (return often)

Encourages local exploration. Captures structural equivalence (similar roles).

Low q (go far)

Encourages outward exploration. Captures homophily (community membership).
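
A sketch of a single biased step (the helper and example graph are hypothetical; random.choices normalizes the unnormalized 1/p, 1, 1/q weights):

```python
import random

def node2vec_step(graph, prev, curr, p, q):
    """One biased step: weight each neighbor of curr by its relation to prev."""
    neighbors = graph[curr]
    weights = []
    for x in neighbors:
        if x == prev:               # x = t: return to the previous node
            weights.append(1.0 / p)
        elif x in graph[prev]:      # x ~ t: stays near prev (BFS-like)
            weights.append(1.0)
        else:                       # x not adjacent to t: moves away (DFS-like)
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights)[0]

graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
walk = [0, 1]                       # seed with one unbiased step
for _ in range(8):
    walk.append(node2vec_step(graph, walk[-2], walk[-1], p=0.5, q=2.0))
print(walk)
```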

Interactive: Adjust p and q to control the walker's bias

The Algorithm

  1. For each node, run multiple biased random walks of fixed length
  2. Treat walks as "sentences" and nodes as "words"
  3. Train word2vec (Skip-gram) on the walk corpus
  4. Node embeddings emerge from the word2vec training
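
A compact sketch of this pipeline, assuming gensim (4.x) is available for the skip-gram step. For brevity the walks here are unbiased, which is DeepWalk's setting; node2vec proper would use the biased step sketched earlier:

```python
import random
from gensim.models import Word2Vec  # assumes gensim >= 4.0 is installed

graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

def simple_walk(graph, start, length):
    path = [start]
    for _ in range(length - 1):
        path.append(random.choice(graph[path[-1]]))
    return [str(v) for v in path]   # word2vec expects string tokens

# Steps 1-2: a corpus of walks ("sentences" whose "words" are nodes).
corpus = [simple_walk(graph, v, 10) for v in graph for _ in range(20)]

# Steps 3-4: train skip-gram; each node's vector is its embedding.
model = Word2Vec(corpus, vector_size=16, window=3, sg=1, min_count=1)
print(model.wv["0"])                # 16-dimensional embedding of node 0
```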

ML Applications

Graph Sampling

When graphs are too large to fit in memory, random walks provide unbiased samples. Metropolis-Hastings walks give uniform node samples.

Used by: Facebook, Twitter for large-scale graph analytics

Semi-Supervised Learning

Label propagation can be viewed as running random walks from labeled nodes and spreading labels along the way.

Personalized PageRank is a popular choice for label propagation.

GNN Message Passing

GNNs can be viewed as differentiable random walks. The transition matrix P corresponds to the normalized adjacency used in GCN.

Random walk positional encodings are used in Graph Transformers.

Recommendation Systems

Pixie (Pinterest) uses random walks with restarts on the user-item graph for real-time recommendations.

Scales to billions of nodes, serves 300M users.