Spectral Graph Theory

How eigenvalues reveal hidden graph structure. The mathematical foundation behind clustering, community detection, and modern GNNs.

Introduction

Graphs are combinatorial objects: nodes and edges. Linear algebra deals with matrices and vector spaces. At first glance, these seem like separate worlds. Spectral graph theory is the bridge between them.

The core insight is simple yet profound: we can represent a graph as a matrix, then analyze that matrix using eigenvalues and eigenvectors. The "spectrum" of a graph (its set of eigenvalues) encodes structural information that is invisible to the naked eye: how clustered the graph is, whether it is connected, how information flows through it.

The Big Idea: Eigenvalues are to graphs what frequencies are to sound waves. A chord sounds different from a single note because it has multiple frequencies. Similarly, a clustered graph "sounds different" from a random graph because its eigenvalue spectrum has a different shape.

Why "Spectral"?

The term "spectral" comes from physics. When you shine white light through a prism, it splits into a spectrum of colors, each color corresponding to a different frequency. In linear algebra, the spectrum of a matrix is its set of eigenvalues.

Just as the color spectrum tells you about light's composition, a matrix's spectrum tells you about its fundamental properties. For graphs, this means the eigenvalue spectrum reveals:

  • Connectivity: How many connected components? Is the graph well-connected or fragile?
  • Clustering: Are there natural communities? How separable are they?
  • Expansion: How quickly does information spread? Are there bottlenecks?
  • Bipartiteness: Can nodes be split into two groups with edges only between groups?

The Graph Laplacian

While the adjacency matrix is the most obvious way to represent a graph, the Graph Laplacian is far more useful for spectral analysis. It is defined as:

L = D - A

where L is the Laplacian matrix (|V| × |V|), D is the degree matrix (diagonal), and A is the adjacency matrix.

Constructing the Laplacian

Step 1: Adjacency Matrix A

A[i][j] = 1 if edge (i, j) exists, 0 otherwise. For undirected graphs, A is symmetric.

Step 2: Degree Matrix D

D is diagonal: D[i][i] = degree of node i. All off-diagonal entries are 0.

Step 3: Laplacian L = D - A

Diagonal entries: L[i][i] = degree of node i. Off-diagonal: L[i][j] = -1 if edge exists, 0 otherwise.

Example

Graph: 0 -- 1 -- 2

Adjacency A:        Degree D:           Laplacian L = D - A:
[0, 1, 0]           [1, 0, 0]           [ 1, -1,  0]
[1, 0, 1]           [0, 2, 0]           [-1,  2, -1]
[0, 1, 0]           [0, 0, 1]           [ 0, -1,  1]

Notice: each row of L sums to zero. This is always true and is the key property that makes the Laplacian special.
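
A quick sanity check in NumPy (one of many ways to do this) reproduces the example above and confirms the row-sum property:

```python
import numpy as np

# Path graph 0 -- 1 -- 2 from the example above
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))   # degree matrix: row sums of A on the diagonal
L = D - A                    # graph Laplacian

print(L)
# [[ 1. -1.  0.]
#  [-1.  2. -1.]
#  [ 0. -1.  1.]]

print(L.sum(axis=1))         # every row sums to zero
# [0. 0. 0.]
```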

Why "Laplacian"?

In continuous domains, the Laplacian operator $\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$ measures how a function differs from its neighborhood average.

The graph Laplacian is the discrete analog: it measures how a node's value differs from its neighbors' values. This connection enables us to apply continuous mathematical tools to discrete graph structures.
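
Applying L to a vector x of node values makes this concrete: entry i of Lx equals deg(i)·xᵢ minus the sum of the neighbors' values. A small sketch, reusing the path graph above:

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

x = np.array([2.0, 5.0, 1.0])   # one value per node
print(L @ x)                    # [-3.  7. -4.]
# e.g. node 1: deg(1)*x[1] - (x[0] + x[2]) = 2*5 - (2 + 1) = 7
```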

Quadratic Form & Smoothness

The Laplacian's true power emerges through its quadratic form. For any vector x that assigns values to nodes:

x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2

Sum of squared differences across all edges

This measures smoothness over the graph:

Low Value = Smooth

Connected nodes have similar values. The signal varies slowly across the graph.

High Value = Rough

Connected nodes have very different values. The signal changes abruptly.

Why This Matters for ML

In semi-supervised learning, we add $x^T L x$ as a regularization term. This encourages predictions to be smooth over the graph: if two nodes are connected, they should have similar labels. GNNs implicitly optimize for graph smoothness.
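
As a sanity check, here is a minimal sketch (reusing the path graph from the construction example) showing that the matrix expression xᵀLx and the edge-by-edge sum give the same number:

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

x = np.array([1.0, 0.0, 0.0])   # a "rough" signal: node 0 differs from its neighbor

quad = x @ L @ x                # quadratic form x^T L x

# Same quantity computed edge by edge: sum of (x_i - x_j)^2 over edges
edges = [(0, 1), (1, 2)]
edge_sum = sum((x[i] - x[j]) ** 2 for i, j in edges)

print(quad, edge_sum)           # both are 1.0
```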

Interactive: Graph Smoothness

Assign values to nodes and watch how the Laplacian quadratic form measures smoothness. Edge thickness indicates the magnitude of difference between connected nodes.

(Interactive widget: sliders set the signal values v₀, v₁, v₂ on the nodes, and the total roughness xᵀLx updates live. Edges glow when connected nodes have different values, visually representing each term (vᵢ - vⱼ)².)

Tip: Set all nodes to the same value (e.g., 1.0) and the roughness drops to 0. The constant vector is exactly the eigenvector with eigenvalue 0.

Properties of the Laplacian

1. Symmetric & Positive Semi-Definite

All eigenvalues are real and ≥ 0. This is crucial for many algorithms.

2. Rows Sum to Zero

For any node i, the row sum is Σⱼ L[i][j] = deg(i) - deg(i) = 0, since the off-diagonal entries of row i sum to -deg(i). This means 1 (the all-ones vector) is always an eigenvector with eigenvalue 0.

3. Quadratic Form

For any vector x: $x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2$. This measures how "smooth" x is over the graph. Neighboring nodes with similar values contribute less.

4. Number of Zero Eigenvalues = Connected Components

A graph with k connected components has exactly k eigenvalues equal to 0. This is how we can "count" components algebraically.
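
Property 4 is easy to verify numerically. A small sketch with two disjoint edges (so two components) shows exactly two zero eigenvalues:

```python
import numpy as np

# Two disjoint edges: {0, 1} and {2, 3} are separate components
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

eigvals = np.linalg.eigvalsh(L)              # eigvalsh: L is symmetric
print(np.round(eigvals, 6))                  # [0. 0. 2. 2.]
print(int(np.sum(np.isclose(eigvals, 0))))   # 2 zero eigenvalues -> 2 components
```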

Eigenvalues & Their Meaning

Let the eigenvalues of L be $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$. Each eigenvalue tells us something important:

λ₁ = 0 (Always)

The smallest eigenvalue is always 0, with eigenvector 1. This represents the trivial partition where all nodes are in the same cluster.

λ₂ (Algebraic Connectivity / Fiedler Value)

The second smallest eigenvalue is arguably the most important. It measures how well-connected the graph is:

  • λ₂ = 0: the graph is disconnected (multiple components)
  • λ₂ small: the graph is "almost disconnected" (has bottlenecks)
  • λ₂ large: the graph is well-connected (hard to cut)

λₙ (Largest Eigenvalue)

Related to the maximum degree. For regular graphs (every node has the same degree d): λₙ ≤ 2d. A large λₙ indicates the presence of high-degree "hub" nodes.

Spectral Gap (λ₂)

Since λ₁ = 0, the spectral gap is just λ₂ itself. It determines how fast random walks on the graph mix: a large gap means fast mixing and good expansion properties. This is why expander graphs (used in cryptography and CS theory) are designed to maximize λ₂.

Interactive: Eigenvalue Spectrum

See how different graph structures produce characteristic eigenvalue patterns. Pay special attention to λ₂!

(Interactive widget: for the preview topology, two dense groups joined by a weak bridge, the sorted eigenvalue spectrum shows a single zero eigenvalue and a small Fiedler value λ₂ ≈ 0.08: one connected component, but only weakly connected.)

The Fiedler Vector

The eigenvector corresponding to λ₂ is called the Fiedler vector, named after Miroslav Fiedler, who proved its remarkable property: it naturally partitions the graph.

The Magic: If you sort nodes by their Fiedler vector values and cut anywhere, you get a reasonable graph partition. The optimal cut (minimizing edges between partitions) often occurs where the Fiedler vector crosses zero.

Why Does This Work?

Remember the quadratic form: $x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2$. The Fiedler vector minimizes this subject to being orthogonal to 1 and having unit norm.

In plain terms: the Fiedler vector assigns values to nodes such that connected nodes have similar values, while the overall assignment is as "spread out" as possible. This naturally groups densely connected clusters together.

Simple Partitioning Rule

  • Compute Fiedler vector v
  • Nodes with v[i] < 0 → Cluster 1
  • Nodes with v[i] ≥ 0 → Cluster 2
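
A minimal NumPy sketch of this rule (the graph here, two triangles joined by a single bridge edge, is just an illustrative choice):

```python
import numpy as np

# Two triangles {0, 1, 2} and {3, 4, 5} joined by one bridge edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
fiedler = eigvecs[:, 1]                 # eigenvector of the second-smallest eigenvalue

clusters = (fiedler >= 0).astype(int)   # sign-based split
print(np.round(fiedler, 3))
print(clusters)                         # separates {0, 1, 2} from {3, 4, 5}
```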

Interactive: Spectral Clustering

Step through the spectral clustering algorithm to see how matrix operations reveal graph structure.

(Interactive demo stepping through the pipeline: 1. Build a k-NN graph by connecting each point to its k nearest neighbors; 2. Compute the Laplacian L = D - A, where D is the degree matrix; 3. Find the k smallest eigenvectors of L; 4. Run k-means in eigenvector space.)

The Spectral Clustering Algorithm

Spectral clustering extends the Fiedler vector idea to k clusters:

Step 1: Build the Laplacian

Compute L = D - A (or use a normalized variant for better results).

Step 2: Compute Eigenvectors

Find the k smallest eigenvectors of L: v₁, v₂, ..., vₖ.

Step 3: Form the Embedding Matrix

Create an n × k matrix U where row i is [v₁[i], v₂[i], ..., vₖ[i]]. Each node is now a point in k-dimensional space.

Step 4: Run k-Means

Apply standard k-means clustering to the rows of U. The cluster labels transfer back to the original nodes.

Why k-Means Works Here: The spectral embedding "untangles" the graph structure. In the original graph, clusters may be interleaved in complex ways. In the spectral embedding, clusters become well-separated Gaussian-like blobs that k-means can easily identify.
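
Putting the four steps together, here is a compact sketch using NumPy and scikit-learn's KMeans. The toy data, the neighbor count, and the use of the unnormalized Laplacian are all illustrative choices; production code would typically use a normalized Laplacian or sklearn.cluster.SpectralClustering:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy point cloud: two well-separated 2-D blobs
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])
n, k = len(X), 2

# Step 1: symmetrized k-NN graph (5 neighbors per point)
dists = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
A = np.zeros((n, n))
for i in range(n):
    for j in np.argsort(dists[i])[1:6]:    # index 0 is the point itself
        A[i, j] = A[j, i] = 1.0

# Step 2: Laplacian
L = np.diag(A.sum(axis=1)) - A

# Step 3: the k smallest eigenvectors form the spectral embedding
_, eigvecs = np.linalg.eigh(L)
U = eigvecs[:, :k]                         # n x k embedding, one row per node

# Step 4: k-means in the embedded space
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
print(labels)                              # the two blobs receive two different labels
```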

Cheeger's Inequality

Cheeger's inequality is the theoretical backbone of spectral clustering. It relates the algebraic connectivity λ₂ to the Cheeger constant h(G), which measures how "cuttable" the graph is.

Cheeger Constant

h(G) = \min_{S} \frac{|\text{edges between } S \text{ and } \bar{S}|}{\min(|S|, |\bar{S}|)}

The minimum over all ways to cut the graph into two pieces, normalized by the size of the smaller piece. Small h(G) means there is a "cheap" cut.

Cheeger's Inequality

\frac{\lambda_2}{2} \leq h(G) \leq \sqrt{2 \lambda_2}

In other words, λ₂ pins down the optimal cut quality from both sides. Finding h(G) exactly is NP-hard, but computing λ₂ takes only polynomial time!

Normalized Laplacians

The unnormalized Laplacian L = D - A has a bias: high-degree nodes dominate. For many applications, normalized variants work better:

Symmetric Normalized

L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}

Eigenvalues in [0, 2]. Used in most spectral clustering implementations.

Random Walk Normalized

L_{rw} = D^{-1} L = I - D^{-1} A

Connection to random walks on the graph. D⁻¹A is the transition matrix.

The normalized Laplacian treats each node more equally, regardless of degree. This is often what you want: a hub with 1000 connections should not dominate over a leaf node with 2 connections.
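
Both variants are one line each once you have A. A small sketch (it assumes no isolated nodes, so every degree is positive):

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
I = np.eye(len(A))

D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))    # assumes every node has degree > 0
D_inv = np.diag(1.0 / deg)

L_sym = I - D_inv_sqrt @ A @ D_inv_sqrt     # symmetric normalized Laplacian
L_rw = I - D_inv @ A                        # random-walk normalized Laplacian

print(np.round(np.linalg.eigvalsh(L_sym), 3))   # eigenvalues lie in [0, 2]
```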

ML Applications

Graph Neural Networks (GNNs)

GNNs propagate information using the normalized adjacency matrix. The spectral perspective explains why this works: GNNs are essentially applying low-pass filters over the graph spectrum.

H^{(l+1)} = \sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W)

The multiplication by the normalized adjacency smooths features over neighbors, attenuating the high-frequency (large-eigenvalue) components of the graph signal, exactly the components the Laplacian quadratic form penalizes.
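
A minimal NumPy sketch of one such propagation step, assuming the common GCN convention that Ã = A + I adds self-loops and D̃ is the degree matrix of Ã (a real model would use a GNN library and learn W by backpropagation):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One propagation step: relu(D~^{-1/2} A~ D~^{-1/2} H W)."""
    A_tilde = A + np.eye(len(A))                     # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt        # normalized adjacency
    return np.maximum(A_hat @ H @ W, 0.0)            # ReLU as the nonlinearity sigma

# Tiny example: the 3-node path graph, 2 input features, 4 hidden units
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 2))      # node feature matrix
W = rng.normal(size=(2, 4))      # weight matrix (learned in a real model)

print(gcn_layer(A, H, W).shape)  # (3, 4): new 4-dimensional features per node
```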

Community Detection

Social networks, biological networks, and citation graphs all have community structure. Spectral methods find these communities by looking at the "gaps" in the eigenvalue spectrum.

If there are k natural communities, you will typically see k eigenvalues close to 0, then a gap, then the rest.

Image Segmentation

Treat pixels as nodes, with edges weighted by similarity (color, texture). Spectral clustering groups visually coherent regions together.

The normalized cuts algorithm (Shi & Malik, 2000) is a classic spectral method for image segmentation.

Laplacian Eigenmaps

A graph-based dimensionality reduction technique in the same spirit as PCA, but nonlinear: embed high-dimensional data in a low-dimensional space such that nearby points in the original space remain nearby.

The embedding coordinates are simply the bottom eigenvectors of the graph Laplacian built from the data's nearest-neighbor graph.
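
A bare-bones sketch of the idea, using the unnormalized Laplacian and an unweighted k-NN graph on random data (the original formulation uses heat-kernel edge weights and the generalized eigenproblem Lv = λDv):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))             # 50 points in 10 dimensions

# Symmetrized k-NN graph (6 neighbors per point)
dists = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
A = np.zeros((len(X), len(X)))
for i in range(len(X)):
    for j in np.argsort(dists[i])[1:7]:   # skip the point itself
        A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A
_, eigvecs = np.linalg.eigh(L)

# 2-D embedding: skip the constant eigenvector (eigenvalue 0), take the next two
embedding = eigvecs[:, 1:3]
print(embedding.shape)                    # (50, 2)
```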