Introduction
Graphs are combinatorial objects: nodes and edges. Linear algebra deals with matrices and vector spaces. At first glance, these seem like separate worlds. Spectral graph theory is the bridge between them.
The core insight is simple yet profound: we can represent a graph as a matrix, then analyze that matrix using eigenvalues and eigenvectors. The "spectrum" of a graph (its set of eigenvalues) encodes structural information that is invisible to the naked eye: how clustered the graph is, whether it is connected, how information flows through it.
The Big Idea: Eigenvalues are to graphs what frequencies are to sound waves. A chord sounds different from a single note because it has multiple frequencies. Similarly, a clustered graph "sounds different" from a random graph because its eigenvalue spectrum has a different shape.
Why "Spectral"?
The term "spectral" comes from physics. When you shine white light through a prism, it splits into a spectrum of colors, each color corresponding to a different frequency. In linear algebra, the spectrum of a matrix is its set of eigenvalues.
Just as the color spectrum tells you about light's composition, a matrix's spectrum tells you about its fundamental properties. For graphs, this means the eigenvalue spectrum reveals:
Connectivity
How many connected components? Is the graph well-connected or fragile?
Clustering
Are there natural communities? How separable are they?
Expansion
How quickly does information spread? Are there bottlenecks?
Bipartiteness
Can nodes be split into two groups with edges only between groups?
The Graph Laplacian
While the adjacency matrix is the most obvious way to represent a graph, the Graph Laplacian is far more useful for spectral analysis. It is defined as

L = D - A

where L is the Laplacian matrix (V × V), D is the diagonal degree matrix, and A is the adjacency matrix.
Constructing the Laplacian
Step 1: Adjacency Matrix A
A[i][j] = 1 if edge (i, j) exists, 0 otherwise. For undirected graphs, A is symmetric.
Step 2: Degree Matrix D
D is diagonal: D[i][i] = degree of node i. All off-diagonal entries are 0.
Step 3: Laplacian L = D - A
Diagonal entries: L[i][i] = degree of node i. Off-diagonal: L[i][j] = -1 if edge exists, 0 otherwise.
Example
Graph: 0 -- 1 -- 2

Adjacency A:      Degree D:        Laplacian L = D - A:
[0, 1, 0]         [1, 0, 0]        [ 1, -1,  0]
[1, 0, 1]         [0, 2, 0]        [-1,  2, -1]
[0, 1, 0]         [0, 0, 1]        [ 0, -1,  1]
Notice: each row of L sums to zero. This is always true and is the key property that makes the Laplacian special.
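In code, the construction takes only a few lines once you have the adjacency matrix. Here is a minimal sketch (assuming NumPy) that reproduces the path-graph example above and checks the zero-row-sum property:

```python
import numpy as np

# Adjacency matrix of the path graph 0 -- 1 -- 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

D = np.diag(A.sum(axis=1))   # degree matrix: row sums of A on the diagonal
L = D - A                    # graph Laplacian

print(L)
# [[ 1 -1  0]
#  [-1  2 -1]
#  [ 0 -1  1]]

print(L.sum(axis=1))         # every row sums to zero: [0 0 0]
```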
Why "Laplacian"?
In continuous domains, the Laplacian operator measures how a function differs from its neighborhood average.
The graph Laplacian is the discrete analog: it measures how a node's value differs from its neighbors' values. This connection enables us to apply continuous mathematical tools to discrete graph structures.
Quadratic Form & Smoothness
The Laplacian's true power emerges through its quadratic form. For any vector x that assigns a value to each node:

xᵀLx = Σ₍ᵢ,ⱼ₎∈E (xᵢ - xⱼ)²

the sum of squared differences across all edges.
This measures smoothness over the graph:
Low Value = Smooth
Connected nodes have similar values. The signal varies slowly across the graph.
High Value = Rough
Connected nodes have very different values. The signal changes abruptly.
Why This Matters for ML
In semi-supervised learning, we add xᵀLx as a regularization term. This encourages predictions to be smooth over the graph: if two nodes are connected, they should have similar labels. GNNs implicitly optimize for the same notion of graph smoothness.
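To make this concrete, here is a small sketch (NumPy, with made-up signal values) verifying that xᵀLx equals the sum of squared differences over the edges, and that a constant signal is perfectly smooth while an alternating one is rough:

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
L = np.diag(A.sum(axis=1)) - A
edges = [(0, 1), (1, 2)]

def smoothness(x):
    return x @ L @ x           # the quadratic form x^T L x

x_smooth = np.array([1.0, 1.0, 1.0])   # constant signal: perfectly smooth
x_rough  = np.array([1.0, -1.0, 1.0])  # alternating signal: rough

print(smoothness(x_smooth))            # 0.0
print(smoothness(x_rough))             # 8.0
print(sum((x_rough[i] - x_rough[j])**2 for i, j in edges))  # also 8.0
```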
Interactive: Graph Smoothness
Assign values to nodes and watch how the Laplacian quadratic form measures smoothness. Edge thickness indicates the magnitude of difference between connected nodes.
Properties of the Laplacian
Symmetric & Positive Semi-Definite
All eigenvalues are real and ≥ 0. This is crucial for many algorithms.
Rows Sum to Zero
For any node i, the row sum Σⱼ L[i][j] = L[i][i] + Σⱼ≠ᵢ L[i][j] = degree - degree = 0. This means 1 (the all-ones vector) is always an eigenvector with eigenvalue 0.
Quadratic Form
For any vector x: xᵀLx = Σ₍ᵢ,ⱼ₎∈E (xᵢ - xⱼ)². This measures how "smooth" x is over the graph: neighboring nodes with similar values contribute less.
Number of Zero Eigenvalues = Connected Components
A graph with k connected components has exactly k eigenvalues equal to 0. This is how we can "count" components algebraically.
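The component-counting property is easy to check numerically. A minimal sketch, assuming NumPy, on a graph with two components:

```python
import numpy as np

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

# Two disjoint edges: components {0, 1} and {2, 3}
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])

eigvals = np.linalg.eigvalsh(laplacian(A))   # real, sorted ascending (L is symmetric)
print(eigvals)                               # approximately [0, 0, 2, 2]
print(np.sum(np.isclose(eigvals, 0)))        # 2 zero eigenvalues -> 2 components
```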
Eigenvalues & Their Meaning
Let the eigenvalues of L be 0 = λ₁ ≤ λ₂ ≤ ... ≤ λₙ. Each eigenvalue tells us something important:
λ₁ = 0 (Always)
The smallest eigenvalue is always 0, with eigenvector 1. This represents the trivial partition where all nodes are in the same cluster.
λ₂ (Algebraic Connectivity / Fiedler Value)
The second smallest eigenvalue is arguably the most important. It measures how well-connected the graph is:
- λ₂ = 0: the graph is disconnected (multiple components)
- λ₂ small: the graph is "almost disconnected" (has bottlenecks)
- λ₂ large: the graph is well-connected (hard to cut)
λₙ (Largest Eigenvalue)
Related to the maximum degree. For regular graphs (all nodes same degree d): λₙ ≤ 2d. Large λₙ indicates the presence of high-degree "hub" nodes.
Spectral Gap (λ₂)
The gap between λ₁ and λ₂ determines how fast random walks mix. Large gap means fast mixing and good expansion properties. This is why expander graphs (used in cryptography and CS theory) are designed to maximize λ₂.
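A quick numerical illustration (NumPy assumed, graphs chosen for contrast): λ₂ is small for a path graph, which has an obvious bottleneck, and large for a complete graph, which has none:

```python
import numpy as np

def lambda_2(A):
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))[1]   # second-smallest eigenvalue

n = 6
path = np.zeros((n, n), dtype=int)             # path graph: a long, fragile chain
for i in range(n - 1):
    path[i, i + 1] = path[i + 1, i] = 1

complete = np.ones((n, n), dtype=int) - np.eye(n, dtype=int)   # complete graph K6

print(lambda_2(path))       # small (about 0.27): easy to cut
print(lambda_2(complete))   # equals n = 6: very well connected
```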
Interactive: Eigenvalue Spectrum
See how different graph structures produce characteristic eigenvalue patterns; pay special attention to λ₂! The default example shows two dense groups joined by a weak bridge.
The Fiedler Vector
The eigenvector corresponding to λ₂ is called the Fiedler vector, named after Miroslav Fiedler who proved its remarkable property: it naturally partitions the graph.
The Magic: If you sort nodes by their Fiedler vector values and cut anywhere, you get a reasonable graph partition. The optimal cut (minimizing edges between partitions) often occurs where the Fiedler vector crosses zero.
Why Does This Work?
Remember the quadratic form: xᵀLx = Σ₍ᵢ,ⱼ₎∈E (xᵢ - xⱼ)². The Fiedler vector minimizes this subject to being orthogonal to 1 and having unit norm.
In plain terms: the Fiedler vector assigns values to nodes such that connected nodes have similar values, while the overall assignment is as "spread out" as possible. This naturally groups densely connected clusters together.
Simple Partitioning Rule
- Compute Fiedler vector v
- Nodes with v[i] < 0 → Cluster 1
- Nodes with v[i] ≥ 0 → Cluster 2
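Here is a minimal sketch of that rule, assuming NumPy and a small hypothetical "barbell" graph (two triangles joined by a single bridge edge):

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by a single bridge edge 2 -- 3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)   # columns are eigenvectors, eigenvalues ascending
fiedler = eigvecs[:, 1]                # eigenvector for the second-smallest eigenvalue

cluster = (fiedler >= 0).astype(int)
print(cluster)                         # nodes 0-2 land in one cluster, 3-5 in the other
```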
Interactive: Spectral Clustering
Step through the spectral clustering algorithm to see how matrix operations reveal graph structure.
- Build a graph: connect each point to its k nearest neighbors
- Form the Laplacian L = D - A, where D is the degree matrix
- Compute the k smallest eigenvectors of L
- Cluster the points in eigenvector space
The Spectral Clustering Algorithm
Spectral clustering extends the Fiedler vector idea to k clusters:
Step 1: Build the Laplacian
Compute L = D - A (or use a normalized variant for better results).
Step 2: Compute Eigenvectors
Find the k smallest eigenvectors of L: v₁, v₂, ..., vₖ.
Step 3: Form the Embedding Matrix
Create an n × k matrix U where row i is [v₁[i], v₂[i], ..., vₖ[i]]. Each node is now a point in k-dimensional space.
Step 4: Run k-Means
Apply standard k-means clustering to the rows of U. The cluster labels transfer back to the original nodes.
Why k-Means Works Here: The spectral embedding "untangles" the graph structure. In the original graph, clusters may be interleaved in complex ways. In the spectral embedding, clusters become well-separated Gaussian-like blobs that k-means can easily identify.
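Putting the four steps together, a compact sketch assuming NumPy and scikit-learn (the graph and k are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(A, k):
    """Cluster the nodes of an undirected graph given its adjacency matrix A."""
    L = np.diag(A.sum(axis=1)) - A               # Step 1: unnormalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)         # Step 2: eigendecomposition (ascending)
    U = eigvecs[:, :k]                           # Step 3: n x k spectral embedding
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)   # Step 4: k-means on rows

# Hypothetical example: two triangles joined by a bridge (same barbell graph as above)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1

print(spectral_clustering(A, k=2))   # e.g. [0 0 0 1 1 1]
```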
Cheeger's Inequality
Cheeger's inequality is the theoretical backbone of spectral clustering. It relates the algebraic connectivity λ₂ to the Cheeger constant h(G), which measures how "cuttable" the graph is.
Cheeger Constant
The minimum, over all ways to split the nodes into a set S and its complement, of the number of cut edges divided by the size of the smaller side: h(G) = min over S of cut(S, V \ S) / min(|S|, |V \ S|). A small h(G) means there is a "cheap" cut.
Cheeger's Inequality
The inequality bounds h(G) above and below by functions of λ₂, so λ₂ is a tight approximation of the optimal cut quality. Finding h(G) exactly is NP-hard, but computing λ₂ takes only polynomial time!
Normalized Laplacians
The unnormalized Laplacian L = D - A has a bias: high-degree nodes dominate. For many applications, normalized variants work better:
Symmetric Normalized
L_sym = D^(-1/2) L D^(-1/2) = I - D^(-1/2) A D^(-1/2). Its eigenvalues lie in [0, 2]. This is the variant used in most spectral clustering implementations.
The normalized Laplacian treats each node more equally, regardless of degree. This is often what you want: a hub with 1000 connections should not dominate over a leaf node with 2 connections.
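A minimal sketch of the symmetric normalized Laplacian, assuming NumPy and a graph with no isolated nodes:

```python
import numpy as np

def normalized_laplacian(A):
    deg = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))     # assumes every node has degree > 0
    return np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
L_sym = normalized_laplacian(A)
print(np.linalg.eigvalsh(L_sym))   # all eigenvalues lie in [0, 2]
```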
ML Applications
Graph Neural Networks (GNNs)
GNNs propagate information using the normalized adjacency matrix. The spectral perspective explains why: GNNs are essentially applying low-pass filters on the graph spectrum.
Multiplying features by the normalized adjacency averages them over neighbors, suppressing the high-frequency (large-eigenvalue) components of the graph spectrum and keeping the smooth, low-frequency ones captured by the bottom eigenvectors of L.
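As an illustration (not any particular library's API; the function and variable names are made up), one GCN-style propagation step can be sketched as:

```python
import numpy as np

def propagate(A, X):
    A_hat = A + np.eye(len(A))                   # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt @ X   # smooth features over neighbors

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0], [0.0], [1.0]])              # a "rough" node feature
print(propagate(A, X))                           # values move toward the neighborhood average
```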
Community Detection
Social networks, biological networks, and citation graphs all have community structure. Spectral methods find these communities by looking at the "gaps" in the eigenvalue spectrum.
If there are k natural communities, you will typically see k eigenvalues close to 0, then a gap, then the rest.
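This suggests a simple "eigengap" heuristic for choosing the number of clusters. A sketch, assuming NumPy and reusing the barbell graph from earlier:

```python
import numpy as np

def estimate_k(A, max_k=10):
    L = np.diag(A.sum(axis=1)) - A
    eigvals = np.linalg.eigvalsh(L)[:max_k]      # smallest eigenvalues, ascending
    gaps = np.diff(eigvals)                      # gaps between consecutive eigenvalues
    return int(np.argmax(gaps)) + 1              # number of eigenvalues before the big gap

# Hypothetical example: two triangles joined by a bridge -> two communities
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
print(estimate_k(A))   # 2
```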
Image Segmentation
Treat pixels as nodes, with edges weighted by similarity (color, texture). Spectral clustering groups visually coherent regions together.
The normalized cuts algorithm (Shi & Malik, 2000) is a classic spectral method for image segmentation.
Laplacian Eigenmaps
A nonlinear dimensionality reduction technique: embed high-dimensional data in a low-dimensional space such that points that are nearby in the original space remain nearby in the embedding.
The embedding coordinates are simply the smallest nontrivial eigenvectors of the graph Laplacian built from the data's nearest-neighbor graph.
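A minimal sketch, assuming scikit-learn, whose SpectralEmbedding implements Laplacian eigenmaps (the random data here is only a placeholder):

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                   # hypothetical high-dimensional data

# SpectralEmbedding builds a nearest-neighbor graph, forms its Laplacian,
# and uses the bottom eigenvectors as low-dimensional coordinates.
embedding = SpectralEmbedding(n_components=2, n_neighbors=10).fit_transform(X)
print(embedding.shape)                           # (200, 2)
```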