Why Decompose Matrices?
Solving Ax = b directly by computing A⁻¹ is expensive (O(n³)) and numerically unstable. Matrix decompositions factor A into simpler matrices that make solving systems faster and more stable.
LU Decomposition
Factor A into Lower × Upper triangular matrices. Good for repeated solves with the same A but different b. Analogy: Gaussian Elimination.
QR Decomposition
Factor A into an Orthogonal matrix × an Upper triangular matrix. Numerically stable, ideal for least squares. Analogy: Gram-Schmidt.
LU Decomposition
Factor a square matrix A into a product of Lower and Upper triangular matrices, A = LU. It is essentially recording the steps of Gaussian Elimination.
Solving with LU
- Factor once: A = LU (O(n³))
- For each b, solve Ly = b (forward substitution, O(n²))
- Then solve Ux = y (back substitution, O(n²)); see the sketch below
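A minimal sketch of this factor-once, solve-many pattern using SciPy's `lu_factor` / `lu_solve` (the matrix and right-hand sides are arbitrary illustration values):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Arbitrary example system
A = np.array([[4.0, 3.0, 2.0],
              [2.0, 5.0, 1.0],
              [1.0, 2.0, 6.0]])

# Factor once: O(n^3)
lu, piv = lu_factor(A)

# Reuse the factors for many right-hand sides: O(n^2) per solve
for b in (np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 0.0])):
    x = lu_solve((lu, piv), b)
    print(np.allclose(A @ x, b))  # each solution satisfies Ax = b
```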
Partial Pivoting (PA = LU)
In practice, we permute rows to avoid division by small numbers. This gives PA = LU, where P is a permutation matrix.
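A quick way to see the pivoting in practice is SciPy's `scipy.linalg.lu`, which returns the permutation explicitly (note SciPy's convention is A = P @ L @ U, i.e. PᵀA = LU in the form above); the tiny pivot below is contrived:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[1e-10, 1.0],   # tiny leading pivot: without a row swap, trouble
              [1.0,   2.0]])

P, L, U = lu(A)               # SciPy convention: A = P @ L @ U
print(np.allclose(P.T @ A, L @ U))   # equivalently, P^T A = L U
print(U)                      # rows were swapped, so the pivot used is 1.0, not 1e-10
```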
Interactive Simulator
Step through the LU and QR decomposition process. Watch how matrices are factored at each step.
Matrix Decomposition
Factorizing A into simpler components.
Setup
Start with A. Our goal is to transform A into Upper Triangular form (U) using row operations, while recording the multipliers in L.
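A bare-bones sketch of that process without pivoting (assumes every pivot is nonzero; the matrix is arbitrary):

```python
import numpy as np

def lu_no_pivot(A):
    """Gaussian elimination, recording the row-operation multipliers in L."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for k in range(n):                  # eliminate entries below pivot k
        for i in range(k + 1, n):
            m = U[i, k] / U[k, k]       # multiplier that zeroes U[i, k]
            L[i, k] = m                 # ...recorded in L
            U[i, k:] -= m * U[k, k:]    # row_i <- row_i - m * row_k
    return L, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
L, U = lu_no_pivot(A)
print(np.allclose(L @ U, A))  # True: A = LU
```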
QR Decomposition
Factor any matrix A (even rectangular) into an Orthogonal matrix Q times an Upper triangular matrix R: A = QR.
Why QR is Stable
Orthogonal matrices preserve lengths: ‖Qx‖ = ‖x‖. This means errors do not amplify during computation (the condition number of Q is 1).
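A quick numerical check of both claims (the matrix here is just a seeded random example):

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # an orthogonal matrix
x = rng.standard_normal(5)

print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # lengths preserved
print(np.linalg.cond(Q))                                     # ~1.0
```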
Gram-Schmidt Process
The algorithm to construct Q is called Gram-Schmidt. It iteratively subtracts projections to force orthogonality.
1. Define Vectors
Start with two linearly independent vectors.
- u₁ = a₁, then q₁ = u₁ / ‖u₁‖
- u₂ = a₂ − (q₁ᵀa₂)q₁, then q₂ = u₂ / ‖u₂‖
- In general, subtract the projections onto all previous q's: uₖ = aₖ − Σⱼ (qⱼᵀaₖ)qⱼ, then qₖ = uₖ / ‖uₖ‖
Modified Gram-Schmidt is more numerically stable. Modern libraries use Householder reflections (a sequence of mirrors) instead of projections to compute QR.
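As a sketch, here is modified Gram-Schmidt on the columns of a matrix (assumes full column rank; production code would use Householder-based routines such as numpy.linalg.qr):

```python
import numpy as np

def mgs_qr(A):
    """Modified Gram-Schmidt: peel off one orthonormal direction at a time."""
    A = A.astype(float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    V = A.copy()
    for k in range(n):
        R[k, k] = np.linalg.norm(V[:, k])
        Q[:, k] = V[:, k] / R[k, k]            # normalize the k-th direction
        for j in range(k + 1, n):              # remove that direction from later columns
            R[k, j] = Q[:, k] @ V[:, j]
            V[:, j] -= R[k, j] * Q[:, k]
    return Q, R

A = np.random.default_rng(1).standard_normal((6, 3))
Q, R = mgs_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))  # A = QR, columns orthonormal
```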
Case Study: Bulb Characteristic Fitting
The Problem
You have voltage vs brightness data for a bulb. Fit a polynomial: brightness = a₀ + a₁V + a₂V². This is an overdetermined system (more data points than unknowns).
Using QR for Least Squares
- Build the Vandermonde matrix A: each row is [1, Vᵢ, Vᵢ²]
- Compute A = QR
- Solve Rx = Qᵀb by back substitution (since R is triangular, this is easy); see the sketch below
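Putting those steps together on made-up data (the voltage and brightness values below are invented purely to show the mechanics):

```python
import numpy as np
from scipy.linalg import solve_triangular

# Hypothetical measurements: more data points (6) than unknowns (3)
V = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
brightness = np.array([2.1, 5.2, 10.3, 17.1, 26.4, 37.0])

A = np.vander(V, 3, increasing=True)             # rows [1, V_i, V_i^2]
Q, R = np.linalg.qr(A)                           # reduced QR: Q is 6x3, R is 3x3
coeffs = solve_triangular(R, Q.T @ brightness)   # back substitution on Rx = Q^T b
print(coeffs)                                    # [a0, a1, a2]
```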
Why Not Normal Equations?
For high-degree polynomials, AᵀA becomes ill-conditioned (its condition number is the square of A's). QR avoids forming this product, preserving numerical stability.
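A rough illustration of that squaring effect, using a higher-degree polynomial basis (exact numbers depend on the grid):

```python
import numpy as np

V = np.linspace(1.0, 10.0, 50)
A = np.vander(V, 8, increasing=True)   # degree-7 polynomial basis

print(np.linalg.cond(A))               # already large
print(np.linalg.cond(A.T @ A))         # roughly the square of that: far worse
```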
LU vs QR vs SVD
| | LU | QR | SVD |
|---|---|---|---|
| Matrix Shape | Square only | Any shape | Any shape |
| Speed | Fastest | Medium | Slowest |
| Stability | Needs pivoting | Good | Best (Total Least Squares) |
| Best For | Linear solves (Ax=b) | Least squares | Rank, compression |
ML Applications
Cholesky (LLᵀ)
For symmetric positive definite matrices (like covariance matrices), Cholesky is 2x faster than LU. Used in Gaussian Processes for sampling.
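A minimal sketch of that sampling trick: factor Σ = LLᵀ and map standard normal draws through L (the covariance below is a toy RBF-style kernel):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetric positive definite covariance (RBF-style kernel plus jitter)
x = np.linspace(0.0, 1.0, 5)
Sigma = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.2**2) + 1e-6 * np.eye(5)

L = np.linalg.cholesky(Sigma)   # Sigma = L L^T, with L lower triangular
z = rng.standard_normal(5)
sample = L @ z                  # distributed as N(0, Sigma)
print(np.allclose(L @ L.T, Sigma))
```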
Eigenvalue Algorithms
The QR Algorithm for eigenvalues repeatedly computes QR decompositions. This is the standard way numpy.linalg.eig works for non-symmetric matrices.
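An unshifted sketch of that iteration (real implementations reduce to Hessenberg form and add shifts; a symmetric matrix is used here so convergence is easy to see):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

Ak = A.copy()
for _ in range(50):
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q                        # similarity transform: same eigenvalues as A

print(np.sort(np.diag(Ak)))           # diagonal approaches the eigenvalues
print(np.linalg.eigvalsh(A))          # compare with the symmetric library routine
```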
Backpropagation Efficiency
Computing gradients through matrix inverses involves solving linear systems, which typically uses pre-computed LU factors. Frameworks like JAX exploit this for efficiency.
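A toy example of that pattern in JAX (the scalar loss here is made up; the point is that the gradient flows through a linear solve rather than an explicit inverse):

```python
import jax
import jax.numpy as jnp

def loss(A, b):
    x = jnp.linalg.solve(A, b)   # differentiable linear solve, no explicit A^{-1}
    return jnp.sum(x ** 2)

A = jnp.array([[3.0, 1.0],
               [1.0, 2.0]])
b = jnp.array([1.0, 0.0])

print(jax.grad(loss)(A, b))      # dloss/dA, computed via further solves
```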
Randomized Linear Algebra
Large scale ML uses randomized QR (sampling columns) to get approximate decompositions. This is much faster than full SVD for massive recommendation system matrices.
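A sketch of the standard randomized range-finder idea (sample the range with a random matrix, QR the sketch, then take a small exact SVD); the test matrix below is synthetic with fast-decaying singular values:

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate rank-k SVD via a random sketch of A's column space."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))  # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for the sampled range
    B = Q.T @ A                           # small (k + oversample) x n problem
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Synthetic matrix with a geometrically decaying spectrum
rng = np.random.default_rng(1)
U0, _ = np.linalg.qr(rng.standard_normal((500, 50)))
V0, _ = np.linalg.qr(rng.standard_normal((300, 50)))
A = (U0 * 2.0 ** -np.arange(50)) @ V0.T

U, s, Vt = randomized_svd(A, k=20)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))   # small relative error
```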