LU / QR Decomposition

Breaking matrices into simpler factors for efficient computation.

Why Decompose Matrices?

Solving Ax = b directly by computing A⁻¹ is expensive (O(n³)) and numerically unstable. Matrix decompositions factor A into simpler matrices that make solving systems faster and more stable.

LU Decomposition

Factor into Lower × Upper triangular. Good for repeated solves with same A, different b. Analogy: Gaussian Elimination.

QR Decomposition

Factor into Orthogonal × Upper triangular. Numerically stable, ideal for least squares. Analogy: Gram-Schmidt.

LU Decomposition

Factor a square matrix into a product of Lower and Upper triangular matrices. It is essentially recording the steps of Gaussian Elimination.

A = LU
L = lower triangular (1s on diagonal), U = upper triangular (pivots on diagonal)

Solving with LU

  1. Factor once: A = LU (O(n³))
  2. For each b, solve Ly = b (forward substitution, O(n²))
  3. Then solve Ux = y (back substitution, O(n²))
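
In code, this factor-once, solve-many pattern might look like the following sketch using SciPy's lu_factor / lu_solve (the matrix and right-hand sides are arbitrary examples):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Example system (values chosen arbitrarily for illustration)
A = np.array([[4.0, 3.0],
              [6.0, 3.0]])

# Step 1: factor once (O(n^3)); lu_factor also applies partial pivoting
lu_piv = lu_factor(A)

# Steps 2-3: every new right-hand side now costs only two triangular solves (O(n^2))
for b in [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([5.0, 7.0])]:
    x = lu_solve(lu_piv, b)
    print(b, "->", x)
```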

Partial Pivoting (PA = LU)

In practice, we permute rows to avoid division by small numbers. This gives PA = LU where P is a permutation matrix.
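
A quick sketch with SciPy's lu, on an arbitrary matrix whose zero top-left entry forces a row swap:

```python
import numpy as np
from scipy.linalg import lu

# Zero pivot in the top-left corner forces a row swap (values otherwise arbitrary)
A = np.array([[0.0, 2.0, 1.0],
              [3.0, 1.0, 4.0],
              [1.0, 5.0, 2.0]])

P, L, U = lu(A)                       # SciPy's convention: A = P @ L @ U
print(np.allclose(P.T @ A, L @ U))    # the text's PA = LU, with P here being the transpose
```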

Interactive Simulator

Step through the LU and QR decomposition process. Watch how matrices are factored at each step.

Matrix Decomposition

Factorizing A into simpler components.

Step 1: Setup

Start with A. Our goal is to transform A into Upper Triangular form (U) using row operations, while recording the multipliers in L. At this first step L is the identity and U still holds A:

L (Lower) = [1 0; 0 1]  ×  U (Upper) = [4 3; 6 3]

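Written out in code, the elimination the simulator animates might look like this minimal no-pivoting sketch, run on the same 2×2 matrix shown above:

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle LU without pivoting: returns L (unit diagonal) and U with A = L @ U."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for k in range(n - 1):             # eliminate below the pivot U[k, k]
        for i in range(k + 1, n):
            m = U[i, k] / U[k, k]      # multiplier (assumes a nonzero pivot)
            L[i, k] = m                # record the row operation in L
            U[i, k:] -= m * U[k, k:]   # row_i <- row_i - m * row_k
    return L, U

L, U = lu_no_pivot([[4, 3], [6, 3]])
print(L)        # [[1. , 0. ], [1.5, 1. ]]
print(U)        # [[4. , 3. ], [0. , -1.5]]
print(L @ U)    # reconstructs the original A
```
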
QR Decomposition

Factor any matrix (even rectangular) into an Orthogonal matrix times an Upper triangular matrix.

A = QR
QᵀQ = I (orthonormal columns), R = upper triangular

Why QR is Stable

Orthogonal matrices preserve lengths: ||Qx|| = ||x||. This means errors do not amplify during computation (the condition number of Q is 1).
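
Both claims are easy to check numerically; a small sketch with a random orthogonal Q:

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # a random 5x5 orthogonal matrix
x = rng.standard_normal(5)

print(np.linalg.norm(Q @ x), np.linalg.norm(x))   # identical up to rounding
print(np.linalg.cond(Q))                          # ~1.0
```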

Gram-Schmidt Process

The algorithm to construct Q is called Gram-Schmidt. It iteratively subtracts projections to force orthogonality.

[Interactive Gram-Schmidt visualization: two starting vectors, v₁ at 15° and v₂ at 60°, orthogonalized step by step.]
  1. u₁ = a₁, then q₁ = u₁ / ||u₁||
  2. u₂ = a₂ − (a₂ · q₁)q₁, then q₂ = u₂ / ||u₂||
  3. uₖ = aₖ − Σⱼ (aₖ · qⱼ)qⱼ for j = 1, …, k−1, then qₖ = uₖ / ||uₖ||

Modified Gram-Schmidt is more numerically stable. Modern libraries use Householder reflections (a sequence of mirrors) instead of projections to compute QR.
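
The difference shows up on nearly dependent columns; a small sketch comparing the classical and modified variants (and NumPy's Householder-based qr) on a Läuchli-style matrix:

```python
import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: project each ORIGINAL column against all previous q's."""
    A = np.array(A, dtype=float)
    Q = np.zeros_like(A)
    for k in range(A.shape[1]):
        u = A[:, k] - Q[:, :k] @ (Q[:, :k].T @ A[:, k])
        Q[:, k] = u / np.linalg.norm(u)
    return Q

def mgs(A):
    """Modified Gram-Schmidt: update the working vector after every projection."""
    A = np.array(A, dtype=float)
    Q = np.zeros_like(A)
    for k in range(A.shape[1]):
        u = A[:, k].copy()
        for j in range(k):
            u -= (Q[:, j] @ u) * Q[:, j]
        Q[:, k] = u / np.linalg.norm(u)
    return Q

# Nearly dependent columns (a Lauchli matrix) expose the difference
eps = 1e-8
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])

for name, f in [("classical", cgs), ("modified", mgs),
                ("householder", lambda M: np.linalg.qr(M)[0])]:
    Q = f(A)
    print(name, np.linalg.norm(Q.T @ Q - np.eye(3)))   # loss of orthogonality (smaller is better)
```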

Case Study: Bulb Characteristic Fitting

The Problem

You have voltage vs brightness data for a bulb. Fit a polynomial: brightness = a₀ + a₁V + a₂V². This is an overdetermined system (more data points than unknowns).

Using QR for Least Squares

  1. Build Vandermonde matrix: Aᵢⱼ = Vᵢʲ
  2. Compute A = QR
  3. Solve R â = Qᵀb by back substitution (since R is triangular, this is easy)
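
Putting the three steps together, a sketch with made-up voltage/brightness data (the values are invented purely for illustration):

```python
import numpy as np
from scipy.linalg import solve_triangular

# Hypothetical voltage/brightness measurements (made up for illustration)
V = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
brightness = np.array([0.8, 2.1, 4.2, 7.0, 10.5, 14.9])

# Step 1: Vandermonde matrix with columns [1, V, V^2]
A = np.vander(V, N=3, increasing=True)

# Steps 2-3: reduced QR, then back substitution on the triangular R
Q, R = np.linalg.qr(A)
coeffs = solve_triangular(R, Q.T @ brightness)   # [a0, a1, a2]
print(coeffs)
```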

Why Not Normal Equations?

For high-degree polynomials, AᵀA becomes ill-conditioned: its condition number is the square of A's. QR avoids forming this product, preserving numerical stability.
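
The squaring of the condition number is easy to observe; a small sketch with an arbitrary degree-5 Vandermonde matrix:

```python
import numpy as np

V = np.linspace(1.0, 2.0, 50)              # arbitrary narrow voltage range
A = np.vander(V, N=6, increasing=True)     # degree-5 polynomial basis

print(np.linalg.cond(A))         # already large
print(np.linalg.cond(A.T @ A))   # roughly the square of the above
```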

LU vs QR vs SVD

|              | LU                     | QR              | SVD                        |
|--------------|------------------------|-----------------|----------------------------|
| Matrix Shape | Square only            | Any shape       | Any shape                  |
| Speed        | Fastest (~n³/3)        | Medium (~2n³/3) | Slowest (~10n³)            |
| Stability    | Needs pivoting         | Good            | Best (Total Least Squares) |
| Best For     | Linear solves (Ax = b) | Least squares   | Rank, compression          |

ML Applications

Cholesky (LLᵀ)

For symmetric positive definite matrices (like covariance matrices), Cholesky is 2x faster than LU. Used in Gaussian Processes for sampling.
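
A typical use, sketched with NumPy and a made-up RBF covariance matrix: factor K once, then draw correlated samples with a single triangular multiply.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)

# Made-up RBF kernel covariance (jitter on the diagonal keeps it positive definite)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1**2) + 1e-9 * np.eye(len(x))

L = np.linalg.cholesky(K)                    # K = L @ L.T
sample = L @ rng.standard_normal(len(x))     # one draw from N(0, K)
print(sample[:5])
```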

Eigenvalue Algorithms

The QR Algorithm for eigenvalues repeatedly computes QR decompositions. This is the standard way numpy.linalg.eig works for non-symmetric matrices.
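
A bare-bones sketch of the idea, in its unshifted form (production implementations add Hessenberg reduction and shifts):

```python
import numpy as np

def qr_eigenvalues(A, iters=500):
    """Unshifted QR iteration: A_{k+1} = R_k @ Q_k converges (for many matrices)
    toward an upper triangular matrix whose diagonal holds the eigenvalues."""
    A = np.array(A, dtype=float)
    for _ in range(iters):
        Q, R = np.linalg.qr(A)
        A = R @ Q                      # similarity transform: same eigenvalues
    return np.diag(A)

M = np.array([[2.0, 1.0], [1.0, 3.0]])
print(np.sort(qr_eigenvalues(M)))
print(np.sort(np.linalg.eigvals(M)))   # should agree
```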

Backpropagation Efficiency

Computing gradients through matrix inverses involves solving linear systems, which typically uses pre-computed LU factors. Frameworks like JAX exploit this for efficiency.
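
A hand-rolled illustration of the idea (not how JAX itself implements it): the gradient of x = A⁻¹b with respect to b is A⁻ᵀ applied to the upstream gradient, so the same LU factors serve both the forward and backward solves.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 3.0], [6.0, 3.0]])
b = np.array([1.0, 2.0])

factors = lu_factor(A)                   # factor once

x = lu_solve(factors, b)                 # forward pass: x = A^{-1} b
g_x = np.ones_like(x)                    # pretend upstream gradient dL/dx
g_b = lu_solve(factors, g_x, trans=1)    # backward pass: dL/db = A^{-T} dL/dx, same factors
print(x, g_b)
```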

Randomized Linear Algebra

Large scale ML uses randomized QR (sampling columns) to get approximate decompositions. This is much faster than full SVD for massive recommendation system matrices.
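
One common variant is the randomized range finder (Halko et al.), which sketches the column space with a random test matrix rather than literally sampling columns; a minimal sketch on a made-up low-rank matrix:

```python
import numpy as np

def randomized_range_finder(A, rank, oversample=10, rng=None):
    """Sketch the column space of A with a random test matrix, then orthonormalize."""
    rng = rng or np.random.default_rng(0)
    Omega = rng.standard_normal((A.shape[1], rank + oversample))
    Y = A @ Omega                  # random linear combinations of A's columns
    Q, _ = np.linalg.qr(Y)         # orthonormal basis for the sampled range
    return Q

# Made-up low-rank matrix standing in for a large ratings matrix
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 20)) @ rng.standard_normal((20, 500))

Q = randomized_range_finder(A, rank=20)
B = Q.T @ A                        # small projected matrix; A ≈ Q @ B
print(np.linalg.norm(A - Q @ B) / np.linalg.norm(A))   # tiny relative error
```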