Linear Algebra

Contents

Positive Definite Matrices

The matrices that guarantee convexity and unique minima.

What is Positive Definite?

A symmetric matrix $A$ is positive definite if for every nonzero vector $x$, the quadratic form satisfies $x^T A x > 0$. Intuitively, this means the matrix always "curves upward" and has a unique minimum at the origin.

Positive definite matrices are the "well-behaved" matrices of optimization. They guarantee that gradient descent will find a unique global minimum. They appear everywhere: covariance matrices, Hessians of convex functions, and kernel matrices.

The Core Property

$x^T A x > 0$ for all $x \neq 0$. This single condition has profound consequences for optimization, statistics, and numerical stability.

The Quadratic Form

The expression $x^T A x$ is called the quadratic form associated with $A$. For a 2×2 symmetric matrix, it expands to:

$$x^T \begin{bmatrix} a & b \\ b & c \end{bmatrix} x = ax_1^2 + 2bx_1x_2 + cx_2^2$$
This is a paraboloid in 3D. Its shape depends on the matrix entries.
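As a quick numerical check of the expansion above, here is a minimal sketch with NumPy. The matrix entries ($a = 2$, $b = 1$, $c = 3$) are illustrative values, not from the text:

```python
import numpy as np

# Example symmetric 2x2 matrix with a = 2, b = 1, c = 3
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def quadratic_form(A, x):
    """Evaluate x^T A x."""
    return float(x @ A @ x)

# Expansion check: a*x1^2 + 2*b*x1*x2 + c*x2^2 = 2 - 2 + 3 = 3
x = np.array([1.0, -1.0])
print(quadratic_form(A, x))  # 3.0
```

Evaluating the form at a few nonzero points and always getting a positive value is consistent with (but does not prove) positive definiteness; the tests below settle it.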

Positive Definite

Bowl shape. Unique minimum at origin. All level curves are ellipses.

Indefinite

Saddle shape. Neither max nor min at origin. Level curves are hyperbolas.

Negative Definite

Inverted bowl. Unique maximum at origin. All level curves are ellipses.

Interactive Visualization

Adjust the symmetric matrix entries and watch the level curves of the quadratic form change. The arrows show gradient directions. Notice how positive definite matrices create elliptical level curves, with gradients pointing outward from the unique minimum at the origin.

[Interactive widget: an auto-rotating 3D view with sliders for λ₁ (curvature 1), λ₂ (curvature 2), and rotation of the principal axes; it reports the resulting classification, e.g. "Positive Definite — bowl shape (convex), unique global minimum."]

Tests for Definiteness

There are several equivalent ways to check if a symmetric matrix is positive definite:

  1. Eigenvalue Test: All eigenvalues are strictly positive.
  2. Sylvester's Criterion: All leading principal minors (upper-left determinants) are positive.
  3. Cholesky Test: The Cholesky decomposition $A = LL^T$ exists with positive diagonal entries.
  4. Pivot Test: In LU decomposition, all pivots are positive.
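The first two tests can be sketched directly in NumPy. The matrices `A` (positive definite) and `B` (indefinite) below are illustrative examples:

```python
import numpy as np

def is_pd_eigen(A, tol=1e-10):
    """Eigenvalue test: all eigenvalues strictly positive."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

def is_pd_sylvester(A):
    """Sylvester's criterion: all leading principal minors positive."""
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # PD: leading minors are 2 and 5
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])   # indefinite: det = -3, eigenvalues 3 and -1

print(is_pd_eigen(A), is_pd_sylvester(A))  # True True
print(is_pd_eigen(B), is_pd_sylvester(B))  # False False
```

`eigvalsh` is used rather than `eig` because the input is symmetric; it is faster and returns real eigenvalues directly.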

For 2×2

$A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ is PD if $a > 0$ and $ac - b^2 > 0$.

Computational

In practice, attempt Cholesky factorization. If it succeeds, the matrix is PD.
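A minimal sketch of this check, assuming NumPy (`np.linalg.cholesky` raises `LinAlgError` when the input is not positive definite):

```python
import numpy as np

def is_pd_cholesky(A):
    """A symmetric matrix is PD iff its Cholesky factorization succeeds."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

print(is_pd_cholesky(np.array([[2.0, 1.0], [1.0, 3.0]])))  # True
print(is_pd_cholesky(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False
```

This is the standard practical test: it costs one factorization, which is cheaper than computing the full eigendecomposition.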

Cholesky Decomposition

A positive definite matrix $A$ can be uniquely factored as $A = LL^T$, where $L$ is a lower triangular matrix with positive diagonal entries. This factorization is the matrix analogue of a square root.

$$A = LL^T$$
$L$ is lower triangular, $L^T$ is its transpose (upper triangular)

Why Cholesky?

  • Speed: 2× faster than LU because we exploit symmetry.
  • Stability: No pivoting needed (guaranteed stable for PD).
  • Sampling: To sample from $\mathcal{N}(0, \Sigma)$, compute $L$ from $\Sigma = LL^T$, then $x = Lz$ where $z \sim \mathcal{N}(0, I)$.
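The sampling recipe in the last bullet can be sketched as follows; the covariance `Sigma` and the sample count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Example symmetric, positive definite covariance
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

L = np.linalg.cholesky(Sigma)          # Sigma = L @ L.T

z = rng.standard_normal((2, 100_000))  # columns are z ~ N(0, I)
x = L @ z                              # columns are x ~ N(0, Sigma)

print(np.cov(x))  # empirical covariance, close to Sigma
```

Because $\operatorname{Cov}(Lz) = L\,\operatorname{Cov}(z)\,L^T = LL^T = \Sigma$, the transformed samples have exactly the target covariance in expectation.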

The Definiteness Spectrum

The classification of a symmetric matrix depends on the signs of its eigenvalues:

| Type | Eigenvalues | Quadratic Form | Optimization |
| --- | --- | --- | --- |
| Positive Definite | All λ > 0 | $x^TAx > 0$ | Unique minimum |
| Positive Semidefinite | All λ ≥ 0 | $x^TAx \geq 0$ | Minimum subspace |
| Indefinite | Mixed signs | Both + and − | Saddle point |
| Negative Semidefinite | All λ ≤ 0 | $x^TAx \leq 0$ | Maximum subspace |
| Negative Definite | All λ < 0 | $x^TAx < 0$ | Unique maximum |
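The table amounts to a small classifier over eigenvalue signs. A sketch, assuming NumPy and a symmetric input; the tolerance `tol` is an assumption added to absorb floating-point noise:

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    if np.all(lam <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[2.0, 1.0], [1.0, 3.0]])))   # positive definite
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))   # positive semidefinite
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite
```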

ML Applications

Covariance Matrices

Sample covariance matrices are always positive semidefinite. For a Gaussian density to be well defined, the covariance must be strictly PD (and hence invertible).
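A quick numerical illustration of the first claim, assuming NumPy and randomly generated data (shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))  # 500 samples, 3 features

S = np.cov(X, rowvar=False)        # 3x3 sample covariance
print(np.linalg.eigvalsh(S))       # all eigenvalues >= 0: S is PSD
```

With more samples than features and generic data, the eigenvalues are in fact strictly positive; PSD-but-singular covariances arise when features are linearly dependent or when there are fewer samples than dimensions.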

Convex Optimization

A twice-differentiable function is convex if its Hessian is PSD everywhere. If the Hessian is PD, the function is strictly convex, and gradient descent (with a suitable step size) converges to the unique global minimum.

Kernel Matrices

A valid kernel function must produce PSD Gram matrices. This is Mercer's condition, ensuring the implicit feature space is well defined.
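As an illustration, the RBF (Gaussian) kernel satisfies Mercer's condition. A sketch that builds its Gram matrix and inspects the eigenvalues; the lengthscale and input grid are assumptions:

```python
import numpy as np

def rbf_gram(X, lengthscale=1.0):
    """Gram matrix of k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * lengthscale ** 2))

X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)  # 10 points on a line
K = rbf_gram(X)

print(np.linalg.eigvalsh(K))  # nonnegative (up to rounding): K is PSD
```

In floating point, the smallest eigenvalues of a smooth kernel's Gram matrix can dip slightly below zero, which is why GP libraries add a small diagonal jitter before factoring.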

Gaussian Processes

GP priors require PD covariance matrices for sampling. Cholesky is used to generate correlated samples from a GP posterior.
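A minimal sketch of drawing one sample from a GP prior via Cholesky, assuming an RBF kernel; the lengthscale, grid, and diagonal "jitter" (added for numerical positive definiteness) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def rbf_gram(X, lengthscale=0.2):
    """RBF kernel Gram matrix over 1-D inputs."""
    sq = (X[:, None] - X[None, :]) ** 2
    return np.exp(-sq / (2 * lengthscale ** 2))

X = np.linspace(0.0, 1.0, 50)
K = rbf_gram(X)

# Small diagonal jitter keeps the nearly singular kernel matrix
# numerically PD so that Cholesky succeeds
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(X)))

f = L @ rng.standard_normal(len(X))  # one smooth draw from the GP prior
print(f.shape)  # (50,)
```

This is the same $x = Lz$ trick from the Cholesky section, applied with the kernel matrix as the covariance.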