What Is a Positive Definite Matrix?
A symmetric matrix A is positive definite if for every nonzero vector x, the quadratic form xᵀAx > 0. Intuitively, this means the associated quadratic surface always "curves upward" and has a unique minimum at the origin.
Positive definite matrices are the "well-behaved" matrices of optimization. When the matrix defining a quadratic objective is positive definite, gradient descent converges to a unique global minimum. They appear everywhere: covariance matrices, Hessians of convex functions, and kernel matrices.
The Core Property
xᵀAx > 0 for all x ≠ 0. This single condition has profound consequences for optimization, statistics, and numerical stability.
The Quadratic Form
The expression xᵀAx is called the quadratic form associated with A. For a 2×2 symmetric matrix with entries a, b on the first row and b, c on the second, it expands to:

xᵀAx = ax₁² + 2bx₁x₂ + cx₂²
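As a quick numeric check, here is a minimal NumPy sketch (the entries a, b, c and the vector x are arbitrary example values) comparing xᵀAx against the expanded formula:

```python
import numpy as np

# Arbitrary example entries for a symmetric 2x2 matrix and a test vector
a, b, c = 3.0, 1.0, 2.0
A = np.array([[a, b],
              [b, c]])
x = np.array([0.5, -1.5])

quadratic_form = x @ A @ x                                   # x^T A x
expanded = a * x[0]**2 + 2 * b * x[0] * x[1] + c * x[1]**2   # a*x1^2 + 2b*x1*x2 + c*x2^2

print(quadratic_form, expanded)   # both print 3.75
```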
- Positive Definite: Bowl shape. Unique minimum at the origin. All level curves are ellipses.
- Indefinite: Saddle shape. Neither a maximum nor a minimum at the origin. Level curves are hyperbolas.
- Negative Definite: Inverted bowl. Unique maximum at the origin. All level curves are ellipses.
Energy Surface

Plotting z = xᵀAx in 3D makes the classification visible: for a positive definite A, the surface is a convex bowl with a unique global minimum.
Tests for Definiteness
There are several equivalent ways to check if a symmetric matrix is positive definite:
- Eigenvalue Test: All eigenvalues are strictly positive.
- Sylvester's Criterion: All leading principal minors (upper-left determinants) are positive.
- Cholesky Test: The Cholesky decomposition exists with positive diagonal entries.
- Pivot Test: In LU decomposition, all pivots are positive.
For 2×2
A 2×2 symmetric matrix with entries a, b on the first row and b, c on the second is PD if a > 0 and ac − b² > 0.
Computational
In practice, attempt Cholesky factorization. If it succeeds, the matrix is PD.
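A minimal sketch of that check in NumPy (the helper name is_positive_definite is just an illustrative choice):

```python
import numpy as np

def is_positive_definite(A: np.ndarray) -> bool:
    """Return True if the symmetric matrix A is positive definite.

    Relies on np.linalg.cholesky, which raises LinAlgError
    when the factorization does not exist.
    """
    A = np.asarray(A)
    if not np.allclose(A, A.T):
        return False          # definiteness is defined here for symmetric matrices
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

print(is_positive_definite([[2.0, -1.0], [-1.0, 2.0]]))   # True
print(is_positive_definite([[1.0,  2.0], [ 2.0, 1.0]]))   # False (eigenvalues 3 and -1)
```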
Cholesky Decomposition
A positive definite matrix can be uniquely factored as A = LLᵀ, where L is a lower triangular matrix with positive diagonal entries. This acts as a "square root" of the matrix.
Why Cholesky?
- Speed: 2× faster than LU because we exploit symmetry.
- Stability: No pivoting needed (guaranteed stable for PD).
- Sampling: To sample from N(μ, Σ), compute L from Σ = LLᵀ, then x = μ + Lz where z ~ N(0, I) (sketched below).
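A minimal NumPy sketch of the sampling recipe, with μ and Σ chosen as arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])              # symmetric positive definite

L = np.linalg.cholesky(Sigma)               # Sigma = L @ L.T
z = rng.standard_normal((10_000, 2))        # rows are independent z ~ N(0, I)
samples = mu + z @ L.T                      # each row is x = mu + L z

print(samples.mean(axis=0))                 # ~ mu
print(np.cov(samples, rowvar=False))        # ~ Sigma
```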
The Definiteness Spectrum
The classification of a symmetric matrix depends on the signs of its eigenvalues:
| Type | Eigenvalues | Quadratic Form | Optimization |
|---|---|---|---|
| Positive Definite | All λ > 0 | xᵀAx > 0 for x ≠ 0 | Unique minimum |
| Positive Semidefinite | All λ ≥ 0 | xᵀAx ≥ 0 | Minimum along a flat subspace |
| Indefinite | Mixed signs | Takes both + and − values | Saddle point |
| Negative Semidefinite | All λ ≤ 0 | xᵀAx ≤ 0 | Maximum along a flat subspace |
| Negative Definite | All λ < 0 | xᵀAx < 0 for x ≠ 0 | Unique maximum |
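The eigenvalue column translates directly into code. Here is a minimal sketch using np.linalg.eigvalsh; the classify_definiteness name and the tolerance are illustrative choices:

```python
import numpy as np

def classify_definiteness(A: np.ndarray, tol: float = 1e-10) -> str:
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eigvals = np.linalg.eigvalsh(A)          # eigvalsh assumes A is symmetric
    if np.all(eigvals > tol):
        return "positive definite"
    if np.all(eigvals >= -tol):
        return "positive semidefinite"
    if np.all(eigvals < -tol):
        return "negative definite"
    if np.all(eigvals <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify_definiteness(np.array([[2.0, 0.0], [0.0, 3.0]])))   # positive definite
print(classify_definiteness(np.array([[1.0, 2.0], [2.0, 1.0]])))   # indefinite
```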
ML Applications
Covariance Matrices
Sample covariance matrices are always positive semidefinite. For a multivariate Gaussian, the covariance must be PD for the density to be well defined.
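A small sketch illustrating the PSD claim on random data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))     # 200 samples, 3 features
S = np.cov(X, rowvar=False)           # 3x3 sample covariance matrix

print(np.linalg.eigvalsh(S))          # all eigenvalues >= 0 (up to round-off)
```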
Convex Optimization
A twice-differentiable function is convex if and only if its Hessian is PSD everywhere. Convexity guarantees that any minimum gradient descent settles into is a global minimum.
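As a concrete instance, the quadratic f(x) = ½xᵀAx − bᵀx has Hessian A everywhere. A minimal sketch (the step size and iteration count are arbitrary choices) showing gradient descent reaching the unique minimizer when A is PD:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])             # positive definite Hessian
b = np.array([1.0, -1.0])

x_star = np.linalg.solve(A, b)         # unique minimizer of f(x) = 0.5 x^T A x - b^T x

x = np.zeros(2)
for _ in range(500):
    x -= 0.1 * (A @ x - b)             # gradient step: grad f(x) = A x - b

print(x, x_star)                       # both approximately [0.6, -0.8]
```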
Kernel Matrices
A valid kernel function must produce PSD Gram matrices on every finite set of inputs. This is Mercer's condition, ensuring the implicit feature space is well defined.
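A minimal sketch building an RBF Gram matrix and checking its spectrum (the rbf_kernel helper and lengthscale are illustrative):

```python
import numpy as np

def rbf_kernel(X: np.ndarray, lengthscale: float = 1.0) -> np.ndarray:
    """Gram matrix of the RBF (squared-exponential) kernel."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 2))
K = rbf_kernel(X)

print(np.linalg.eigvalsh(K).min())   # smallest eigenvalue is >= 0 up to round-off
```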
Gaussian Processes
GP priors require PD covariance matrices for sampling. Cholesky is used to generate correlated samples from a GP posterior.
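A minimal sketch of drawing GP prior samples over a 1-D grid with an RBF covariance; the jitter value of 1e-8 is a common but arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 5.0, 100)

# RBF covariance over a 1-D grid of inputs
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

# Add a small diagonal "jitter" so the numerically PSD matrix is safely PD
# before factorizing.
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(x)))

samples = L @ rng.standard_normal((len(x), 3))   # three correlated draws from the GP prior
print(samples.shape)                             # (100, 3)
```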