Linear Algebra

Contents

Positive Definite Matrices

The matrices that guarantee convexity and unique minima.

What is Positive Definite?

A symmetric matrix $A$ is positive definite if for every nonzero vector $x$, the quadratic form satisfies $x^T A x > 0$. Intuitively, this means the matrix always "curves upward" and has a unique minimum at the origin.

Positive definite matrices are the "well-behaved" matrices of optimization. They guarantee that gradient descent will find a unique global minimum. They appear everywhere: covariance matrices, Hessians of convex functions, and kernel matrices.

The Core Property

$x^T A x > 0$ for all $x \neq 0$. This single condition has profound consequences for optimization, statistics, and numerical stability.

The Quadratic Form

The expression $x^T A x$ is called the quadratic form associated with $A$. For a 2×2 symmetric matrix, it expands to:

$$x^T \begin{bmatrix} a & b \\ b & c \end{bmatrix} x = ax_1^2 + 2bx_1x_2 + cx_2^2$$
This is a paraboloid in 3D. Its shape depends on the matrix entries.
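As a quick numerical check of the expansion above, here is a minimal sketch with NumPy. The matrix entries ($a = 2$, $b = 1$, $c = 3$) are illustrative values, not from the text:

```python
import numpy as np

# Example symmetric 2x2 matrix with a = 2, b = 1, c = 3
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def quadratic_form(A, x):
    """Evaluate x^T A x."""
    return float(x @ A @ x)

# Expansion check: a*x1^2 + 2*b*x1*x2 + c*x2^2 = 2 - 2 + 3 = 3
x = np.array([1.0, -1.0])
print(quadratic_form(A, x))  # 3.0
```

Evaluating the form at a few nonzero points and always getting a positive value is consistent with (but does not prove) positive definiteness; the tests below settle it.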

Positive Definite

Bowl shape. Unique minimum at origin. All level curves are ellipses.

Indefinite

Saddle shape. Neither max nor min at origin. Level curves are hyperbolas.

Negative Definite

Inverted bowl. Unique maximum at origin. All level curves are ellipses.

Interactive Visualization

Adjust the symmetric matrix entries and watch the level curves of the quadratic form change. The arrows show gradient directions. Notice how positive definite matrices create elliptical level curves, with gradients pointing outward from the unique minimum at the origin.

[Interactive widget: an auto-rotating 3D view with sliders for λ₁ (curvature 1), λ₂ (curvature 2), and rotation of the principal axes; it reports the resulting classification, e.g. "Positive Definite — bowl shape (convex), unique global minimum."]

Tests for Definiteness

There are several equivalent ways to check if a symmetric matrix is positive definite:

  1. Eigenvalue Test: All eigenvalues are strictly positive.
  2. Sylvester's Criterion: All leading principal minors (upper-left determinants) are positive.
  3. Cholesky Test: The Cholesky decomposition $A = LL^T$ exists with positive diagonal entries.
  4. Pivot Test: In LU decomposition, all pivots are positive.
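The first two tests can be sketched directly in NumPy. The matrices `A` (positive definite) and `B` (indefinite) below are illustrative examples:

```python
import numpy as np

def is_pd_eigen(A, tol=1e-10):
    """Eigenvalue test: all eigenvalues strictly positive."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

def is_pd_sylvester(A):
    """Sylvester's criterion: all leading principal minors positive."""
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # PD: leading minors are 2 and 5
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])   # indefinite: det = -3, eigenvalues 3 and -1

print(is_pd_eigen(A), is_pd_sylvester(A))  # True True
print(is_pd_eigen(B), is_pd_sylvester(B))  # False False
```

`eigvalsh` is used rather than `eig` because the input is symmetric; it is faster and returns real eigenvalues directly.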

For 2×2

$A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ is PD if $a > 0$ and $ac - b^2 > 0$.

Computational

In practice, attempt Cholesky factorization. If it succeeds, the matrix is PD.
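A minimal sketch of this check, assuming NumPy (`np.linalg.cholesky` raises `LinAlgError` when the input is not positive definite):

```python
import numpy as np

def is_pd_cholesky(A):
    """A symmetric matrix is PD iff its Cholesky factorization succeeds."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

print(is_pd_cholesky(np.array([[2.0, 1.0], [1.0, 3.0]])))  # True
print(is_pd_cholesky(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False
```

This is the standard practical test: it costs one factorization, which is cheaper than computing the full eigendecomposition.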

Cholesky Decomposition

A positive definite matrix $A$ can be uniquely factored as $A = LL^T$, where $L$ is a lower triangular matrix with positive diagonal entries. This factorization is the matrix analogue of a square root.

$$A = LL^T$$
$L$ is lower triangular, $L^T$ is its transpose (upper triangular)

Why Cholesky?

  • Speed: 2× faster than LU because we exploit symmetry.
  • Stability: No pivoting needed (guaranteed stable for PD).
  • Sampling: To sample from $\mathcal{N}(0, \Sigma)$, compute $L$ from $\Sigma = LL^T$, then $x = Lz$ where $z \sim \mathcal{N}(0, I)$.
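The sampling recipe in the last bullet can be sketched as follows; the covariance `Sigma` and the sample count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Example symmetric, positive definite covariance
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

L = np.linalg.cholesky(Sigma)          # Sigma = L @ L.T

z = rng.standard_normal((2, 100_000))  # columns are z ~ N(0, I)
x = L @ z                              # columns are x ~ N(0, Sigma)

print(np.cov(x))  # empirical covariance, close to Sigma
```

Because $\operatorname{Cov}(Lz) = L\,\operatorname{Cov}(z)\,L^T = LL^T = \Sigma$, the transformed samples have exactly the target covariance in expectation.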

The Definiteness Spectrum

The classification of a symmetric matrix depends on the signs of its eigenvalues:

| Type | Eigenvalues | Quadratic Form | Optimization |
| --- | --- | --- | --- |
| Positive Definite | All λ > 0 | $x^TAx > 0$ | Unique minimum |
| Positive Semidefinite | All λ ≥ 0 | $x^TAx \geq 0$ | Minimum subspace |
| Indefinite | Mixed signs | Both + and − | Saddle point |
| Negative Semidefinite | All λ ≤ 0 | $x^TAx \leq 0$ | Maximum subspace |
| Negative Definite | All λ < 0 | $x^TAx < 0$ | Unique maximum |
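The table amounts to a small classifier over eigenvalue signs. A sketch, assuming NumPy and a symmetric input; the tolerance `tol` is an assumption added to absorb floating-point noise:

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    if np.all(lam <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[2.0, 1.0], [1.0, 3.0]])))   # positive definite
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))   # positive semidefinite
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite
```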

ML Applications

Covariance Matrices

Sample covariance matrices are always positive semidefinite. For a Gaussian density to be well defined, the covariance must be strictly PD (and hence invertible).
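A quick numerical illustration of the first claim, assuming NumPy and randomly generated data (shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))  # 500 samples, 3 features

S = np.cov(X, rowvar=False)        # 3x3 sample covariance
print(np.linalg.eigvalsh(S))       # all eigenvalues >= 0: S is PSD
```

With more samples than features and generic data, the eigenvalues are in fact strictly positive; PSD-but-singular covariances arise when features are linearly dependent or when there are fewer samples than dimensions.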

Convex Optimization

A twice-differentiable function is convex if its Hessian is PSD everywhere. If the Hessian is PD, the function is strictly convex, and gradient descent (with a suitable step size) converges to the unique global minimum.

Kernel Matrices

A valid kernel function must produce PSD Gram matrices. This is Mercer's condition, ensuring the implicit feature space is well defined.
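As an illustration, the RBF (Gaussian) kernel satisfies Mercer's condition. A sketch that builds its Gram matrix and inspects the eigenvalues; the lengthscale and input grid are assumptions:

```python
import numpy as np

def rbf_gram(X, lengthscale=1.0):
    """Gram matrix of k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * lengthscale ** 2))

X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)  # 10 points on a line
K = rbf_gram(X)

print(np.linalg.eigvalsh(K))  # nonnegative (up to rounding): K is PSD
```

In floating point, the smallest eigenvalues of a smooth kernel's Gram matrix can dip slightly below zero, which is why GP libraries add a small diagonal jitter before factoring.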

Gaussian Processes

GP priors require PD covariance matrices for sampling. Cholesky is used to generate correlated samples from a GP posterior.
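A minimal sketch of drawing one sample from a GP prior via Cholesky, assuming an RBF kernel; the lengthscale, grid, and diagonal "jitter" (added for numerical positive definiteness) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def rbf_gram(X, lengthscale=0.2):
    """RBF kernel Gram matrix over 1-D inputs."""
    sq = (X[:, None] - X[None, :]) ** 2
    return np.exp(-sq / (2 * lengthscale ** 2))

X = np.linspace(0.0, 1.0, 50)
K = rbf_gram(X)

# Small diagonal jitter keeps the nearly singular kernel matrix
# numerically PD so that Cholesky succeeds
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(X)))

f = L @ rng.standard_normal(len(X))  # one smooth draw from the GP prior
print(f.shape)  # (50,)
```

This is the same $x = Lz$ trick from the Cholesky section, applied with the kernel matrix as the covariance.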