Introduction
"Orthogonal" comes from Greek: orthos (right, correct) + gonia (angle). In everyday language, it means "perpendicular" or "at a right angle." In mathematics and machine learning, orthogonality represents a deeper concept: complete independence.
Two orthogonal vectors share absolutely no common direction. In the context of data, if two features are orthogonal, knowing one tells you nothing about the other. They provide completely unique, non-redundant information. This is why orthogonality is sometimes called the "Holy Grail" of feature engineering.
Why Orthogonality Matters in ML
- Feature Independence: Orthogonal features maximize information content per feature.
- Numerical Stability: Orthogonal matrices are perfectly conditioned (condition number = 1).
- Gradient Flow: Orthogonal weight matrices preserve gradient magnitudes in deep networks.
- Efficient Computation: Inverting orthogonal matrices is trivial (just transpose).
Orthogonality is the mathematical engine behind algorithms like PCA, SVD, QR decomposition, and orthogonal weight initialization in neural networks.
Orthogonal Vectors
Two vectors u and v in ℝⁿ are orthogonal (written u ⊥ v) if their inner product (dot product) is zero:
u · v = u₁v₁ + u₂v₂ + ... + uₙvₙ = 0
This definition extends to any finite dimension n, not just 2D or 3D.
Example: Verifying Orthogonality
Let u = (1, 2, 3) and v = (1, 1, -1). Are they orthogonal?
u · v = (1)(1) + (2)(1) + (3)(-1)
u · v = 1 + 2 - 3 = 0
Yes, u and v are orthogonal!
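The same check is one line of code. A minimal NumPy sketch, using the vectors from the example above:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([1, 1, -1])

# Orthogonal if and only if the dot product is zero
print(np.dot(u, v))                    # 0
print(np.isclose(np.dot(u, v), 0.0))   # True
```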
The Zero Vector
The zero vector is orthogonal to every vector (including itself), since 0 · v = 0 for any v. However, when we talk about orthogonal sets or bases, we typically exclude the zero vector.
Orthogonal Sets
A set of vectors {v₁, v₂, ..., vₖ} is an orthogonal set if every pair is orthogonal: vᵢ · vⱼ = 0 for all i ≠ j.
Key Theorem: An orthogonal set of non-zero vectors is always linearly independent. This makes orthogonal vectors extremely useful as basis vectors.
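A small NumPy sketch of this theorem, using an illustrative orthogonal set in ℝ³: stacking the vectors gives a full-rank matrix, which confirms linear independence.

```python
import numpy as np

# A pairwise-orthogonal set of non-zero vectors in R^3 (illustrative)
v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, -1.0, 0.0])
v3 = np.array([0.0, 0.0, 2.0])
V = np.stack([v1, v2, v3])

# Every pair has dot product zero (off-diagonal entries are 0) ...
print(V @ V.T)
# ... so the set is linearly independent (full rank)
print(np.linalg.matrix_rank(V))   # 3
```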
Geometric Intuition
The dot product has a beautiful geometric interpretation that explains why orthogonal vectors have dot product zero:
u · v = ‖u‖ ‖v‖ cos θ
When θ = 90°, cos θ = 0, so the dot product is zero.
Projection Interpretation
The dot product measures "how much of u lies in the direction of v": the projection of u onto v is proj_v(u) = ((u · v) / ‖v‖²) v.
If the dot product is zero, the projection is zero: the vectors share no common direction.
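A minimal sketch of this projection formula with illustrative vectors:

```python
import numpy as np

def project(u, v):
    """Component of u along v: ((u . v) / (v . v)) * v."""
    return (np.dot(u, v) / np.dot(v, v)) * v

u = np.array([2.0, 3.0])
v = np.array([1.0, 0.0])
print(project(u, v))       # [2. 0.]  the part of u along v

w = np.array([0.0, 5.0])   # orthogonal to v
print(project(w, v))       # [0. 0.]  no shared direction
```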
Pythagorean Theorem
For orthogonal vectors, the Pythagorean theorem holds:
‖u + v‖² = ‖u‖² + ‖v‖²
This is because the cross-term 2(u · v) vanishes when expanding ‖u + v‖² = ‖u‖² + 2(u · v) + ‖v‖².
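A quick numeric check with an illustrative orthogonal pair:

```python
import numpy as np

u = np.array([3.0, 0.0])
v = np.array([0.0, 4.0])   # orthogonal to u

lhs = np.linalg.norm(u + v) ** 2                          # ||u + v||^2 = 25
rhs = np.linalg.norm(u) ** 2 + np.linalg.norm(v) ** 2     # 9 + 16 = 25
print(np.isclose(lhs, rhs))                               # True
```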
Interactive: Projection & Orthogonality
Adjust the angle of vector u to see its projection onto v. When the projection vanishes, the vectors are orthogonal.
The demo decomposes u into its components parallel and perpendicular to v.
When the angle is 90° (or 270°), the dot product is zero, and the projection vanishes.
Orthonormal Bases
An orthonormal set is an orthogonal set where every vector also has unit length (norm = 1). This is the "gold standard" for coordinate systems.
1. Orthogonal: eᵢ · eⱼ = 0 (if i ≠ j)
2. Normalized: ‖eᵢ‖ = 1 (unit length)
Combined using the Kronecker delta: eᵢ · eⱼ = δᵢⱼ
Why Orthonormal Bases are Powerful
In an orthonormal basis, finding coordinates is trivial. You don't need to solve a system of equations; you just take dot products!
The coordinate along eᵢ is just cᵢ = x · eᵢ. This is why Fourier series and wavelet transforms (which use orthonormal bases) are computationally feasible.
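A minimal sketch of coordinates-by-dot-products, using an illustrative orthonormal basis of ℝ² (the standard basis rotated by 45°):

```python
import numpy as np

# An orthonormal basis of R^2
e1 = np.array([1.0, 1.0]) / np.sqrt(2)
e2 = np.array([-1.0, 1.0]) / np.sqrt(2)

x = np.array([3.0, 1.0])

# Coordinates are just dot products; no linear system to solve
c1, c2 = np.dot(x, e1), np.dot(x, e2)
print(c1, c2)

# Reconstruction: x = c1*e1 + c2*e2
print(np.allclose(c1 * e1 + c2 * e2, x))   # True
```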
Orthogonal Matrices
An orthogonal matrix is a square matrix Q whose columns form an orthonormal set. (The name is confusing; "orthonormal matrix" would be more accurate.)
Defining Property
Q^T Q = Q Q^T = I
Which implies: Q⁻¹ = Q^T
Examples
- Rotation matrix: [[cos θ, -sin θ], [sin θ, cos θ]]
- Reflection matrix: [[1, 0], [0, -1]] (reflection across the x-axis)
- Permutation matrix: [[0, 1], [1, 0]] (swaps the two coordinates)
- Identity matrix: I
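Each of these can be checked directly against the defining property. A sketch for the 2×2 rotation matrix (the angle is arbitrary):

```python
import numpy as np

theta = 0.7  # arbitrary angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Columns are orthonormal, so Q^T Q = I and Q^{-1} = Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))      # True
print(np.allclose(np.linalg.inv(Q), Q.T))   # True
```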
Properties & Proofs
1. Isometry (Length Preserving)
‖Qx‖ = ‖x‖: multiplying by Q does not change length, since ‖Qx‖² = (Qx) · (Qx) = x · (Q^T Q x) = x · x = ‖x‖².
2. Angle Preserving
(Qx) · (Qy) = x · y: dot products (and hence angles) are preserved, by the same argument.
3. Determinant
det(Q) = ±1, since det(Q)² = det(Q^T) det(Q) = det(Q^T Q) = det(I) = 1.
4. Eigenvalues
|λ| = 1: all eigenvalues lie on the complex unit circle, because Qx = λx together with length preservation forces |λ| ‖x‖ = ‖x‖.
Computational Advantage
Inverting a general matrix is O(n³). Inverting an orthogonal matrix is O(n²) (just transpose!). Also, condition number = 1 means perfect numerical stability.
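A short sketch that verifies these properties numerically and shows the cheap inverse. The orthogonal matrix here is built from the QR factorization of a random matrix, which is just one convenient way to obtain one:

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random orthogonal matrix

x = rng.standard_normal(4)
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # True: lengths preserved
print(np.isclose(abs(np.linalg.det(Q)), 1.0))                # True: det = +/-1
print(np.allclose(np.abs(np.linalg.eigvals(Q)), 1.0))        # True: |lambda| = 1
print(np.isclose(np.linalg.cond(Q), 1.0))                    # True: condition number 1

# Solving Q y = b needs only a transpose, no matrix inversion
b = rng.standard_normal(4)
y = Q.T @ b
print(np.allclose(Q @ y, b))                                 # True
```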
Gram-Schmidt Process
The Gram-Schmidt process transforms any set of linearly independent vectors into an orthonormal basis for the same subspace. It works by iteratively subtracting each vector's projections onto the previously constructed basis vectors and then normalizing.
The Algorithm
Step 1: Normalize the first vector: e₁ = v₁ / ‖v₁‖
Step 2: Subtract the projection onto e₁, then normalize: u₂ = v₂ - (v₂ · e₁) e₁, e₂ = u₂ / ‖u₂‖
Step k: Subtract all previous projections, then normalize: uₖ = vₖ - (vₖ · e₁) e₁ - ... - (vₖ · eₖ₋₁) eₖ₋₁, eₖ = uₖ / ‖uₖ‖
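A minimal NumPy sketch of the algorithm (classical Gram-Schmidt; in floating point, the modified variant or a library QR routine is more robust):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        u = v.astype(float)
        # Subtract the projection onto every previously built basis vector
        for e in basis:
            u = u - np.dot(v, e) * e
        basis.append(u / np.linalg.norm(u))
    return np.array(basis)

E = gram_schmidt([np.array([3.0, 1.0]), np.array([2.0, 2.0])])
print(np.allclose(E @ E.T, np.eye(2)))   # True: rows are orthonormal
```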
Interactive: Gram-Schmidt
Watch step-by-step how orthogonalization happens, starting with two linearly independent vectors.
QR Decomposition
Matrix form of Gram-Schmidt: A = QR.
- Q: Orthogonal matrix (the e vectors).
- R: Upper triangular matrix (the dot products).
Used for solving least-squares problems (minimizing ‖Ax - b‖) and for finding eigenvalues (the QR algorithm).
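A sketch of QR-based least squares with NumPy, on illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))   # tall matrix: overdetermined system
b = rng.standard_normal(6)

Q, R = np.linalg.qr(A)            # A = QR, Q has orthonormal columns
x = np.linalg.solve(R, Q.T @ b)   # solve the triangular system R x = Q^T b

# Matches the standard least-squares solver
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))      # True
```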
ML Applications
Orthogonal Weight Initialization
Initializing RNN/LSTM weights as orthogonal matrices helps prevent vanishing/exploding gradients because ‖Wx‖ = ‖x‖ when W is orthogonal, preserving signal magnitude over time.
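One common recipe is to take the Q factor of a random Gaussian matrix; a hedged sketch in plain NumPy (deep-learning frameworks ship their own orthogonal initializers):

```python
import numpy as np

def orthogonal_init(n, seed=None):
    """Random n x n orthogonal matrix via QR of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fix column signs so the result is uniformly distributed over orthogonal matrices
    return q * np.sign(np.diag(r))

W = orthogonal_init(64, seed=0)
h0 = np.random.default_rng(2).standard_normal(64)
h = h0.copy()
# Repeated multiplication neither shrinks nor blows up the signal
for _ in range(100):
    h = W @ h
print(np.isclose(np.linalg.norm(h), np.linalg.norm(h0)))   # True: magnitude preserved
```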
PCA & Decorrelation
PCA finds orthogonal directions of maximum variance. Projecting data onto these directions decorrelates the features (and, after rescaling, "whitens" them), making downstream learning easier for models.
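A compact sketch of the idea with NumPy, on synthetic correlated data (illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
# Correlated 2-D data: the second feature largely copies the first
X = rng.standard_normal((500, 2)) @ np.array([[1.0, 0.9], [0.0, 0.5]])
X -= X.mean(axis=0)

# Principal directions = right singular vectors (orthonormal)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(Vt @ Vt.T, np.eye(2)))   # True: directions are orthogonal

# Projecting onto them decorrelates the features
Z = X @ Vt.T
print(np.round(np.cov(Z.T), 3))            # off-diagonal entries ~ 0
```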
Orthogonal Regularization
Adding a loss term such as ‖W^T W - I‖² (the squared Frobenius norm of the deviation from the identity) encourages the weights to remain orthogonal during training, improving stability in GANs.
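A hedged sketch of one common form of this penalty, written in plain NumPy to stay framework-agnostic (exact formulations vary by paper):

```python
import numpy as np

def orthogonality_penalty(W):
    """Penalty ||W^T W - I||_F^2 (one common form; variants exist)."""
    gram = W.T @ W
    return np.sum((gram - np.eye(W.shape[1])) ** 2)

rng = np.random.default_rng(4)
W_random = rng.standard_normal((8, 8))
W_ortho, _ = np.linalg.qr(rng.standard_normal((8, 8)))
print(orthogonality_penalty(W_random))   # large: far from orthogonal
print(orthogonality_penalty(W_ortho))    # ~0: already orthogonal
```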