ML Interview Q Series: How can one derive the Singular Value Decomposition for a matrix M?
Comprehensive Explanation
Singular Value Decomposition (SVD) of a matrix is a fundamental factorization in linear algebra that decomposes a matrix M into three components. If M is of dimension m x n, we seek to represent it as

M = U Σ V^T

where U supplies an orthonormal basis aligned with the column space of M, V supplies an orthonormal basis aligned with the row space of M, and Σ is a rectangular diagonal matrix containing the singular values.
Here:
M is the original matrix, of size m x n.
U is an m x m orthonormal matrix. Each column of U is a left singular vector of M.
Σ is an m x n diagonal matrix (with non-negative real numbers on the diagonal). These diagonal entries are the singular values of M, arranged in decreasing order.
V is an n x n orthonormal matrix. Each column of V is a right singular vector of M, and V^T is its transpose.
To calculate SVD in a theoretical sense:
You compute the eigen-decomposition of M^T M (an n x n matrix) or M M^T (an m x m matrix). Both are symmetric and positive semi-definite, so their eigenvalues are real and non-negative, and the non-zero eigenvalues of M^T M (or M M^T) are exactly the squares of the singular values in Σ. The eigenvectors of M^T M give you the columns of V, and the eigenvectors of M M^T give you the columns of U. In practice one typically computes V first and then obtains each column of U consistently as u_i = M v_i / σ_i (for σ_i > 0); computing U and V from two independent eigen-decompositions can leave their signs mismatched, so the factorization would not reproduce M.
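For a concrete illustration, here is a minimal NumPy sketch of this derivation on a small example matrix (the 1e-12 threshold for treating a singular value as non-zero is an arbitrary choice):

import numpy as np

M = np.array([[1., 2., 3.],
              [4., 5., 6.]])

# Eigen-decomposition of M^T M (symmetric, so eigh is appropriate)
eigvals, V = np.linalg.eigh(M.T @ M)

# eigh returns eigenvalues in ascending order; re-sort them descending
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Singular values are the square roots of the (non-negative) eigenvalues;
# clip tiny negative values caused by floating-point error
S = np.sqrt(np.clip(eigvals, 0.0, None))

# Derive U consistently from V: u_i = M v_i / sigma_i for sigma_i > 0
r = int(np.sum(S > 1e-12))  # numerical rank
U = (M @ V[:, :r]) / S[:r]

# Verify the factorization on the non-zero part
print(np.allclose(M, U @ np.diag(S[:r]) @ V[:, :r].T))  # True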
In practice, most numerical libraries do not perform the eigen-decomposition of M^T M directly, since explicitly forming M^T M squares the condition number of the problem. Instead, they use more stable iterative algorithms (like the Golub–Reinsch algorithm) that operate on M itself and are efficient and numerically robust.
Practical Implementation
A common way to compute the SVD programmatically is to rely on built-in libraries that implement stable numerical routines.
import numpy as np

# Example matrix
M = np.array([
    [1, 2, 3],
    [4, 5, 6]
], dtype=float)

# Compute SVD using NumPy
U, S, Vt = np.linalg.svd(M, full_matrices=True)

# U is m x m, S is a 1D array of singular values, Vt is n x n
# We often construct Sigma explicitly when the matrix form is needed
Sigma = np.zeros((M.shape[0], M.shape[1]))
np.fill_diagonal(Sigma, S)

print("U:", U)
print("Sigma:", Sigma)
print("V^T:", Vt)
In this code snippet, np.linalg.svd returns:
U: an m x m orthonormal matrix.
S: a 1D array of the singular values (length is min(m, n)).
Vt: the transpose of the n x n orthonormal matrix V.
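Continuing the snippet above, the factorization can be verified numerically; passing full_matrices=False instead returns the reduced ("thin") factors, which skip the parts of U and Vt that are only ever multiplied by zeros:

# Sanity check: the factors reproduce M
print(np.allclose(M, U @ Sigma @ Vt))  # True

# Reduced ("thin") SVD: here U_r is 2 x 2 and Vt_r is 2 x 3
U_r, S_r, Vt_r = np.linalg.svd(M, full_matrices=False)
print(np.allclose(M, (U_r * S_r) @ Vt_r))  # True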
Follow-up question on interpreting U, Σ, and V
U’s columns are left singular vectors. They are orthonormal eigenvectors of M M^T. V’s columns are right singular vectors. They are orthonormal eigenvectors of M^T M. The matrix Σ (often constructed from the 1D array S) places the singular values on the diagonal.
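These relationships are easy to check numerically; a small self-contained sketch:

import numpy as np

M = np.array([[1., 2., 3.],
              [4., 5., 6.]])
U, S, Vt = np.linalg.svd(M)
V = Vt.T  # columns of V are the right singular vectors

# Each right singular vector satisfies (M^T M) v_i = sigma_i^2 v_i
for i, sigma in enumerate(S):
    print(np.allclose(M.T @ M @ V[:, i], sigma**2 * V[:, i]))  # True

# The columns of U and V are orthonormal
print(np.allclose(U.T @ U, np.eye(U.shape[0])))  # True
print(np.allclose(V.T @ V, np.eye(V.shape[1])))  # True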
Follow-up question on numerical stability and real-world applications
SVD is numerically stable and widely used for dimensionality reduction, data compression, noise reduction, and principal component analysis (PCA). For large matrices, specialized methods such as randomized SVD can be used to reduce computation time and memory usage, particularly when only the top singular values and singular vectors are needed.
One pitfall is that for extremely large matrices, computing the full SVD can be very expensive (in both time and memory). Truncated SVD or incremental SVD techniques are often more efficient in practical applications where only a few singular values and vectors are necessary.
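As a sketch of the truncated approach, scikit-learn ships a randomized SVD routine (this assumes scikit-learn is installed; the matrix below is a synthetic example):

import numpy as np
from sklearn.utils.extmath import randomized_svd  # requires scikit-learn

# A large-ish low-rank matrix
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 50)) @ rng.standard_normal((50, 1000))

# Only the top 10 singular triplets are computed
U_k, S_k, Vt_k = randomized_svd(X, n_components=10, random_state=0)
print(U_k.shape, S_k.shape, Vt_k.shape)  # (2000, 10) (10,) (10, 1000)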
Follow-up question on rank approximations
A significant advantage of SVD is that it provides a best low-rank approximation under the Frobenius norm. If one retains only the top k singular values (with their corresponding singular vectors) and zeros out the rest, the resulting reconstructed matrix is an approximation of rank k. This is the basis of tasks like matrix completion and latent semantic analysis in natural language processing.
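A minimal NumPy sketch of rank-k truncation on a synthetic matrix, illustrating the Eckart–Young property that the Frobenius error of the best rank-k approximation equals the root-sum-of-squares of the discarded singular values:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 80))

U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 10
A_k = (U[:, :k] * S[:k]) @ Vt[:k, :]  # best rank-k approximation

# Eckart-Young: the error is the norm of the discarded tail
err = np.linalg.norm(A - A_k, 'fro')
print(np.isclose(err, np.sqrt(np.sum(S[k:]**2))))  # True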
Follow-up question on the difference between SVD and eigen-decomposition
An m x n matrix M does not necessarily have eigenvalues or eigenvectors in the sense of square matrices. But one can form M^T M (n x n) or M M^T (m x m), both of which are square and symmetric. The SVD is a more general concept that applies to any rectangular real (or complex) matrix. Eigen-decomposition typically deals with diagonalizing a square matrix. SVD effectively uses the eigen-decomposition of M^T M or M M^T as a bridge to factorize M itself.
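One concrete point of contact between the two: for a symmetric positive semi-definite matrix, the factorizations coincide and the eigenvalues equal the singular values, as the following small sketch checks:

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T  # symmetric positive semi-definite

eigvals = np.linalg.eigvalsh(A)[::-1]    # eigenvalues, descending
S = np.linalg.svd(A, compute_uv=False)   # singular values, descending

print(np.allclose(eigvals, S))  # True: for symmetric PSD, they coincide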
Follow-up question on the physical meaning of singular values
Each singular value gauges the amount of variation or “energy” captured along a particular singular vector direction. The largest singular value corresponds to the direction of greatest variance in the row space/column space, while smaller singular values represent directions of decreasing importance or variance.
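This can be made precise: the squared singular values sum to the squared Frobenius norm of the matrix, so their cumulative ratio measures the fraction of total “energy” captured by the leading directions. A small sketch on a synthetic matrix:

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30)) @ rng.standard_normal((30, 100))

S = np.linalg.svd(X, compute_uv=False)

# Fraction of total "energy" (squared Frobenius norm) captured
# by the top-k singular directions
energy = np.cumsum(S**2) / np.sum(S**2)
print(energy[:5])  # how quickly the leading directions dominate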
Follow-up question on handling matrices that are not full rank
Even when the matrix M is rank-deficient, the SVD can still be computed. Some singular values will be zero. Those zero singular values imply that the corresponding rows or columns can be expressed as linear combinations of others. This property is extremely useful for tasks like diagnosing multicollinearity in statistical problems or compressing data by removing redundant directions.
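A small sketch of this behavior, using a 3 x 3 matrix whose third row is the sum of the first two (so its rank is 2):

import numpy as np

M = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [5., 7., 9.]])  # row 3 = row 1 + row 2 -> rank 2

S = np.linalg.svd(M, compute_uv=False)
print(S)                          # the last singular value is ~0
print(np.sum(S > 1e-10 * S[0]))   # numerical rank: 2
print(np.linalg.matrix_rank(M))   # 2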
Follow-up question on implementation details in PyTorch
PyTorch provides an SVD routine for GPU-accelerated computations. In practice, you might use this for deep learning tasks that require dimension reduction or for analyzing layer weights in neural networks.
import torch

M_torch = torch.tensor([[1., 2., 3.],
                        [4., 5., 6.]])

# torch.svd returns V (not V^T) and, by default, the reduced factors:
# U is m x k, S has length k, V is n x k, with k = min(m, n)
U_torch, S_torch, V_torch = torch.svd(M_torch)

# Reconstruct: Sigma must be k x k to match the reduced factors
# (a Sigma with the shape of M would not conform here)
Sigma_torch = torch.diag(S_torch)
reconstructed = U_torch.mm(Sigma_torch.mm(V_torch.t()))
This shows how to perform a basic SVD in PyTorch. The reconstruction of M is done by multiplying U, Σ, and V^T. In newer versions of PyTorch, torch.linalg.svd is recommended for its improved numerical properties.
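For completeness, here is the same computation with the newer API; note that torch.linalg.svd returns V^T (often named Vh) rather than V, and full_matrices=False requests the reduced factors:

import torch

M_torch = torch.tensor([[1., 2., 3.],
                        [4., 5., 6.]])

# torch.linalg.svd returns V^T directly, unlike torch.svd
U, S, Vh = torch.linalg.svd(M_torch, full_matrices=False)
reconstructed = U @ torch.diag(S) @ Vh
print(torch.allclose(reconstructed, M_torch, atol=1e-5))  # True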
Follow-up question on edge cases
SVD might fail to converge if:
The matrix elements are extremely large or extremely small, leading to numerical instability.
The matrix is ill-conditioned (though SVD is typically one of the more stable decompositions).
Most modern linear algebra libraries handle these issues using robust algorithms and well-chosen stopping criteria.
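Ill-conditioning can be diagnosed directly from the singular values: the ratio of the largest to the smallest (the 2-norm condition number) measures how close the matrix is to rank deficiency. A small sketch:

import numpy as np

# A nearly rank-deficient (ill-conditioned) matrix
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-12]])

S = np.linalg.svd(A, compute_uv=False)
cond = S[0] / S[-1]       # largest over smallest singular value
print(cond)               # very large -> ill-conditioned
print(np.linalg.cond(A))  # NumPy's built-in, same quantity (2-norm)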