ML Interview Q Series: Can you explain the idea behind singular eigenvalues, along with left singulars and right singulars, in linear algebra?
📚 Browse the full ML Interview series here.
Comprehensive Explanation
Singular values, left singular vectors, and right singular vectors appear when we perform a Singular Value Decomposition (SVD) of a matrix. For a matrix of shape m x n (with m possibly not equal to n), the SVD provides a factorization that reveals useful structural properties of the matrix, especially related to its rank, range, and null space.
Singular Value Decomposition
The central formula for SVD is shown below.

M = U Sigma V^T

Where:
M is an m x n matrix.
U is an m x m orthogonal (or unitary) matrix whose columns are called left singular vectors.
Sigma (written as the capital Greek letter Sigma) is an m x n rectangular diagonal matrix that holds the singular values along its main diagonal (all other entries are zero).
V is an n x n orthogonal (or unitary) matrix whose columns are called right singular vectors.
V^T is the transpose (or conjugate transpose in the complex case) of V.
In many references, the singular values are denoted sigma_1 >= sigma_2 >= ... >= sigma_r > 0, where r is the rank of M; the remaining min(m, n) - r singular values are zero.
Singular Eigenvalues vs. Eigenvalues
In some texts, singular values are called “singular eigenvalues” because each singular value is actually the square root of an eigenvalue of M^T M (which is an n x n matrix) or M M^T (which is an m x m matrix). Specifically, if you take M^T M and find its eigenvalues, those nonnegative eigenvalues, once you take their square roots, yield the singular values of M.
An important distinction is that the eigenvalues of a general matrix M (when we talk about eigenvalues in the usual sense) might be complex or negative, whereas singular values are always nonnegative real numbers. Singular values capture the magnitude of how M stretches a vector in various orthogonal directions.
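As a quick numerical illustration (a small sketch using NumPy and an arbitrary 2 x 3 example matrix), the square roots of the eigenvalues of M^T M reproduce the singular values returned by an SVD routine:

import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=float)

# Singular values directly from the SVD
singular_values = np.linalg.svd(M, compute_uv=False)

# Eigenvalues of M^T M, sorted in decreasing order
# (one of them is ~0 here because rank(M) = 2)
eigenvalues = np.linalg.eigvalsh(M.T @ M)[::-1]

# Square roots of the nonnegative eigenvalues match the singular values
print(singular_values)
print(np.sqrt(np.clip(eigenvalues, 0, None)))  # first two entries agree, third is ~0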
Left Singular Vectors
The columns of U (the m x m matrix) are called left singular vectors. They are eigenvectors of M M^T, which is an m x m matrix: computing the eigenvectors of M M^T yields exactly the columns of U in the SVD. In the real case, they form an orthonormal basis for R^m.
Right Singular Vectors
Similarly, the columns of V (the n x n matrix) are called right singular vectors. These vectors are eigenvectors of M^T M, which is n x n. Like U, V is an orthogonal matrix in the real case, and its columns form an orthonormal basis for R^n.
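Both facts are easy to check numerically (a small sketch, using the same kind of example matrix): each column u_i of U is an eigenvector of M M^T and each column v_i of V is an eigenvector of M^T M, with eigenvalue sigma_i^2.

import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=float)
U, S, Vt = np.linalg.svd(M, full_matrices=True)
V = Vt.T

for i, sigma in enumerate(S):
    # Column u_i of U satisfies (M M^T) u_i = sigma_i^2 u_i
    assert np.allclose(M @ M.T @ U[:, i], sigma**2 * U[:, i])
    # Column v_i of V satisfies (M^T M) v_i = sigma_i^2 v_i
    assert np.allclose(M.T @ M @ V[:, i], sigma**2 * V[:, i])
print("Eigenvector relations hold for every singular value.")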
Interpretation and Geometric Perspective
When M acts on a vector in R^n, the most stretched direction corresponds to the largest singular value sigma_1: the direction in the domain is the right singular vector v_1, and its image in the codomain R^m points along the left singular vector u_1. The ratio of output length to input length along that direction is exactly sigma_1. Subsequent singular vectors define further, mutually orthogonal directions of maximal stretching.
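This stretching interpretation can also be verified directly (a sketch with the same example matrix): applying M to the top right singular vector v_1 yields sigma_1 times the top left singular vector u_1.

import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=float)
U, S, Vt = np.linalg.svd(M)

v1 = Vt[0]       # top right singular vector, a direction in the domain R^3
u1 = U[:, 0]     # top left singular vector, a direction in the codomain R^2

# M maps v1 to sigma_1 * u1; the stretch factor along this direction is sigma_1
print(np.allclose(M @ v1, S[0] * u1))             # True
print(np.isclose(np.linalg.norm(M @ v1), S[0]))   # True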
Example Code Snippet in Python
Below is a minimal example using NumPy to perform SVD and reveal the singular values, left singular vectors, and right singular vectors.
import numpy as np
# Example matrix M
M = np.array([
[1, 2, 3],
[4, 5, 6]
], dtype=float)
# Perform SVD
U, S, Vt = np.linalg.svd(M, full_matrices=True)
print("Left singular vectors (U):")
print(U)
print("\nSingular values:")
print(S)
print("\nRight singular vectors (V^T):")
print(Vt)
In the above code:
U is the array whose columns are the left singular vectors.
S is the array of singular values (in decreasing order).
Vt is V transposed, meaning each row of Vt is a right singular vector of the original matrix M.
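As a sanity check (a small sketch reusing the same example matrix), the three returned arrays can be recombined to recover M. Note that S is returned as a 1-D array, so the m x n matrix Sigma has to be rebuilt explicitly:

import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=float)
U, S, Vt = np.linalg.svd(M, full_matrices=True)

# S is 1-D, so rebuild the full m x n Sigma before recombining
Sigma = np.zeros(M.shape)
Sigma[:len(S), :len(S)] = np.diag(S)

print(np.allclose(M, U @ Sigma @ Vt))  # True: M = U Sigma V^T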
Potential Follow-Up Questions
Could you explain why the left singular vectors are eigenvectors of M M^T while the right singular vectors are eigenvectors of M^T M?
When we write M = U Sigma V^T, then M M^T = (U Sigma V^T)(U Sigma V^T)^T = U Sigma V^T V Sigma^T U^T. Because V^T V = I (V is orthogonal) and Sigma Sigma^T is an m x m diagonal matrix whose entries are the squared singular values, this reduces to U (Sigma Sigma^T) U^T. Hence the columns of U are eigenvectors of M M^T, with eigenvalues sigma_i^2. Likewise, M^T M reduces to V (Sigma^T Sigma) V^T, showing that the columns of V are its eigenvectors.
How do numerical issues manifest when computing SVD in practice?
In practical numerical linear algebra libraries, iterative algorithms (e.g., Jacobi SVD, QR-based SVD) are used. These can face numerical instability for matrices that are ill-conditioned (meaning the ratio sigma_max / sigma_min is very large). Floating-point errors may cause slight deviations from perfect orthogonality in U and V, and extremely small singular values can be computed as zero if they’re below machine precision.
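To illustrate (a sketch using a Hilbert matrix, a classic ill-conditioned example), the computed factors are orthogonal only up to floating-point error, and the smallest singular values sit near machine precision:

import numpy as np

# Hilbert matrix: entries 1 / (i + j + 1), notoriously ill-conditioned
n = 12
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

U, S, Vt = np.linalg.svd(H)
print("condition number:", S[0] / S[-1])        # extremely large, around 1e16
print("max deviation of U from orthogonality:",
      np.max(np.abs(U.T @ U - np.eye(n))))      # small but nonzero
print("smallest singular value:", S[-1])        # close to machine epsilon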
How can we interpret small versus large singular values?
A large singular value indicates a direction in which the matrix M significantly stretches the input vector. A small (or zero) singular value indicates a direction that is barely stretched (or completely collapsed). In data analysis, discarding small singular values (and corresponding singular vectors) is central to principal component analysis (PCA)-like methods or low-rank approximations, because those directions convey relatively little variance or "energy" in the data.
What happens to the SVD if the matrix is not square or if it is rank-deficient?
The beauty of SVD is that it applies to any rectangular matrix. For instance, if M is m x n with m > n, you can still factor M as U Sigma V^T, but Sigma becomes an m x n matrix with only min(m, n) singular values along the diagonal. If the matrix is rank-deficient, some of those diagonal singular values will be zero, indicating that some directions do not contribute to the transformation.
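A small sketch of the rank-deficient case: in the 4 x 3 matrix below the third column equals the sum of the first two, so the rank is 2 and one of the min(m, n) = 3 singular values comes out (numerically) zero.

import numpy as np

# 4 x 3 matrix whose third column is the sum of the first two (rank 2)
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])

U, S, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, S.shape, Vt.shape)   # (4, 4) (3,) (3, 3)
print(S)                            # two nonzero values, third is ~0
print(np.linalg.matrix_rank(A))     # 2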
Why might SVD be preferred over eigen-decomposition for certain tasks?
Eigen-decomposition is typically defined for square matrices that are diagonalizable. In addition, those matrices need to have a complete set of eigenvectors. Meanwhile, SVD always exists for any m x n matrix, square or rectangular, and handles rank-deficient cases gracefully. It also reveals the natural geometric structure (stretching factors) of the matrix even when the matrix is not symmetric.
Are singular values relevant in machine learning?
Yes. Singular values help to diagnose issues like collinearity in data, which leads to inflated variances in parameter estimates. They also underlie the concept of low-rank matrix approximations, used extensively in recommender systems, matrix factorization methods, and dimensionality reduction techniques such as latent semantic analysis (LSA) or PCA. SVD-based approaches provide insights into the data’s strongest directions of variance.
How does SVD compare to PCA?
Principal Component Analysis is essentially an eigen-decomposition of the data's covariance matrix, which (after mean-centering the data matrix X) is proportional to X^T X. In practice, PCA is usually computed via an SVD of the mean-centered data matrix: the principal components are the right singular vectors, and the squared singular values, divided by n - 1, give the variance explained by each component. SVD and PCA are thus tightly linked, with PCA focusing on the covariance structure of the data, while SVD is a more general factorization of any matrix.
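A small sketch of this equivalence on synthetic data (NumPy only): the squared singular values of the mean-centered data, divided by n - 1, equal the eigenvalues of the sample covariance matrix, and the rows of V^T are the principal directions.

import numpy as np

# Synthetic data with a different spread along each feature
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.2])

# PCA via SVD of the mean-centered data matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = S**2 / (len(X) - 1)   # variance along each principal direction
# rows of Vt are the principal directions (up to sign)

# Same variances from the eigen-decomposition of the sample covariance matrix
cov = Xc.T @ Xc / (len(X) - 1)
print(np.allclose(explained_variance, np.linalg.eigvalsh(cov)[::-1]))  # True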
How do you handle very large matrices when computing the SVD?
For large-scale problems, one might use randomized or approximate SVD algorithms. These methods sample the matrix or employ iterative subspace estimations to approximate the top k singular vectors and singular values without needing a full SVD of the entire matrix. Libraries like scikit-learn offer randomized SVD functionalities that are efficient when k is small compared to n.
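For instance, scikit-learn exposes a randomized SVD routine (sklearn.utils.extmath.randomized_svd; the snippet below is a sketch and assumes scikit-learn is installed) that approximates only the leading k singular triplets:

import numpy as np
from sklearn.utils.extmath import randomized_svd

# Synthetic matrix of rank ~20 plus a little noise
rng = np.random.default_rng(0)
A = rng.normal(size=(2000, 20)) @ rng.normal(size=(20, 500))
A += 0.01 * rng.normal(size=A.shape)

# Approximate only the top k = 10 singular triplets instead of a full SVD
k = 10
U_k, S_k, Vt_k = randomized_svd(A, n_components=k, random_state=0)
print(U_k.shape, S_k.shape, Vt_k.shape)   # (2000, 10) (10,) (10, 500)

# Compare against the exact leading singular values (feasible at this size)
S_exact = np.linalg.svd(A, compute_uv=False)[:k]
print(np.max(np.abs(S_k - S_exact) / S_exact))   # small relative error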
How could rank truncation be useful with SVD?
Truncating the decomposition by keeping only the top k singular values and their corresponding singular vectors gives a low-rank approximation M_k. This can be used in applications like data compression, noise reduction, or latent semantic indexing in natural language processing. By ignoring small singular values, one discards directions that have minimal contribution to the matrix’s action, thus simplifying the model.
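A minimal sketch of rank-k truncation: keeping the top k singular triplets gives the best rank-k approximation in the Frobenius sense (Eckart-Young), and the approximation error equals the root-sum-square of the discarded singular values.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 40))

U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top k singular triplets
k = 10
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

print(np.linalg.matrix_rank(A_k))   # k
print(np.linalg.norm(A - A_k))      # Frobenius error of the approximation
print(np.sqrt(np.sum(S[k:]**2)))    # same value: root-sum-square of discarded sigmas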
How does the choice of floating-point precision impact SVD computations?
If you perform SVD in single precision (float32) vs. double precision (float64), you might observe different stability behaviors and small singular values might be truncated prematurely. The condition number (the ratio of largest to smallest nonzero singular value) determines how sensitive the matrix inversion or solutions to linear systems can be. Double precision typically provides more accuracy but at a cost of higher computational resources.
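One way to see this (a sketch with a synthetic matrix whose singular values are known and span many orders of magnitude): the float64 computation recovers the small singular values, while float32 loses those below roughly 1e-7 relative to the largest one.

import numpy as np

# Build a matrix with known singular values from 1 down to 1e-12
rng = np.random.default_rng(0)
n = 8
true_s = np.logspace(0, -12, n)
Q1, _ = np.linalg.qr(rng.normal(size=(n, n)))
Q2, _ = np.linalg.qr(rng.normal(size=(n, n)))
A = Q1 @ np.diag(true_s) @ Q2.T

S64 = np.linalg.svd(A.astype(np.float64), compute_uv=False)
S32 = np.linalg.svd(A.astype(np.float32), compute_uv=False)
print(S64)   # close to true_s all the way down
print(S32)   # values below ~1e-7 are swamped by float32 rounding error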
Why is SVD considered a robust technique for numerical applications?
SVD provides a stable way to analyze matrix rank, range, null space, and near-linear dependencies among columns. Unlike eigen-decomposition for nonsymmetric matrices, SVD does not rely on a matrix being symmetric or diagonalizable in the usual sense. It always exists and gracefully handles cases where the matrix has low rank or exhibits near-degenerate directions. Furthermore, modern numeric libraries implement carefully designed algorithms that ensure good stability and convergence properties for SVD.