ML Interview Q Series: When would you apply the Moore-Penrose pseudoinverse, and in what way can it be computed?
Comprehensive Explanation
The Moore-Penrose pseudoinverse is a generalization of the matrix inverse designed for handling situations where a matrix might not be square or might be singular. It is especially helpful in providing a least squares solution to a system of linear equations, or in dealing with ill-conditioned systems where the usual inverse does not exist or is numerically unstable.
It can be computed through the Singular Value Decomposition (SVD). Suppose a matrix M has dimensions m x n. We factorize M as M = U Sigma V^T, where:
U is an m x m orthogonal matrix.
Sigma is an m x n diagonal-like matrix of singular values (with nonnegative real numbers along the main diagonal and zeros elsewhere).
V^T is an n x n orthogonal matrix (the transpose of V).
After obtaining this decomposition, the Moore-Penrose pseudoinverse M^+ is formed by taking the reciprocal of each non-zero singular value and reversing the order of the factors:
M^+ = V Sigma^+ U^T
Within this formula:
V is the orthogonal matrix whose columns are the right singular vectors of M.
Sigma^+ is created by taking the transpose of Sigma, then inverting all non-zero entries (singular values) on the main diagonal. If Sigma is m x n, Sigma^+ becomes n x m.
U^T is the transpose of U, where U’s columns are the left singular vectors of M.
The essential property is that Sigma^+ replaces each non-zero singular value sigma_i with its reciprocal 1/sigma_i, while zeros remain zeros. This technique is robust in handling rank-deficient or non-square matrices because it gracefully addresses small or zero singular values by inverting only the non-zero ones.
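To make this recipe concrete, here is a minimal sketch (using NumPy; the tolerance rule below mirrors common practice and is an illustrative choice, not the only one) that assembles M^+ from the factors above and checks it against NumPy's built-in routine:
import numpy as np
# Random 5 x 3 matrix to invert
M = np.random.randn(5, 3)
# Compact SVD: U is 5 x 3, s holds the singular values, Vt is 3 x 3
U, s, Vt = np.linalg.svd(M, full_matrices=False)
# Invert only singular values above a small tolerance; treat the rest as zero
tol = max(M.shape) * np.finfo(s.dtype).eps * s.max()
s_inv = np.where(s > tol, 1.0 / s, 0.0)
# M^+ = V Sigma^+ U^T
M_pinv_manual = Vt.T @ np.diag(s_inv) @ U.T
print(np.allclose(M_pinv_manual, np.linalg.pinv(M)))  # True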
Example of Calculation in Python
import numpy as np
# Create a random m x n matrix
M = np.random.randn(5, 3)
# Compute the Moore-Penrose pseudoinverse
M_pinv = np.linalg.pinv(M)
# Verify shapes
print("Original shape:", M.shape)
print("Pseudoinverse shape:", M_pinv.shape)
# Example usage: solving a system Mx = b in least-squares sense
b = np.random.randn(5)
x_least_squares = M_pinv @ b
This code snippet shows how to compute the pseudoinverse using NumPy’s built-in function pinv, which internally relies on SVD. The result is a matrix M_pinv with shape 3 x 5 for the original 5 x 3 matrix M.
Why It Is Useful
The pseudoinverse is useful for solving linear systems that are not necessarily full-rank, or that are over- or under-determined. For instance, in linear regression or in dimensionality reduction problems, the pseudoinverse finds solutions that minimize the squared error. Additionally, when you need a stable pseudo-solution in the presence of numerical issues, the Moore-Penrose pseudoinverse helps because it effectively discards directions corresponding to very small singular values.
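As a small illustration, the sketch below (with synthetic data and made-up true weights w_true) fits an ordinary least-squares regression with the pseudoinverse and cross-checks it against NumPy's dedicated solver:
import numpy as np
rng = np.random.default_rng(0)
# Synthetic regression problem: 100 samples, 3 features
X = rng.standard_normal((100, 3))
w_true = np.array([1.5, -2.0, 0.5])              # hypothetical ground truth
y = X @ w_true + 0.1 * rng.standard_normal(100)
# Least-squares fit via the pseudoinverse
w_pinv = np.linalg.pinv(X) @ y
# Reference solution from NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_pinv, w_lstsq))              # True: same minimizer of ||Xw - y||^2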
How It Differs from a Standard Inverse
Unlike a standard inverse (which only applies to square and non-singular matrices), the Moore-Penrose pseudoinverse can be used even if the matrix is not square or if it is singular. Also, when multiple solutions to Mx = b exist (because M is not full rank), the Moore-Penrose pseudoinverse yields the solution x with the smallest Euclidean norm, an important property in many optimization problems.
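The minimum-norm property can be verified numerically. In the sketch below (with an arbitrary random underdetermined system), adding a null-space direction to the pseudoinverse solution yields another exact solution with a strictly larger norm:
import numpy as np
rng = np.random.default_rng(1)
# Underdetermined system: 3 equations, 5 unknowns (infinitely many solutions)
M = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
# Minimum-norm solution from the pseudoinverse
x_min = np.linalg.pinv(M) @ b
# The last rows of Vt from the full SVD span the null space of M
_, s, Vt = np.linalg.svd(M)          # Vt is 5 x 5, rank(M) is 3
x_other = x_min + 2.0 * Vt[-1]       # another exact solution of Mx = b
print(np.allclose(M @ x_min, b), np.allclose(M @ x_other, b))   # True True
print(np.linalg.norm(x_min) < np.linalg.norm(x_other))          # True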
Potential Follow-up Questions
How does the pseudoinverse help in solving least squares problems?
It provides a direct way to obtain the parameter vector x that minimizes the squared error ||Mx - b||^2: this x is given by x = M^+ b. Since M^+ is derived from the SVD of M, it is numerically stable and effectively manages directions in which M has zero or negligible singular values. This approach is central in linear regression; when M has full column rank, M^+ b coincides with the normal-equation solution (M^T M)^{-1} M^T b.
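For intuition, here is a quick sketch (assuming M has full column rank, so that M^T M is invertible) showing that the pseudoinverse solution agrees with the normal-equation solution:
import numpy as np
rng = np.random.default_rng(2)
M = rng.standard_normal((50, 4))     # tall matrix with full column rank
b = rng.standard_normal(50)
# Least-squares solution via the pseudoinverse
x_pinv = np.linalg.pinv(M) @ b
# Normal equations: x = (M^T M)^{-1} M^T b (valid only for full column rank)
x_normal = np.linalg.solve(M.T @ M, M.T @ b)
print(np.allclose(x_pinv, x_normal))  # True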
Are there numerical stability or performance concerns when computing the pseudoinverse?
Computing the SVD is typically more computationally expensive than approaches such as LU or QR factorization, especially for very large matrices. For extremely big matrices, iterative solvers or approximate methods might be more practical. Additionally, if M has extremely small singular values, direct inversion might magnify numerical errors. Regularization methods, such as Tikhonov regularization, can mitigate this by slightly modifying the singular values before inversion.
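As a sketch of the regularized variant: Tikhonov (ridge) regularization replaces each reciprocal 1/sigma_i with sigma_i / (sigma_i^2 + lam), damping small singular values instead of inverting them outright. The function name tikhonov_pinv and the value of lam below are illustrative choices:
import numpy as np
def tikhonov_pinv(M, lam):
    # Regularized pseudoinverse: 1/sigma_i becomes sigma_i / (sigma_i^2 + lam)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_reg = s / (s**2 + lam)
    return Vt.T @ np.diag(s_reg) @ U.T
M = np.random.randn(5, 3)
b = np.random.randn(5)
# As lam -> 0 this approaches pinv(M); larger lam shrinks the solution
x_ridge = tikhonov_pinv(M, lam=1e-2) @ b
x_plain = np.linalg.pinv(M) @ b
print(np.linalg.norm(x_ridge), np.linalg.norm(x_plain))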
What happens if there are singular values close to zero?
When singular values are very small, the reciprocal becomes very large, which can lead to unstable solutions. Numerical implementations, such as numpy.linalg.pinv, handle this by applying a cutoff threshold below which singular values are treated as zero. This process discards components that would otherwise blow up the solution, leading to a more robust (but approximate) pseudoinverse.
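In NumPy this cutoff is exposed as the rcond argument of numpy.linalg.pinv: singular values below rcond times the largest singular value are treated as zero. A short sketch on a deliberately near-rank-deficient matrix:
import numpy as np
# Nearly rank-deficient: the third column almost duplicates the first
M = np.random.randn(6, 3)
M[:, 2] = M[:, 0] + 1e-12 * np.random.randn(6)
b = np.random.randn(6)
# A very tight cutoff keeps the tiny singular value -> huge, unstable entries
x_tight = np.linalg.pinv(M, rcond=1e-15) @ b
# A looser cutoff discards the near-zero singular value -> stable solution
x_stable = np.linalg.pinv(M, rcond=1e-8) @ b
print(np.linalg.norm(x_tight), np.linalg.norm(x_stable))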