ML Interview Q Series: How can we interpret the linear system Ax = b, and under what circumstances does it possess a single unique solution?
📚 Browse the full ML Interview series here.
Comprehensive Explanation
Understanding Ax = b
In this system, A denotes a matrix (usually of dimension m x n), x is a column vector of size n, and b is a column vector of size m. The equation represents m linear equations in n unknowns. Each row of A corresponds to the coefficients of one linear equation, and each component of b corresponds to the right-hand side constant of that equation.
The goal is to find a vector x that satisfies all these linear equations simultaneously. If we interpret A as a transformation that maps the vector space of possible x vectors to some output space, then Ax = b means we want to find an x whose image under the transformation A is exactly b.
Condition for a Unique Solution
When A is a square n x n matrix and has full rank (rank(A) = n), it is invertible. In that case, the system has exactly one solution, given by x = A^{-1} b.
Here, A^{-1} is the inverse of A. The determinant of A in this scenario is non-zero, which equivalently means that the rows of A (or columns of A) are linearly independent.
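For the square, full-rank case, here is a minimal NumPy sketch (the 3x3 matrix is made up for illustration); in practice numpy.linalg.solve is preferred over forming A^{-1} explicitly:

```python
import numpy as np

# Hypothetical 3x3 example: the columns are linearly independent,
# so rank(A) = 3 and a unique solution exists.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([3.0, 5.0, 6.0])

# solve() uses an LU factorization; it is faster and more stable
# than computing np.linalg.inv(A) and multiplying.
x = np.linalg.solve(A, b)

print("det(A) =", np.linalg.det(A))        # non-zero => invertible
print("x =", x)
print("A @ x == b ?", np.allclose(A @ x, b))  # True
```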
If A is not square (for example, m > n or m < n), uniqueness depends on whether rank(A) equals the number of columns of A (that is, n). Specifically:
If rank(A) = n and m >= n, the system has at most one solution: it is unique when b lies in the column space of A, and there is no solution otherwise.
If rank(A) < n, the system either has infinitely many solutions (if rank(A) = rank(A|b)) or no solution (if rank(A) < rank(A|b)).
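A short sketch, using numpy.linalg.matrix_rank on made-up matrices, of how the column-rank condition above can be checked numerically:

```python
import numpy as np

# Two illustrative 3x2 matrices: one with full column rank, one rank deficient.
A_full = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])       # rank 2 = number of columns
A_deficient = np.array([[1.0, 2.0],
                        [2.0, 4.0],
                        [3.0, 6.0]])  # rank 1 < number of columns

for name, A in [("full column rank", A_full), ("rank deficient", A_deficient)]:
    rank = np.linalg.matrix_rank(A)
    n_cols = A.shape[1]
    print(f"{name}: rank={rank}, columns={n_cols}, "
          f"at most one solution: {rank == n_cols}")
```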
Geometric Intuition
The rows (or columns) of A span a certain subspace of the vector space. For Ax = b to have a unique solution:
The rank of A must equal n, the dimension of x (i.e., A must have full column rank).
b must lie in the column space of A.
There must be no redundancy among the columns of A, ensuring that no two distinct x vectors produce the same b.
When these conditions are satisfied, the linear mapping from x to b is one-to-one and onto the relevant subspace, giving us exactly one solution.
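As a small numerical illustration of the column-space condition (toy matrices chosen for this example), one way to test whether b lies in the column space of A is to inspect the least-squares residual:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])              # columns span a 2D plane in R^3

b_in_plane  = np.array([1.0, 2.0, 3.0]) # equals 1*col1 + 2*col2
b_off_plane = np.array([1.0, 2.0, 4.0]) # not a combination of the columns

for name, b in [("b in column space", b_in_plane), ("b outside", b_off_plane)]:
    # A (near-)zero residual means b is (numerically) in the column space of A.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(f"{name}: residual norm = {np.linalg.norm(A @ x - b):.3e}")
```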
Common Pitfalls and Edge Cases
Degenerate Matrices
If A is singular (for square A, det(A) = 0) or, more generally, rank(A) < n, then the system has either no solution or infinitely many solutions. One must check consistency (i.e., whether b lies in the column space of A).
Overdetermined and Underdetermined Systems
Overdetermined (m > n): If b does not lie in the column space of A, there is no exact solution. If it does and A has full column rank, the solution is unique; in practice, a least-squares solution is computed when no exact solution exists (see the sketch after this list).
Underdetermined (m < n): If the system is consistent, there are infinitely many solutions unless additional constraints (such as a minimum-norm requirement) are imposed.
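A brief sketch of the overdetermined case on synthetic data, where no exact solution exists and numpy.linalg.lstsq returns the least-squares fit instead:

```python
import numpy as np

# Overdetermined sketch (m > n): 5 noisy equations, 2 unknowns.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))
x_true = np.array([1.5, -2.0])
b = A @ x_true + 0.01 * rng.normal(size=5)   # noise => no exact solution

# lstsq returns the x minimizing ||Ax - b||_2; with full column rank it is unique.
x_ls, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print("rank(A) =", rank)           # 2 => full column rank
print("least-squares x =", x_ls)   # close to x_true but not exact
```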
Numerical Stability
Even if a matrix is theoretically invertible, floating-point computation can lose accuracy when the matrix is ill-conditioned. For large or ill-conditioned matrices, we often use numerical techniques (such as QR decomposition or the SVD) to solve Ax = b robustly.
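A small demonstration of this, assuming SciPy is available for the classic ill-conditioned Hilbert matrix:

```python
import numpy as np
from scipy.linalg import hilbert   # standard ill-conditioned test matrix

# The 10x10 Hilbert matrix is invertible in exact arithmetic, but its huge
# condition number means a floating-point solve loses many digits.
n = 10
A = hilbert(n)
x_true = np.ones(n)
b = A @ x_true

print("condition number:", np.linalg.cond(A))        # on the order of 1e13
x = np.linalg.solve(A, b)
print("max error in recovered x:", np.max(np.abs(x - x_true)))
# Rule of thumb: with cond(A) ~ 1e13 and ~16 digits of double precision,
# only a few digits of x can be trusted.
```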
Follow-up Questions
Can you discuss the link between matrix rank and the uniqueness of solutions in more depth?
The rank of a matrix A, rank(A), is the dimension of its column space (equivalently, its row space). If rank(A) = n, where n is the number of columns of A, then all columns of A are linearly independent. This implies:
No column is redundant, i.e., no column can be expressed as a linear combination of the others.
For any vector b in the column space, there is precisely one way to combine those columns to obtain b.
When A is square (n x n) and rank(A) = n, we know A is invertible, thus guaranteeing a single unique solution.
In contrast, if rank(A) < n, some columns can be written as linear combinations of others, meaning multiple x vectors can produce the same output (leading to infinitely many solutions) or, in certain cases where b lies outside the column space, no solutions exist.
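A compact sketch of the rank-deficient situation, using a made-up 2x2 matrix whose second column is a multiple of the first:

```python
import numpy as np

# rank(A) = 1 < n = 2 because the second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print("rank(A) =", np.linalg.matrix_rank(A))   # 1

# Two distinct x vectors hitting the same b => solutions cannot be unique.
x1 = np.array([3.0, 0.0])
x2 = np.array([1.0, 1.0])          # differs from x1 by a null-space vector
print(A @ x1, A @ x2)              # both equal [3, 6]

# A b outside the column space (span{[1, 2]}) yields an inconsistent system.
b_bad = np.array([1.0, 0.0])
print("rank([A|b]) =", np.linalg.matrix_rank(np.column_stack([A, b_bad])))  # 2 > rank(A)
```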
If the matrix A is not square, how do we handle Ax = b for a unique solution?
When A is rectangular, there are two common scenarios:
m > n (more equations than unknowns). This is overdetermined. A necessary condition for a unique solution is rank(A) = n. In that situation, A has linearly independent columns, so if b is in the span of those columns, the solution is unique.
m < n (fewer equations than unknowns). This is underdetermined. If rank(A) = m, the system is consistent for every b but has infinitely many solutions, because the null space of A has dimension n - m > 0. A single solution can be singled out by imposing additional constraints (such as minimizing the L2 norm of x), as in the sketch below.
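A minimal sketch of the underdetermined case, using numpy.linalg.pinv to pick out the minimum-L2-norm solution of a made-up single-equation system:

```python
import numpy as np

# One equation, two unknowns: x1 + x2 = 2 has infinitely many solutions.
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

# The Moore-Penrose pseudoinverse selects the minimum-L2-norm solution.
x_min_norm = np.linalg.pinv(A) @ b
print("minimum-norm solution:", x_min_norm)    # [1.0, 1.0]

# Any other solution differs by a null-space vector and has a larger norm.
x_other = np.array([2.0, 0.0])
print(np.linalg.norm(x_min_norm), "<", np.linalg.norm(x_other))
```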
What techniques are commonly used to find the solution in large-scale or non-invertible scenarios?
For large-scale systems or cases where A is non-square:
QR Decomposition: Particularly effective for solving least-squares problems (overdetermined systems).
Singular Value Decomposition (SVD): Provides insight into the rank and can handle ill-conditioned or rank-deficient matrices.
Moore-Penrose Pseudoinverse: Used to find a least-squares solution, often giving a minimum-norm solution when the system is underdetermined.
In Python, libraries like NumPy and SciPy, and frameworks such as PyTorch or TensorFlow, provide built-in methods (like numpy.linalg.lstsq) to find exact or approximate solutions depending on the system type.
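A short sketch, on synthetic data, of how the QR- and SVD/pseudoinverse-based approaches listed above can be used and cross-checked against numpy.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))        # overdetermined, full column rank
b = rng.normal(size=8)

# QR: minimize ||Ax - b|| by solving R x = Q^T b.
Q, R = np.linalg.qr(A)             # reduced QR: Q is 8x3, R is 3x3
x_qr = np.linalg.solve(R, Q.T @ b)

# SVD / pseudoinverse: also handles rank-deficient A gracefully.
x_svd = np.linalg.pinv(A) @ b

# Reference answer from the built-in least-squares routine.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_qr, x_ref), np.allclose(x_svd, x_ref))   # True True
```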
How do we handle the possibility of no solution versus infinitely many solutions?
The Rouché–Capelli theorem tells us that a solution exists if and only if rank(A) = rank(A|b), where (A|b) denotes the augmented matrix formed by appending b as an extra column to A. If rank(A) < rank(A|b), no solution exists (the system is inconsistent). If rank(A) = rank(A|b) but rank(A) < n (the number of columns of A), there are infinitely many solutions. If rank(A) = rank(A|b) = n, there is a unique solution.
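A small helper, sketched here with made-up matrices, that applies the Rouché–Capelli test using numerical ranks:

```python
import numpy as np

def classify_system(A, b):
    """Classify Ax = b via the Rouche-Capelli theorem (numerical ranks)."""
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
    n = A.shape[1]
    if rank_A < rank_Ab:
        return "no solution"
    if rank_A < n:
        return "infinitely many solutions"
    return "unique solution"

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])                                 # rank 1
print(classify_system(A, np.array([1.0, 0.0])))            # no solution
print(classify_system(A, np.array([1.0, 2.0])))            # infinitely many solutions
print(classify_system(np.eye(2), np.array([1.0, 2.0])))    # unique solution
```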
How is this relevant in Machine Learning or Deep Learning?
Many learning algorithms fit parameters by solving systems of linear equations or by using techniques from linear algebra:
Linear Regression: Often framed as solving the normal equations (X^T X)w = X^T y for the weight vector w (see the sketch at the end of this answer).
Optimization Algorithms: Second-order methods such as Newton's method (and quasi-Newton approximations) solve a linear system involving the Hessian at each step, and least-squares subproblems reduce to solving the normal equations; first-order methods like gradient descent avoid forming or inverting these matrices.
Neural Networks: Though typically not about directly solving Ax = b, numerical stability and invertibility concepts are still relevant, for instance, in certain layers or transformations.
These fundamentals underlie how we handle parameter estimation and model training, ensuring we either have a well-defined solution or adopt approximate methods (like regularization) when the system is ill-posed.
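Tying this back to the linear regression bullet above, here is a brief sketch on synthetic data; note that forming X^T X squares the condition number, so library least-squares routines are usually preferable:

```python
import numpy as np

# Synthetic regression data: intercept plus one feature.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
w_true = np.array([0.5, 2.0])
y = X @ w_true + 0.1 * rng.normal(size=50)

# Normal equations (X^T X) w = X^T y: unique when X has full column rank.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically preferable equivalent via least squares.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_normal, w_lstsq)   # both close to w_true
```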