Comprehensive Explanation
Diagonalizing a matrix means finding a change of basis in which the linear transformation represented by the matrix acts as simply as possible on the basis vectors. This simpler form is a diagonal matrix, whose only possibly nonzero entries lie on the main diagonal; every off-diagonal entry is zero. Concretely, the process rests on the concepts of eigenvectors and eigenvalues.
When you have a square matrix M of size n x n, you first identify its eigenvalues (which can be real or complex), and then, for each eigenvalue, you determine the associated eigenvectors. The matrix M is said to be diagonalizable if you can find a full set of n linearly independent eigenvectors. These n eigenvectors form the columns of a matrix P, while the corresponding eigenvalues form the diagonal elements of a diagonal matrix D. The core mathematical relationship is:

M = P D P^-1

where M is the original matrix, P is the matrix whose columns are the n linearly independent eigenvectors of M, and D is the diagonal matrix whose main diagonal entries are the eigenvalues corresponding, in order, to the columns of P.
If such a decomposition can be found, then M is said to be diagonalizable. The process for diagonalizing a matrix can be summarized in the following steps:
First, find the eigenvalues of M by solving the characteristic polynomial det(M - lambda I) = 0; the solutions for lambda are the eigenvalues. Next, determine the eigenvectors for each eigenvalue by solving (M - lambda I)x = 0. Then check the dimension of each eigenspace to confirm that, in total, you can extract n linearly independent eigenvectors. If you can collect exactly n linearly independent eigenvectors, arrange them as the columns of P. The diagonal matrix D is then the matrix whose diagonal entries are the eigenvalues, in an order matching the columns of P.
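The steps above can be sketched in NumPy. This is only a sketch: `diagonalize` and its `tol` argument are hypothetical names (not library API), and it assumes a rank test is an acceptable proxy for checking linear independence of the computed eigenvectors.

```python
import numpy as np

def diagonalize(M, tol=None):
    # Hypothetical helper, not a NumPy function: attempts M = P D P^-1.
    n = M.shape[0]
    vals, vecs = np.linalg.eig(M)             # eigenvalues and eigenvectors
    # Check for n linearly independent eigenvectors via a rank test.
    if np.linalg.matrix_rank(vecs, tol=tol) < n:
        raise ValueError("matrix is defective: cannot diagonalize")
    P = vecs                                   # eigenvectors as columns of P
    D = np.diag(vals)                          # matching eigenvalues on the diagonal
    return P, D

P, D = diagonalize(np.array([[4.0, 1.0],
                             [0.0, 3.0]]))
```

Because the example matrix has two distinct eigenvalues (4 and 3), the rank check passes and P D P^-1 reproduces M up to round-off.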
If you cannot find n linearly independent eigenvectors, you cannot represent the matrix in a purely diagonal form, and instead you might only be able to bring it to a Jordan normal form (which is block-diagonal with Jordan blocks).
It can happen that a matrix has repeated eigenvalues (i.e., some eigenvalue appears more than once as a root of the characteristic polynomial). In such cases, each repeated eigenvalue's geometric multiplicity (the number of linearly independent eigenvectors it contributes) must equal its algebraic multiplicity (the number of times it appears as a root). In other words, you need as many linearly independent eigenvectors as the eigenvalue is repeated. If this condition fails for even one eigenvalue, the matrix cannot be diagonalized.
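A minimal sketch of this failure mode, using an assumed 2x2 example: the eigenvalue 3 has algebraic multiplicity 2, but the eigenspace of (M - 3I) is only one-dimensional.

```python
import numpy as np

# Eigenvalue 3 is repeated: algebraic multiplicity 2.
M = np.array([[3.0, 1.0],
              [0.0, 3.0]])

# Geometric multiplicity = dimension of the null space of (M - 3I).
geometric_mult = 2 - np.linalg.matrix_rank(M - 3.0 * np.eye(2))
print(geometric_mult)  # 1, which is less than 2 -> M is not diagonalizable
```

Since only one independent eigenvector exists, no invertible P of eigenvectors can be formed, and this matrix is defective.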
Practical Implementation in Python
Below is a brief Python snippet (using NumPy) that demonstrates how to obtain the matrices P and D for a given matrix M. The method is straightforward, but keep in mind that floating-point round-off can make it tricky to decide whether the computed eigenvectors are truly linearly independent.
import numpy as np
# Example matrix
M = np.array([[4, 1],
              [0, 3]])
# Compute eigenvalues and eigenvectors
vals, vecs = np.linalg.eig(M)
# Construct the diagonal matrix D
D = np.diag(vals)
# P is the matrix of eigenvectors
P = vecs
# Verify M = P * D * P^-1
M_reconstructed = P @ D @ np.linalg.inv(P)
print("M:")
print(M)
print("Eigenvalues:", vals)
print("Eigenvectors (columns of P):")
print(P)
print("Reconstructed M from PDP^-1:")
print(M_reconstructed)
In this snippet, vals holds the eigenvalues. The columns of vecs form the eigenvectors that serve as P. You then form the diagonal matrix D with the eigenvalues along the diagonal. Multiplying P, D, and the inverse of P should return a matrix extremely close to M (up to numerical precision).
Potential Pitfalls and Additional Insights
A matrix with n distinct eigenvalues is always diagonalizable, because eigenvectors belonging to distinct eigenvalues are automatically linearly independent, so n distinct eigenvalues yield the required n independent eigenvectors. However, if an eigenvalue is repeated, you need to confirm the presence of a sufficient number of eigenvectors corresponding to it. If the matrix is real but has complex (non-real) eigenvalues, you may still diagonalize it over the complex field, provided you can find a complete basis of eigenvectors in the complex space.
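As a concrete illustration of the complex case (a sketch; the 90-degree rotation matrix is an assumed example), a real matrix with no real eigenvalues can still be diagonalized once complex P and D are allowed:

```python
import numpy as np

# A 90-degree rotation: real entries, but no real eigenvalues.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
vals, vecs = np.linalg.eig(R)   # eigenvalues are +i and -i

# Over the complex numbers the eigenvectors form a full basis,
# so R = P D P^-1 holds with complex P and D.
D = np.diag(vals)
R_back = vecs @ D @ np.linalg.inv(vecs)
```

The reconstruction R_back matches R up to numerical precision, even though every intermediate quantity is complex.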
The matrix P used to diagonalize M is not unique. Different orderings of eigenvalues and the corresponding eigenvectors will still yield a valid diagonalization, only changing the order of the diagonal entries in D and the columns in P.
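This non-uniqueness can be checked directly. A quick sketch, reusing the earlier example matrix: permuting the eigenpairs changes P and D together, but the product P D P^-1 is unchanged.

```python
import numpy as np

M = np.array([[4.0, 1.0],
              [0.0, 3.0]])
vals, vecs = np.linalg.eig(M)

# Reverse the order of the eigenpairs: the columns of P and the
# diagonal entries of D are permuted consistently.
perm = [1, 0]
P2 = vecs[:, perm]
D2 = np.diag(vals[perm])
M_back = P2 @ D2 @ np.linalg.inv(P2)
```

Any permutation works, as long as the same reordering is applied to both the columns of P and the diagonal of D.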
What if the matrix is not diagonalizable in the real field?
This scenario can arise when the matrix has complex eigenvalues but the problem domain requires a strictly real approach. You can move to the complex field to diagonalize it if all eigenvalues are distinct or if you have a complete set of eigenvectors there. Otherwise, you may only be able to bring the matrix to a Jordan normal form. This is especially pertinent for matrices that have repeated eigenvalues but too few independent eigenvectors to span the space.
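For the Jordan fallback, SymPy (assuming it is available in your environment) can compute the Jordan normal form exactly: `Matrix.jordan_form` returns a pair (P, J) with M = P J P^-1. A minimal sketch on an assumed defective matrix:

```python
import sympy as sp

# A defective matrix: eigenvalue 2 repeated, only one eigenvector.
M = sp.Matrix([[2, 1],
               [0, 2]])
P, J = M.jordan_form()

# J is a single 2x2 Jordan block: 2 on the diagonal, 1 above it.
print(J)
```

Here no diagonal form exists, so J retains the off-diagonal 1 of the Jordan block; for a diagonalizable input, J would come out fully diagonal.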
Why do repeated eigenvalues require special attention?
When an eigenvalue is repeated (for instance, if lambda appears multiple times as a root of the characteristic polynomial), you need to verify how many linearly independent eigenvectors you get from that eigenvalue. If an eigenvalue has an algebraic multiplicity k (meaning it appears k times as a root) but you can only find m < k linearly independent eigenvectors for it, then the matrix cannot be diagonalized. This usually happens when the matrix is defective, resulting in an inability to diagonalize it even though you can still proceed to construct its Jordan normal form.
How to handle numerical issues in diagonalization?
In real-world computations, numerical errors can cause difficulties in accurately detecting eigenvectors and eigenvalues. Floating-point round-off can make an eigenvalue that appears twice look like two close distinct values. Additionally, nearly linearly dependent vectors might still pass a rank check. Robust libraries and stable numerical algorithms (like those in NumPy, SciPy, or other linear algebra toolkits) help to mitigate these issues, but in sensitive applications, additional checks on conditioning and numerical stability are common practice.
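One practical diagnostic, sketched below under the assumption that a large condition number of the eigenvector matrix signals trouble: check `np.linalg.cond` on P and measure the reconstruction error. The nearly defective matrix here is an assumed example.

```python
import numpy as np

# Nearly defective: the two eigenvalues differ by only 1e-8.
eps = 1e-8
M = np.array([[1.0, 1.0],
              [0.0, 1.0 + eps]])
vals, vecs = np.linalg.eig(M)

# The eigenvector matrix is badly conditioned, so P^-1 amplifies
# round-off error in any computation built on P D P^-1.
cond_P = np.linalg.cond(vecs)

# Sanity check: how well does the decomposition reproduce M?
D = np.diag(vals)
recon_err = np.linalg.norm(vecs @ D @ np.linalg.inv(vecs) - M)
```

A condition number this large (on the order of 1e8 for this matrix) warns that downstream results based on the decomposition may lose many digits of accuracy, even when the reconstruction error itself still looks small.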