ML Interview Q Series: How would you describe the core principle behind Linear Discriminant Analysis (LDA) and what are some scenarios where LDA is applied in real-world practice?
Comprehensive Explanation
Linear Discriminant Analysis (LDA) is a classical algorithm in machine learning that seeks to find a linear combination of input features that best separates multiple classes. It does so by maximizing the ratio of between-class variance to within-class variance, thereby attempting to project data onto a lower-dimensional subspace where class separability is enhanced.
One of the most central formulas in LDA is the Fisher criterion, in which we identify a projection vector w that maximizes the separation between classes. This can be written as:

J(w) = (w^T S_B w) / (w^T S_W w)

Here w is the projection vector (in plain text, w is a vector in R^d if we have d-dimensional data). S_B is the between-class scatter matrix, which measures how separated the different class means are from the overall mean. S_W is the within-class scatter matrix, which measures how much the samples of each class differ within that class. The ratio quantifies how well we can find a direction (vector w) that maximizes class separation. In practice, one typically solves this optimization by computing inverse(S_W)*S_B and finding eigenvalues and eigenvectors.
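To make that optimization concrete, below is a minimal NumPy sketch (the function name fisher_directions and its arguments are illustrative, not a library API) that builds S_W and S_B as described and extracts discriminant directions from the eigenvectors of inverse(S_W)*S_B:

import numpy as np

def fisher_directions(X, y):
    """Return candidate LDA projection directions, sorted by decreasing eigenvalue."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))  # within-class scatter
    S_B = np.zeros((d, d))  # between-class scatter
    for c in classes:
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        S_W += (X_c - mean_c).T @ (X_c - mean_c)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)
    # Eigen-decompose inverse(S_W) * S_B; pinv guards against a singular S_W
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order]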
LDA assumes data from each class is generated by a Gaussian distribution with a shared covariance matrix but different class means. This implies that within each class, the data is somewhat normally distributed, and across classes, there is only a shift in the mean. In addition, LDA often serves as both a dimensionality reduction tool and a classifier. When used as a classifier, LDA fits a Gaussian distribution per class and assigns new points to the class with the highest posterior probability.
There are several ways to apply LDA in practice. It can be used for classification when the classes are fairly well separated by linear boundaries, such as in certain medical data analyses, text classification for topic prediction when classes are linearly separable in a feature space, or in combination with other algorithms (e.g., as a dimensionality reduction step before another classifier).
It is also worth noting that LDA, unlike Principal Component Analysis (PCA), takes label information into account. While PCA focuses on capturing the directions of maximal variance in the data overall (ignoring labels), LDA specifically aims to find directions that separate one labeled class from another.
When dealing with multiple classes, LDA finds multiple projection directions. Specifically, with k classes, LDA can produce up to k - 1 meaningful discriminant directions, because that is the maximum rank of the between-class scatter matrix.
Below is a simple Python snippet to illustrate how to use LDA for dimensionality reduction and classification using scikit-learn:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
# Example dataset
data = load_iris()
X = data.data
y = data.target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create LDA model
lda = LinearDiscriminantAnalysis(n_components=2)
# Fit LDA model
lda.fit(X_train, y_train)
# Project the data to lower dimensional space
X_train_lda = lda.transform(X_train)
X_test_lda = lda.transform(X_test)
# Predict on test set
y_pred = lda.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
What are the assumptions behind LDA?
LDA assumes that each class is generated from a Gaussian distribution that has the same covariance matrix but a different mean vector for each class. This implies that the correlation structure of each class is the same, although the distributions are centered on different means. If the real data deviates significantly from this assumption—particularly if covariances differ across classes—LDA’s performance can degrade. Another assumption is that there are no strong outliers that could dominate the scatter matrices.
When is LDA preferable to PCA?
PCA is unsupervised; it only looks for directions that explain the most variance in the dataset, without considering class labels. LDA, on the other hand, is supervised and aims to find directions that best separate the labeled classes. If one’s goal is classification, especially when the data appears to be (roughly) linearly separable and matches LDA’s assumptions, LDA can be more effective in reducing dimensionality in a class-discriminative way.
How does LDA handle multiple classes?
LDA uses the concept of between-class scatter, which accounts for the difference between each class mean and the overall mean, and within-class scatter, which measures how variable each class is around its own mean. For k classes, LDA can produce at most k - 1 discriminative directions. Each direction maximizes separability along a specific linear axis. The model then projects data into the resulting subspace, and classification can be performed by modeling the data in that reduced subspace.
What if the data is not linearly separable?
If the data is not linearly separable or if the assumptions of LDA regarding class covariance structures do not hold, standard LDA may not perform well. One might explore Quadratic Discriminant Analysis (QDA), which allows distinct covariance matrices for each class. Alternatively, more flexible models such as neural networks, kernel-based methods, or ensemble methods like random forests can be used. Kernel LDA is also an option, where the idea of LDA is extended to a high-dimensional feature space induced by a kernel function, allowing non-linear boundaries.
Could LDA still work if classes have vastly different covariance matrices?
Classical LDA is not ideally suited to situations where different classes exhibit significantly different covariance structures. In such cases, Quadratic Discriminant Analysis can be a better fit, because it models each class with its own covariance matrix. If the covariance matrices are significantly different and not well-approximated by a single shared covariance matrix, LDA may yield suboptimal classification.
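As a quick illustration of this alternative, scikit-learn exposes QDA with the same fit/predict interface as LDA; a minimal sketch, reusing the iris train/test split from the earlier snippet:

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# QDA estimates a separate covariance matrix per class, so decision
# boundaries become quadratic rather than linear.
qda = QuadraticDiscriminantAnalysis()
qda.fit(X_train, y_train)
print("QDA accuracy:", qda.score(X_test, y_test))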
What are common pitfalls when applying LDA in practice?
One major pitfall is ignoring the assumptions of the algorithm, particularly the assumption that class covariances are similar. Another frequent issue is using LDA on datasets with very high dimensionality but limited training samples. In such situations, the within-class scatter matrix can become nearly singular (because it might be a high-dimensional matrix estimated from insufficient data), leading to numerical instabilities. Regularized LDA or dimensionality reduction before LDA can sometimes alleviate these issues. Additionally, if outliers exist, they can distort the means and scatter estimates, negatively impacting the performance of LDA.
How does one interpret the results of an LDA model?
LDA can be inspected through the linear discriminant directions that it identifies. These directions are essentially weighted combinations of features that maximize separation among classes. By examining the coefficients of these directions, one can infer which features are most influential in discriminating among the classes. If you only extract one or two LDA dimensions, you can often plot the transformed data to visually inspect how well classes are separated.
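With scikit-learn, these directions can be inspected directly. A short sketch, continuing from the lda model and iris data fitted earlier (the ranking logic is illustrative):

import numpy as np

# Columns of scalings_ are the discriminant directions used by transform();
# large-magnitude weights indicate features that drive class separation.
for j in range(lda.scalings_.shape[1]):
    weights = lda.scalings_[:, j]
    ranked = np.argsort(np.abs(weights))[::-1]
    print(f"Discriminant axis {j}:")
    for idx in ranked:
        print(f"  {data.feature_names[idx]}: {weights[idx]:+.3f}")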
How could LDA be used in text classification or natural language processing?
When dealing with bag-of-words or TF-IDF features in text classification tasks, LDA can project the high-dimensional text vectors onto lower dimensions while attempting to preserve class discrimination (for example, topic classes). It can help reduce dimensionality from thousands of features to a much smaller number, typically improving computational efficiency and providing insight into the most discriminative words or features. However, in modern NLP contexts, especially with deep learning, this classical LDA approach has largely been supplanted by neural embeddings. Still, LDA might be a viable baseline for simpler text classification tasks when data is limited and the dimensionality is high.
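A minimal sketch of this setup, using a hypothetical tiny corpus purely for illustration (scikit-learn's LDA needs a dense array, so the sparse TF-IDF matrix is densified):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical corpus with two topic labels (0 = sports, 1 = finance).
docs = [
    "the team won the match", "a great goal in the final",
    "stocks fell sharply today", "the market rallied on earnings",
]
labels = [0, 0, 1, 1]

X_text = TfidfVectorizer(max_features=1000).fit_transform(docs).toarray()

lda_text = LinearDiscriminantAnalysis(n_components=1)  # 2 classes -> at most 1 axis
X_proj = lda_text.fit_transform(X_text, labels)
print(X_proj.shape)  # (4, 1)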
Why do we get at most k - 1 discriminant directions in multi-class LDA?
The between-class scatter matrix is built from the k class means; once those means are centered around the overall mean, only k - 1 of the resulting mean-difference vectors can be linearly independent, so the rank of the between-class scatter matrix is at most k - 1. Thus, the maximum number of meaningful discriminant directions that separate the classes is k - 1. Once you have those k - 1 directions, additional dimensions do not provide further linearly independent separation.
Could LDA be used primarily as a feature extraction method?
Yes, many practitioners use LDA for dimensionality reduction before feeding the transformed data to another classifier. By projecting data onto the space that maximizes class separation, one can reduce feature dimensionality while retaining strong discriminative power. This helps in speeding up training for subsequent classifiers. However, if the real-world data is highly non-linear, other forms of dimensionality reduction or more flexible classifiers may be more effective.
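A minimal sketch of this pattern, reusing the iris split from the earlier snippet and pairing LDA with a k-NN classifier (the choice of downstream classifier is illustrative):

from sklearn.pipeline import make_pipeline
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

# LDA reduces to at most k - 1 dimensions; k-NN then classifies in that subspace.
pipe = make_pipeline(
    LinearDiscriminantAnalysis(n_components=2),
    KNeighborsClassifier(n_neighbors=5),
)
pipe.fit(X_train, y_train)
print("Pipeline accuracy:", pipe.score(X_test, y_test))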
Below are additional follow-up questions
How can LDA handle missing data?
When dealing with missing data, a typical approach is to impute the missing values before fitting an LDA model. Imputation can be done in various ways, such as mean imputation (replacing missing values with the mean of observed values), median imputation, using k-Nearest Neighbors (k-NN) imputation, or more advanced methods like iterative multivariate imputation. The pitfall here is that imputation can bias the estimates of the mean and covariance if the mechanism behind missingness is not random. LDA relies on accurate estimates of within-class scatter (the covariance) and between-class means, so any systematic distortion introduced by poor imputation methods can significantly degrade the performance of the classifier. In real-world datasets, one must also consider if data is missing completely at random, missing at random, or missing not at random, as this impacts whether a specific imputation strategy is valid.
In extreme cases where a large fraction of values are missing for certain features or samples, it might be preferable to drop those features or samples if it preserves data quality. Alternatively, domain knowledge can guide which features are crucial for classification and which can be safely excluded.
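A minimal sketch of the impute-then-fit approach, with a hypothetical feature matrix containing missing entries:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical data with missing values; mean imputation is the simplest option,
# but it can bias the scatter estimates if values are not missing at random.
X_missing = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 4.0], [3.0, 5.0]])
y_small = np.array([0, 0, 1, 1])

model = make_pipeline(SimpleImputer(strategy="mean"), LinearDiscriminantAnalysis())
model.fit(X_missing, y_small)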
How does LDA deal with class imbalance?
LDA inherently models each class with the assumption of equal covariance but different means. If the dataset is highly imbalanced, LDA may struggle because the within-class scatter matrix is dominated by the majority class, and the mean of a minority class might be less accurately estimated if there are too few samples. One strategy is to incorporate class priors that reflect the real-world distribution or the desired emphasis on minority classes. In scikit-learn, for example, when you instantiate LinearDiscriminantAnalysis, you can specify the priors parameter. By adjusting these priors, LDA’s decision boundaries shift to address imbalanced data better.
However, if the imbalance is extreme, even adjusting priors might not be enough because LDA requires a fairly robust estimate of covariance from each class. In that scenario, one might combine LDA with oversampling (e.g., SMOTE) or undersampling strategies to rebalance the classes. Another potential issue is that the small number of minority-class samples makes the covariance estimate unreliable. Regularized LDA approaches or dimension reduction before LDA can help mitigate this.
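A minimal sketch of the priors parameter mentioned above, reusing the iris split from earlier (iris is actually balanced, so these prior values are purely illustrative):

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Priors must be non-negative and sum to 1; here the third class is deliberately down-weighted.
lda_weighted = LinearDiscriminantAnalysis(priors=[0.4, 0.4, 0.2])
lda_weighted.fit(X_train, y_train)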
What is the computational complexity of LDA and how does it scale?
The computational complexity of LDA primarily comes from estimating the covariance matrix for all samples pooled across classes. Specifically, it involves operations like matrix inversion of the within-class scatter matrix. If the data matrix is of size n (number of samples) by d (number of features), forming and inverting the d x d covariance matrix can be costly when d is large. The complexity often hovers around O(d^3) for naive matrix inversion, plus some overhead from computing the scatter matrices.
In high-dimensional scenarios (d much larger than n), the within-class scatter matrix can be nearly singular, causing numerical instability. Practitioners often resort to methods like Regularized Discriminant Analysis, shrinkage estimators for covariance, or dimensionality reduction (e.g., PCA) prior to LDA to reduce d. These approaches can help maintain both numerical stability and computational feasibility. In big-data contexts with millions of samples, the matrix operations can still be done but might require efficient linear algebra libraries or distributed computing solutions to handle the scale.
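A minimal sketch of the PCA-before-LDA idea, where X_high_dim is a hypothetical wide dataset and the component count is an illustrative choice:

from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# PCA first shrinks d so that the pooled scatter matrix LDA works with is better conditioned.
high_dim_pipe = make_pipeline(
    PCA(n_components=50),
    LinearDiscriminantAnalysis(),
)
# high_dim_pipe.fit(X_high_dim, y)   # X_high_dim: hypothetical data with many features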
Can LDA be applied incrementally to streaming data?
Classical LDA is a batch algorithm that relies on computing scatter matrices over the entire dataset. For streaming data, you would need to update class means and covariance estimates incrementally. While incremental versions of LDA are less common in off-the-shelf libraries, it is possible in theory to maintain running estimates of class means and scatter matrices using incremental statistical formulas. The difficulty lies in updating the inverse of the within-class scatter matrix (or a similar quantity) in an online fashion.
An edge case is when classes evolve over time, meaning their statistical properties drift. If distributions shift significantly, then the historical estimates of class means and covariance structures may no longer be valid. This scenario would require a mechanism to detect concept drift and reset or adapt the LDA model accordingly. Consequently, for streaming data or data with distribution shifts, other adaptive or online learning methods might be more appropriate.
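To illustrate the idea of maintaining running estimates, here is a minimal sketch (the class name RunningClassStats is illustrative, not an off-the-shelf API) that accumulates per-class counts, sums, and outer-product sums, from which the within-class scatter can be recomputed at any time:

import numpy as np
from collections import defaultdict

class RunningClassStats:
    """Running per-class statistics sufficient to rebuild class means and S_W."""

    def __init__(self, d):
        self.n = defaultdict(int)
        self.sum = defaultdict(lambda: np.zeros(d))
        self.sum_outer = defaultdict(lambda: np.zeros((d, d)))

    def update(self, x, label):
        self.n[label] += 1
        self.sum[label] += x
        self.sum_outer[label] += np.outer(x, x)

    def within_class_scatter(self):
        d = next(iter(self.sum.values())).shape[0]
        S_W = np.zeros((d, d))
        for c in self.n:
            mean_c = self.sum[c] / self.n[c]
            # sum_i (x_i - mean)(x_i - mean)^T = sum_i x_i x_i^T - n * mean mean^T
            S_W += self.sum_outer[c] - self.n[c] * np.outer(mean_c, mean_c)
        return S_W

# Usage sketch: stats = RunningClassStats(d=4); stats.update(x_new, y_new) as samples stream in.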
How do we interpret the discriminant axes when features are correlated?
When features are highly correlated, the covariance matrix of the pooled data might exhibit near-collinearity. In LDA, the linear discriminant directions are essentially eigenvectors of the product inverse(S_W)*S_B. If strong correlation exists, the directions that maximize between-class variability relative to within-class variability could emphasize subtle differences that might be hard to interpret in the original feature space. You might see large positive or negative coefficients on certain highly correlated features.
To interpret these directions, one should look at the weights assigned to each original feature in the discriminant axis. A high magnitude weight indicates a strong influence in determining class separation along that axis. However, if multiple features carry redundant information (due to correlation), interpretability can be tricky—small changes in correlated features can lead to big shifts in the projected space. A typical practice is to check a correlation matrix or variance inflation factors before applying LDA and consider dimension reduction or regularization if correlation is too high.
What is the difference between LDA used in classical machine learning and “LDA” in topic modeling?
The acronym LDA can refer to two distinct methods:
Linear Discriminant Analysis (LDA): A supervised learning technique for classification and dimensionality reduction that relies on maximizing between-class variance relative to within-class variance.
Latent Dirichlet Allocation (LDA): An unsupervised probabilistic model for topic discovery in text documents.
These methods share the same acronym but are conceptually different. Linear Discriminant Analysis works with labeled data, whereas Latent Dirichlet Allocation deals with unlabeled text corpora and tries to discover latent topics. Their underlying assumptions, formulations, and applications differ significantly. One pitfall in real-world discussions is mixing them up if the context is not clearly stated.
Are there limitations on the number of features relative to the number of samples?
LDA requires estimating the covariance matrix for each class (or a common covariance for all classes). This estimation is only reliable if the number of samples n is significantly larger than the number of features d. A rough rule of thumb is that you need at least 10 to 20 times as many samples as features to get a stable covariance matrix estimate. If d is comparable to or larger than n, the within-class scatter matrix can become singular (non-invertible).
A practical workaround is to apply dimensionality reduction before LDA or use a regularized version of LDA (e.g., shrinkage LDA), which helps stabilize estimates of the covariance matrix in high-dimensional, low-sample-size regimes. Another approach is to exclude features that provide little variance or are highly correlated, though one should be careful not to remove potentially important discriminative information.
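A minimal sketch of shrinkage LDA in scikit-learn (X_wide is a hypothetical dataset where d is close to n; shrinkage is only available with the 'lsqr' or 'eigen' solvers):

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# 'auto' uses the Ledoit-Wolf estimate of the shrinkage strength.
shrunk_lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
# shrunk_lda.fit(X_wide, y)   # X_wide: hypothetical high-dimensional, low-sample data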
How can one evaluate whether LDA’s assumptions hold true in practice?
LDA assumes multivariate normal distributions for each class with a shared covariance matrix. While strict normality is often not met in real-world datasets, the method remains robust if data is “reasonably” elliptical or if no extreme deviations exist. You can perform checks such as:
Inspecting histograms, Q-Q plots, or kernel density estimates of each feature per class to see if the distribution is approximately normal.
Estimating and comparing covariance matrices for each class to see how different they are. If they differ widely, you might suspect LDA’s assumptions are violated.
Looking for heavy tails or outliers in your data, as outliers can distort the mean and covariance estimates.
In practice, if you find strong violations of normality or large differences in covariance, you might use Quadratic Discriminant Analysis (QDA) or non-linear methods.
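As a rough numeric diagnostic for the shared-covariance assumption, one can estimate each class's covariance and compare it to the average of the class covariances; a minimal sketch reusing the iris split from earlier:

import numpy as np

# Large relative deviations hint that a single pooled covariance is a poor fit.
class_covs = {c: np.cov(X_train[y_train == c], rowvar=False) for c in np.unique(y_train)}
avg_cov = np.mean(list(class_covs.values()), axis=0)
for c, cov in class_covs.items():
    rel_diff = np.linalg.norm(cov - avg_cov) / np.linalg.norm(avg_cov)
    print(f"class {c}: relative deviation from average covariance = {rel_diff:.2f}")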
Can LDA handle multi-label classification?
LDA is generally used for multi-class (single-label) classification, meaning each sample belongs to exactly one of several classes. Multi-label classification, where each sample can belong to multiple classes simultaneously, is not natively supported by standard LDA. One workaround could be to train multiple one-vs-rest LDA classifiers (one for each label), but this does not fully capture correlations between labels. If label interdependencies are critical, a different family of methods (e.g., neural networks with sigmoid outputs or specialized multi-label algorithms) is more suitable. A pitfall of forcing LDA into multi-label tasks is the assumption that each sample belongs to only one Gaussian distribution with a single mean vector, which breaks down when multiple labels apply.
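A minimal sketch of the one-vs-rest workaround, using a hypothetical two-label indicator matrix (each column is one label; label correlations are not modeled):

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical features and multi-label targets (one binary column per label).
X_small = np.array([[0.1, 1.0], [0.3, 0.7], [0.4, 0.9], [0.9, 0.2], [1.0, 0.4], [1.2, 0.1]])
Y_multi = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]])

# One independent LDA is trained per label column.
ovr = OneVsRestClassifier(LinearDiscriminantAnalysis())
ovr.fit(X_small, Y_multi)
print(ovr.predict(X_small))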
Is LDA suitable for high-dimensional domains like image classification?
While LDA can be applied to image data, such tasks often have extremely high dimensional features (pixels) and complex class boundaries that are not well described by a single linear projection. Moreover, images rarely follow a simple Gaussian distribution with a shared covariance structure across classes. If you still want to try LDA, you might first apply dimension reduction (e.g., PCA or an autoencoder) to reduce the image dimensionality. Then LDA could be used either as a final classifier or to extract additional discriminative axes after the initial reduction.
One must be aware that high correlations among pixels and the generally non-linear manifolds on which images lie can degrade LDA’s effectiveness. Deep learning techniques typically outperform LDA in image-based tasks, though LDA might be acceptable for simpler image classification tasks with moderate resolution and well-defined class boundaries.
How can we incorporate domain knowledge into LDA?
Although LDA is generally a straightforward statistical method without built-in hooks for domain knowledge, you can indirectly incorporate domain knowledge by:
Engineering relevant features or selecting features that are known to be discriminative based on domain expertise.
Applying domain-specific transformations or normalizations to ensure that features follow roughly normal distributions.
Adjusting class priors if you know the relative importance or prevalence of different classes, which influences the posterior probabilities and, thus, the decision boundaries.
An edge case is when domain knowledge suggests that the covariance structure should differ by subgroups within the same labeled class. LDA won’t capture that. You might need more specialized methods, hierarchical classification strategies, or ensemble approaches that take domain structure into account.
What are possible strategies if LDA decision boundaries are too strict?
Because LDA uses linear decision boundaries derived from a single shared covariance estimate and different means, these boundaries can be too rigid if classes overlap non-linearly. Strategies to address this include:
Kernel LDA: Map data to a higher-dimensional feature space via a kernel function. The method then attempts to find a linear discriminant in that space, which corresponds to a non-linear discriminant in the original space.
Quadratic Discriminant Analysis (QDA): Allows each class to have its own covariance matrix, yielding quadratic decision boundaries.
Ensemble or other non-linear methods: Methods like random forests or gradient-boosted decision trees can often capture complex decision boundaries better.
Each approach has trade-offs. Kernel LDA can be computationally expensive for large datasets. QDA can overfit if you do not have enough data per class to reliably estimate separate covariance matrices. Non-linear ensemble methods often require careful tuning and may lose the interpretability that LDA provides.
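One pragmatic way to approximate the kernel LDA idea without a dedicated implementation is to build an explicit approximate kernel feature map and run ordinary LDA on top of it. This is not exact kernel LDA, and the kernel choice and component count below are illustrative; the sketch reuses the iris split from earlier:

from sklearn.pipeline import make_pipeline
from sklearn.kernel_approximation import Nystroem
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Nystroem produces an explicit (approximate) RBF feature map; shrinkage keeps the
# covariance estimate stable in the resulting higher-dimensional space.
kernel_lda_like = make_pipeline(
    Nystroem(kernel="rbf", n_components=100, random_state=0),
    LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto"),
)
kernel_lda_like.fit(X_train, y_train)
print("Accuracy:", kernel_lda_like.score(X_test, y_test))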
How do we tune hyperparameters in LDA?
Classical LDA doesn’t have many hyperparameters beyond possibly setting priors or specifying a regularization parameter in “shrinkage” versions. In scikit-learn, for instance, you can set shrinkage='auto' (which requires the 'lsqr' or 'eigen' solver) to apply Ledoit-Wolf shrinkage automatically. This can improve performance in cases of high dimensionality or small sample size. If your library supports specifying a manual shrinkage strength, you can select an optimal value via cross-validation.
Another consideration is how many components to retain. If you are using LDA purely for dimensionality reduction, you can choose up to k - 1 components, where k is the number of classes. In a multi-class setting, you can experiment with using fewer than k - 1 components if that suffices for good discrimination and if interpretability or computational efficiency is prioritized.
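A minimal sketch of tuning the shrinkage strength by cross-validation, reusing the iris split from earlier (the candidate values in the grid are illustrative):

from sklearn.model_selection import GridSearchCV
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Shrinkage requires the 'lsqr' or 'eigen' solver; None means no shrinkage at all.
param_grid = {"shrinkage": [None, "auto", 0.1, 0.3, 0.5]}
search = GridSearchCV(LinearDiscriminantAnalysis(solver="lsqr"), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)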
How do outliers specifically distort LDA?
Outliers can heavily shift the mean of a class and inflate the within-class scatter matrix, both of which degrade LDA’s ability to find a good projection. Because LDA relies on the sample mean and covariance, a single extreme point might warp the direction that maximizes between-class separation. If you suspect outliers, you can:
Apply robust scaling or transformations (e.g., log transform) to features prone to outliers.
Use robust statistics like median-based estimates for preliminary checks.
Filter out obvious anomalies based on domain knowledge.
However, automatic removal of outliers can also remove valid “rare” data points essential for certain classes. Careful analysis is needed to decide whether an outlier is an erroneous measurement or a legitimate—though less frequent—instance of the data distribution.
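As one partial mitigation, robust scaling can reduce the leverage that extreme feature values have on what LDA sees (it does not make the class means and scatter estimates themselves robust); a minimal sketch reusing the iris split from earlier:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# RobustScaler centers on the median and scales by the interquartile range,
# so a handful of extreme values influence the features less than with standard scaling.
robust_lda = make_pipeline(RobustScaler(), LinearDiscriminantAnalysis())
robust_lda.fit(X_train, y_train)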
Does LDA work well for time-series data?
Time-series data typically exhibits autocorrelation and temporal dynamics that LDA does not explicitly handle. If you attempt to use LDA directly on raw time steps, you may miss temporal relationships. One workaround is to extract summary features (e.g., mean, variance, or features derived from a window of time) and then apply LDA to those summary features. Still, if the classes evolve over time, or if time-lags and seasonality are key factors, a purely static model like LDA is suboptimal. Models that handle temporal structure, such as Hidden Markov Models, LSTM networks, or other specialized architectures, might provide better performance.
LDA can still be useful in time-series classification if the domain knowledge allows you to transform the data into a static feature vector that captures the discriminative characteristics of each class. But you have to be sure that a linear separation in that feature space is relevant to your classification goal.
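A minimal sketch of this window-feature approach on synthetic data (the feature set and the two hypothetical classes, low-variance versus high-variance windows, are purely illustrative):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

def window_features(series):
    """Collapse one window of a univariate series into a static feature vector."""
    return np.array([series.mean(), series.std(), series.max() - series.min()])

# Hypothetical setup: class 0 windows are low-variance noise, class 1 windows are noisier.
windows = [rng.normal(0, 0.5, size=50) for _ in range(20)] + \
          [rng.normal(0, 2.0, size=50) for _ in range(20)]
labels = np.array([0] * 20 + [1] * 20)

X_feat = np.vstack([window_features(w) for w in windows])
print(LinearDiscriminantAnalysis().fit(X_feat, labels).score(X_feat, labels))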