ML Interview Q Series: How can Logistic Regression be applied as a supervised learning algorithm for classification?
Comprehensive Explanation
Logistic Regression is a statistical method often used in supervised learning to predict the probability of a binary outcome. Unlike linear regression, which predicts a continuous value, Logistic Regression outputs a probability score between 0 and 1. This probability can be converted into class labels (e.g., 0 or 1) by applying a threshold (commonly 0.5). The core intuition is that although we use a linear combination of input features, we map this linear combination into a [0,1] range using the logistic (sigmoid) function, making it suitable for binary classification.
Model Formulation
Consider a dataset of N training samples. Each sample has features x in R^d (i.e. x is a d-dimensional vector). We also have a binary label y in {0, 1}.
We compute a linear function z = w^T x + b. However, we do not directly use z as our prediction. We first apply the logistic (sigmoid) function:
sigma(z) = 1 / (1 + e^{-z})
where z = w^T x + b. The vector w is the weight vector (learned parameters) and b is the bias (scalar offset). The function sigma(z) squashes z into the range (0, 1). This output is interpreted as the probability that y = 1 given x.
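As a minimal sketch (not part of the original formulation), the sigmoid mapping can be written in a few lines of NumPy; the parameter values w, b, and x below are purely illustrative.
import numpy as np

def sigmoid(z):
    # Maps any real-valued z into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative parameters and a single feature vector (d = 2)
w = np.array([0.8, -0.4])   # weight vector
b = 0.1                     # bias term
x = np.array([1.5, 2.0])    # input features

z = np.dot(w, x) + b        # linear combination z = w^T x + b
p = sigmoid(z)              # interpreted as P(y = 1 | x)
print(f"z = {z:.3f}, P(y = 1 | x) = {p:.3f}")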
Cost Function (Binary Cross-Entropy)
To learn the parameters w and b, we seek to minimize the difference between the model’s predicted probabilities and the actual labels in the training data. The loss function commonly used is the Binary Cross-Entropy (also called the log loss). For N data points, the cost function J can be written as:
J(theta) = -(1/N) * sum_{i=1}^{N} [ y_i * log(hat{y}_i) + (1 - y_i) * log(1 - hat{y}_i) ]
where:
y_i is the actual label (0 or 1) for the i-th training sample.
hat{y}_i = sigma(z_i) is the predicted probability for the i-th sample.
theta is the set of parameters, which includes w and b.
The gradient of J with respect to w and b is computed, and iterative optimization algorithms such as Gradient Descent (or variants like Stochastic Gradient Descent, Adam, etc.) are used to find the parameter values that minimize J.
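For concreteness, a bare-bones batch Gradient Descent loop for this cost might look like the sketch below; the learning rate and iteration count are arbitrary choices, not recommendations.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iters=1000):
    # X: (N, d) feature matrix, y: (N,) array of binary labels
    N, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w + b)    # predicted probabilities
        error = y_hat - y             # gradient of the per-sample loss w.r.t. z
        w -= lr * (X.T @ error) / N   # dJ/dw
        b -= lr * error.mean()        # dJ/db
    return w, b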
Decision Boundary and Classification
Once trained, the model produces a probability hat{y} in (0,1). We typically convert this probability to a label by applying a threshold T (commonly T = 0.5). If hat{y} >= T, we predict label 1; otherwise, we predict label 0. By adjusting T, we can trade off precision against recall, adapting to the requirements of a given task.
Practical Implementation in Python
Below is an example using scikit-learn:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Example dataset
X = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [1.0, 0.8],
              [0.9, 1.1],
              [2.0, 2.1],
              [2.2, 2.3]])  # feature matrix (N x d)
y = np.array([0, 0, 1, 1, 1, 1])  # binary labels
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Instantiate and train the Logistic Regression model
log_reg_model = LogisticRegression()
log_reg_model.fit(X_train, y_train)
# Predict on the test set
y_pred = log_reg_model.predict(X_test)
# Report predictions and accuracy on the test set
print("Predicted labels:", y_pred)
print("Accuracy:", accuracy_score(y_test, y_pred))
Advantages and Practical Considerations
Logistic Regression is widely used because it is fast, has a probabilistic interpretation, and often works well as a baseline for many classification tasks. Some notable considerations include:
Feature Engineering: Logistic Regression’s performance can be improved significantly by good feature selection and data preprocessing (such as scaling).
Regularization: In practice, one typically applies L2 or L1 regularization to prevent overfitting. Regularization is exposed in libraries through parameters such as C in scikit-learn (the inverse of the regularization strength).
Multiclass Extensions: For multiple classes, one can extend Logistic Regression using schemes like one-vs-rest or multinomial logistic regression (also supported in many libraries).
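As a sketch of how these options surface in scikit-learn (the parameter values below are illustrative, not recommendations):
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# L2-regularized model; smaller C means stronger regularization
l2_model = LogisticRegression(penalty="l2", C=0.5)

# L1-regularized model; requires a solver that supports L1, e.g. liblinear or saga
l1_model = LogisticRegression(penalty="l1", C=0.5, solver="liblinear")

# One-vs-rest scheme for multiclass problems
ovr_model = OneVsRestClassifier(LogisticRegression())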
Follow-up Questions
How do you handle overfitting in Logistic Regression?
Overfitting can occur if we have too many features or if the model is very sensitive to small changes in the training data. Here are typical methods to handle overfitting:
One approach is to use regularization (e.g., L2 or L1) which penalizes large weights, effectively shrinking them towards zero. This controls the complexity of the model. Another approach is to collect more training data, if possible, so that the model generalizes better. Feature selection and dimensionality reduction (e.g., PCA) can also help reduce the risk of overfitting by removing noisy or redundant features.
Why is the sigmoid function preferred, and can we use other functions?
The sigmoid function is smooth, differentiable, and outputs values strictly between 0 and 1, making it ideal for a probability interpretation. Additionally, its derivative is simple to compute, facilitating gradient-based optimization. Although the sigmoid is standard for binary classification in Logistic Regression, other link functions (like the probit function) could be used. However, sigmoid remains the most common choice because of its simplicity and well-studied properties.
Could you explain how class imbalance affects Logistic Regression?
Class imbalance means one class is much more frequent than the other, and it can cause the model to be biased toward the majority class. If not addressed, the model might predict the majority class most of the time just to minimize the overall error. Strategies to handle imbalance include:
Resampling the dataset (oversampling the minority class or undersampling the majority class).
Generating synthetic data for the minority class (e.g., SMOTE).
Using class-weighted Logistic Regression, which applies higher penalty to misclassifications of the minority class.
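In scikit-learn, the class-weighting strategy can be sketched as follows; the explicit weights in the second example are arbitrary and would need tuning for a real problem.
from sklearn.linear_model import LogisticRegression

# 'balanced' reweights classes inversely proportional to their frequencies,
# so mistakes on the minority class are penalized more heavily
weighted_model = LogisticRegression(class_weight="balanced")

# Explicit weights are also possible, e.g. penalize errors on class 1 five times more
custom_weighted_model = LogisticRegression(class_weight={0: 1.0, 1: 5.0})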
What is the difference between L1 and L2 regularization in Logistic Regression?
Both L1 and L2 regularizations are used to prevent overfitting but work differently. L2 (ridge) regularization penalizes the sum of squared coefficients. It shrinks the coefficients smoothly, rarely forcing them to be exactly zero. L1 (lasso) regularization penalizes the absolute value of coefficients, pushing some coefficients to zero, leading to sparse solutions. L1 can thus be used for feature selection, because coefficients that go to zero effectively remove those features from the model.
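To see the sparsity effect in practice, one can compare the number of zero coefficients under L1 and L2 penalties on the same data; the synthetic dataset below is only for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data where most of the 20 features are uninformative
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=4, random_state=0)

l1 = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

print("Zero coefficients with L1:", np.sum(l1.coef_ == 0))
print("Zero coefficients with L2:", np.sum(l2.coef_ == 0))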
When would you choose Logistic Regression over more complex models like neural networks or gradient boosting?
Despite the availability of more complex models, there are scenarios where Logistic Regression might be more suitable:
When interpretability is crucial: Logistic Regression provides coefficients that can be directly interpreted as feature impacts on log-odds.
When the dataset is relatively small and you want a quick baseline with low risk of overfitting.
When training time and computational resources are limited, Logistic Regression is relatively fast and resource-efficient.
When the relationship between features and log-odds is known to be relatively linear, Logistic Regression can perform well with fewer tuning requirements.
How do you interpret the model coefficients in Logistic Regression?
In Logistic Regression, each coefficient w_j in the weight vector w represents how feature j influences the log-odds of the positive class, holding other features constant. Specifically, if w_j is positive, increasing feature j raises the log-odds, implying a higher likelihood of predicting class 1. Conversely, a negative coefficient reduces the log-odds, implying the model is more likely to predict class 0 as feature j increases.
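A common way to communicate coefficients is to exponentiate them into odds ratios. The sketch below uses a tiny made-up dataset with hypothetical feature names; the exp(coef_) step is the point of interest.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical two-feature example: [hours_studied, hours_slept]
X = np.array([[1, 8], [2, 7], [3, 6], [4, 6], [5, 5], [6, 4]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)
odds_ratios = np.exp(model.coef_[0])   # exp(w_j): multiplicative change in the odds per unit of feature j
for name, ratio in zip(["hours_studied", "hours_slept"], odds_ratios):
    print(f"{name}: a one-unit increase multiplies the odds of class 1 by {ratio:.2f}")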
What assumptions does Logistic Regression make?
One key assumption is that there is a linear relationship between the features and the log-odds of the target variable. In other words, for each unit change in a given feature x_j, the change in the log-odds of the outcome is assumed constant. Another assumption is that the observations are independent of each other. Also, while Logistic Regression does not require the features or the errors to be normally distributed, the features should ideally not exhibit severe multicollinearity.
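A quick, hedged way to screen for severe multicollinearity is to inspect the pairwise correlation matrix of the features (variance inflation factors are a more formal alternative); the deliberately correlated column below is only for illustration.
import numpy as np

# Illustrative (N, d) feature matrix with one engineered near-duplicate column
X = np.random.default_rng(0).normal(size=(100, 3))
X[:, 2] = 0.9 * X[:, 0] + 0.1 * X[:, 2]

corr = np.corrcoef(X, rowvar=False)   # d x d correlation matrix of the columns
print(np.round(corr, 2))              # off-diagonal values near +/-1 signal multicollinearity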
These follow-up explanations demonstrate the breadth and depth that top-tier interviewers expect for Logistic Regression. By understanding the underlying mathematics, practical issues, and real-world considerations, one can confidently address advanced questions on applying and optimizing Logistic Regression for supervised classification tasks.
Below are additional follow-up questions
How does Logistic Regression handle missing data, and what strategies are typically used to address it?
Logistic Regression, like many traditional machine learning algorithms, does not inherently account for missing data. If a model encounters missing values during training or prediction, most implementations (e.g., scikit-learn’s LogisticRegression) will produce an error or simply ignore those rows, which can cause biases or reduce your dataset size. Missing data can also distort the estimated parameters because the patterns in missingness might not be random.
In practice, you can address missing data through various imputation strategies. Simple methods include mean or median imputation for continuous variables and most-frequent (mode) imputation for categorical variables. More advanced techniques include using regression-based or model-based imputations (e.g., MICE: multiple imputation by chained equations), which can leverage correlations among multiple features to better estimate the missing values. A potential pitfall is using inappropriate imputation methods—if the distribution of missingness is not random (i.e., Missing Not At Random scenarios), or if you do not preserve the relationships between variables when imputing, the model’s performance can degrade severely. Additionally, you might consider a separate "missing" category for categorical features, effectively flagging the missingness itself as information.
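A common pattern is to wrap an imputer and the classifier in a single pipeline so the same imputation is applied at training and inference time. The sketch below uses scikit-learn's SimpleImputer with a median strategy, which is just one reasonable default; the small dataset is made up.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

# Tiny illustrative dataset with missing entries
X = np.array([[0.1, 0.2], [0.3, np.nan], [1.0, 0.8],
              [np.nan, 1.1], [2.0, 2.1], [2.2, 2.3]])
y = np.array([0, 0, 1, 1, 1, 1])

# Impute missing values with the column median, then fit the classifier
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("clf", LogisticRegression()),
])
pipe.fit(X, y)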
Does Logistic Regression handle outliers well, and what steps can be taken if outliers are present?
Logistic Regression can be sensitive to outliers in the feature space because extreme values can disproportionately shift the decision boundary when using gradient-based optimization. This effect is particularly pronounced if there are only a few extreme points, especially in smaller datasets. If these outliers do not represent genuine phenomena but rather errors or anomalies, they can negatively affect the parameter estimates.
Common remedies include robust scaling methods (e.g., using the interquartile range rather than standard deviation), transforming variables (e.g., log transform), or explicitly removing or winsorizing outliers if they are genuine data errors. Another strategy is to use regularization (L1 or L2) to reduce the impact of extreme coefficient values induced by outliers. A subtle pitfall arises if outliers are actually meaningful signals in your domain—removing them blindly might discard valuable information, especially in fraud detection or anomaly detection contexts.
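One hedged sketch of robust preprocessing is to place RobustScaler (which centers on the median and scales by the interquartile range) in front of the classifier; the training data is assumed to be supplied elsewhere.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import LogisticRegression

# RobustScaler uses the median and IQR, so a handful of extreme values
# has far less influence than with mean/standard-deviation scaling
robust_pipe = Pipeline([
    ("scale", RobustScaler()),
    ("clf", LogisticRegression(C=1.0)),
])
# robust_pipe.fit(X_train, y_train)   # X_train, y_train assumed to come from your own data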
How can we adapt Logistic Regression to handle non-linear decision boundaries?
Standard Logistic Regression assumes a linear relationship between the features and the log-odds of the outcome. In scenarios where the true decision boundary is non-linear, the model might underfit. One approach to introduce non-linearity is to engineer polynomial or interaction features before training the model. For instance, you could add squared or product terms of the original features, transforming them into a higher-dimensional space where the data may be more separable by a linear boundary. Kernel methods or using basis expansions are also ways to capture non-linearity while still leveraging the logistic loss.
A practical concern is that creating too many polynomial or interaction terms can lead to a substantial increase in dimensionality, which might cause overfitting. This makes regularization even more important. Also, careful feature selection and domain knowledge can help identify which polynomial or interaction terms are truly meaningful, avoiding a combinatorial explosion in the number of features.
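A sketch of adding polynomial and interaction terms in front of the logistic model is shown below; degree 2 is an arbitrary illustrative choice, and the training data is assumed to be provided elsewhere.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LogisticRegression

# Expand features with squared and interaction terms, scale them,
# then fit a regularized logistic model on the enlarged feature space
poly_logreg = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(C=1.0)),
])
# poly_logreg.fit(X_train, y_train)   # training data assumed to be available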
Is Logistic Regression considered a parametric or nonparametric method, and why does this distinction matter?
Logistic Regression is generally considered a parametric method because it assumes a specific functional form (linear in the parameters) relating the input features to the log-odds of the outcome. By specifying this linear form, you reduce the flexibility of the model to capture complex relationships without additional feature transformations or expansions.
This parametric assumption matters because it influences how the model scales with more data and how sensitive it is to deviations from the assumed form. If your dataset is large and well-represented, a parametric method like Logistic Regression can learn an effective linear decision boundary quickly. However, if the real relationship is highly non-linear or complex, a strictly parametric model might underfit unless you engineer more sophisticated features. On the other hand, nonparametric models (e.g., decision trees) can model more complicated boundaries at the expense of requiring more data to avoid overfitting and to capture the complexity accurately.
How would you approach multi-label classification tasks using Logistic Regression?
Multi-label classification differs from standard multiclass classification in that each instance can have multiple labels simultaneously. For example, in text tagging, a document could be tagged as both “Sports” and “Politics.” Logistic Regression can be extended to multi-label scenarios by training multiple independent Logistic Regression models, one for each label. Each model predicts the probability of that specific label being relevant for the instance.
One pitfall is that these independent models ignore correlations between labels. In practice, if certain labels are highly correlated (e.g., “Sports” and “Outdoor”), you might lose predictive power by not modeling these dependencies. Furthermore, calibration of probabilities across multiple labels can be tricky, and you may need domain-specific post-processing, like threshold tuning per label, to optimize metrics such as F1-score or subset accuracy. Regularization or dimensionality reduction can help if the feature space is large and you have many labels, but it requires careful validation to avoid performance degradation.
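One way to realize the one-model-per-label strategy in scikit-learn is OneVsRestClassifier applied to a binary indicator matrix; the tiny feature and label matrices below are purely illustrative.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

X = np.array([[0.2, 0.1], [0.8, 0.9], [0.9, 0.2], [0.1, 0.8]])
# Each row of Y is a binary indicator over the label set, e.g. ["Sports", "Politics"]
Y = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])

# Fits one independent Logistic Regression per label column
multi_label_clf = OneVsRestClassifier(LogisticRegression())
multi_label_clf.fit(X, Y)
print(multi_label_clf.predict(X))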
Which metrics beyond accuracy are most informative for evaluating a Logistic Regression model?
Using only accuracy can be misleading, especially on imbalanced datasets. Alternative metrics include:
Precision and Recall: Precision measures how many predicted positives are truly positive, while recall measures how many actual positives are correctly identified. These metrics are especially relevant in scenarios like spam detection or fraud detection.
F1-score: The harmonic mean of precision and recall. This is often used when you need a single figure of merit that balances both precision and recall.
ROC AUC (Receiver Operating Characteristic Area Under the Curve): Evaluates the trade-off between true positive rate and false positive rate across various thresholds. A higher AUC indicates better separability between classes.
PR AUC (Precision-Recall Area Under the Curve): More informative than ROC AUC for heavily imbalanced data, as it focuses on performance in terms of the positive class.
Log Loss (Cross-Entropy): Measures the cost of probabilistic predictions, penalizing confident but incorrect predictions more heavily.
A key pitfall when evaluating with any metric is that it might not align perfectly with real business needs or specific domain constraints (e.g., cost of false positives vs. false negatives). You must pick or combine metrics that reflect practical requirements.
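A sketch of computing these metrics with scikit-learn is shown below; the label and probability arrays are made-up placeholders standing in for a model's outputs on a test set.
import numpy as np
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             roc_auc_score, average_precision_score, log_loss)

# Illustrative true labels, hard predictions, and predicted probabilities P(y = 1 | x)
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])
y_prob = np.array([0.2, 0.6, 0.9, 0.7, 0.4, 0.1])

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
print("ROC AUC:  ", roc_auc_score(y_true, y_prob))
print("PR AUC:   ", average_precision_score(y_true, y_prob))
print("Log loss: ", log_loss(y_true, y_prob))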
How do you calibrate probabilities in Logistic Regression, and why might calibration be essential?
While Logistic Regression often outputs probabilities that are relatively well-calibrated, certain factors (such as high regularization, data imbalance, or small sample sizes) can still cause miscalibration. Calibration means adjusting the raw predicted probabilities to better align with the true likelihood of events in each probability bin. Proper calibration is vital in domains like medicine or finance where the predicted probability directly influences decision-making (e.g., thresholding predictions for interventions or allocating resources).
One popular method is Platt scaling, which fits a logistic function to the outputs of a classifier to map them to well-calibrated probabilities. Another method is Isotonic Regression, a nonparametric calibration that can fit a piecewise constant function. However, these procedures require a separate validation set and can overfit if the dataset is small. A subtle issue is that calibration depends on representative validation data. If the validation set distribution differs from real deployment data, your calibration might degrade in practice.
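In scikit-learn, both approaches are available through CalibratedClassifierCV ('sigmoid' corresponds to Platt scaling, 'isotonic' to Isotonic Regression). The sketch below assumes training and test splits are supplied by the reader.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

# method='sigmoid' is Platt scaling, method='isotonic' is isotonic regression;
# cv=5 fits the calibration on held-out folds rather than the classifier's own training data
calibrated = CalibratedClassifierCV(LogisticRegression(), method="isotonic", cv=5)
# calibrated.fit(X_train, y_train)                         # training data assumed available
# calibrated_probs = calibrated.predict_proba(X_test)[:, 1]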
What practical challenges might arise when deploying a Logistic Regression model in production?
Real-world production environments pose various challenges:
Data Distribution Shifts: The feature distribution in the real world may differ from the training set (concept drift). If this shift is significant, the model’s probabilities become unreliable.
Latency and Scalability: If you have large-scale data or strict real-time inference requirements, you must ensure your data pipeline and model infrastructure handle the load efficiently. Although Logistic Regression is relatively lightweight, data preprocessing or large feature transformations can become bottlenecks.
Feature Drift and Monitoring: Features might drift over time (e.g., new user behavior patterns). Regularly retraining or monitoring the model performance can mitigate performance degradation.
Interpretability vs. Performance Trade-offs: While Logistic Regression is interpretable, heavy engineering of non-linear features might make the model’s coefficients less transparent. Balancing accuracy, interpretability, and maintainability is a nuanced task.
Failing to monitor these aspects can cause silent failures in the deployed model, leading to business or operational consequences.