ML Case-study Interview Question: Predicting Subscription Churn with Gradient Boosting on Behavioral Data.
Case-Study Question
You are the Senior Data Scientist at a subscription-based meal delivery company. Customer retention is crucial for profitability. The Product team notices that customers churn for various reasons, such as changes in personal circumstances, pricing concerns, or competitive offers. The leadership wants a predictive solution that flags high-risk customers before they churn, along with insights on factors driving their churn probability. The Marketing team then uses the predictions to decide on incentives or outreach campaigns.
They want you to:
Propose a detailed approach to predict which customers are likely to churn.
Explain how you will engineer features using only behavioral data (like order frequency, usage patterns, etc.).
Show how you will handle imbalanced data and evaluate model performance.
Determine how Marketing can customize interventions based on the model’s outputs and feature importances.
Demonstrate how you will monitor model accuracy over time and adjust thresholds or re-train if needed.
Detailed Solution
Churn occurs when a customer stops ordering for a set period. Predicting churn is a supervised binary classification problem. The pipeline starts with historical data on user behaviors, labeling a user as a “churner” if they have not ordered for a fixed number of weeks. Use features such as order frequency, subscription pauses, discount usage patterns, and any application usage data. Train a gradient boosted tree model (for example, LightGBM or XGBoost) on this labeled dataset, then extract feature importances and interpret them with SHAP values or a similar attribution technique.
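As a minimal sketch of the labeling step, the snippet below marks a user as a churner when their last order falls more than four weeks before a snapshot date. The table, column names, dates, and the four-week window are illustrative assumptions.

import pandas as pd

# Illustrative order log: one row per order (column names are assumptions)
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "order_date": pd.to_datetime(["2023-11-20", "2023-12-28", "2023-10-01", "2023-12-30"]),
})
snapshot_date = pd.Timestamp("2024-01-01")
CHURN_WINDOW_WEEKS = 4

last_order = orders.groupby("user_id")["order_date"].max()
weeks_inactive = (snapshot_date - last_order).dt.days / 7
churn_label = (weeks_inactive >= CHURN_WINDOW_WEEKS).astype(int).rename("churn_label")
print(churn_label)  # user 2 is labeled a churner; users 1 and 3 are not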
Key Model Formulations
To illustrate a typical classification model, consider logistic regression as a baseline. The predicted probability of churn is often computed as:
P(churn) = 1 / (1 + exp(-(beta_{0} + beta_{1} x_{1} + beta_{2} x_{2} + ... + beta_{n} x_{n})))
Here, x_{1} to x_{n} are features, and beta_{0} to beta_{n} are learned coefficients. A gradient boosted tree model uses decision trees as base learners but still outputs a probability after transforming scores. The principle remains similar: outputs map to a probability of churn.
If asked, explain the logistic parameters in plain terms: beta_{0} is the intercept, and beta_{1} through beta_{n} are weights for features x_{1} through x_{n}, capturing how strongly each feature pushes the probability of churn up or down. Gradient boosted trees accomplish a similar mapping, but through additive combinations of weak learners.
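As a quick worked example of this mapping, the snippet below plugs made-up coefficients and feature values into the logistic formula; every number is illustrative.

import numpy as np

beta_0 = -1.0                   # intercept
betas = np.array([0.8, -1.2])   # weights for x_1 (weeks since last order) and x_2 (orders per month)
x = np.array([3.0, 1.0])        # one user's feature values

score = beta_0 + betas @ x      # -1.0 + 2.4 - 1.2 = 0.2
p_churn = 1.0 / (1.0 + np.exp(-score))
print(round(p_churn, 3))        # roughly 0.55 for this made-up user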
Feature Engineering
Collect only behavioral data, such as how many boxes the user orders each month, how often they pause the subscription, which recipe categories they choose, and usage patterns on the mobile or web platform. Encode these numerically. Keep the feature set large (for example, 300 features), then run correlation analysis or feature selection to remove redundancy.
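The sketch below shows how such behavioral features could be aggregated from a raw order log with pandas; the table and column names are assumptions, not the company's actual schema.

import pandas as pd

orders = pd.DataFrame({
    "user_id": [1, 1, 2],
    "order_date": pd.to_datetime(["2023-11-01", "2023-12-01", "2023-12-15"]),
    "used_discount": [1, 0, 1],
    "paused_after": [0, 1, 0],
})

features = orders.groupby("user_id").agg(
    n_orders=("order_date", "count"),
    last_order=("order_date", "max"),
    discount_rate=("used_discount", "mean"),
    n_pauses=("paused_after", "sum"),
)
features["weeks_since_last_order"] = (
    (pd.Timestamp("2024-01-01") - features["last_order"]).dt.days / 7
)
features = features.drop(columns="last_order")
print(features)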
Thresholding and Precision-Recall Trade-offs
Once the model scores each user, choose a threshold to convert probabilities to binary churn predictions. If a user’s churn probability >= 0.5, label them “churner,” else “not churner.” Adjust this threshold based on business goals. Increasing sensitivity (recall) might cause more false positives, meaning more marketing spend on users who might not have churned. Increasing precision lowers false positives but misses some real churners.
Precision is: True Positives / (True Positives + False Positives), the fraction of users flagged as churners who actually churn.
Recall is: True Positives / (True Positives + False Negatives), the fraction of actual churners the model flags.
Both are crucial. Decide on a good trade-off by looking at the Precision-Recall curve.
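One way to operationalize that trade-off is to sweep thresholds on a hold-out set and pick the lowest threshold that satisfies a business constraint on precision. The sketch below uses scikit-learn's precision_recall_curve; the 0.6 precision target and the toy labels and scores are illustrative.

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
# thresholds has one fewer element than precision/recall
ok = precision[:-1] >= 0.6
# The lowest qualifying threshold maximizes recall subject to the precision constraint
best_threshold = thresholds[ok][0] if ok.any() else 0.5
print("Chosen threshold:", best_threshold)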
Interpreting Feature Importances
For a gradient boosted tree, interpret feature impacts with SHAP values. In a SHAP summary plot, positive SHAP values push a user toward churn and negative values toward retention, while the color indicates whether the underlying feature value is high or low. These summaries help the Marketing team see which behaviors are correlated with churn. Use a Business Intelligence tool to visualize the results so non-technical teams can filter churners by feature and see how different interventions might help.
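A minimal sketch of the SHAP step, assuming a trained LightGBM booster (model) and hold-out features (X_test) like those produced by the code snippet below, and the shap package installed:

import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Global view: which features push churn probability up or down across users
shap.summary_plot(shap_values, X_test)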
Model Deployment and Monitoring
Periodically re-train with the latest data because customer preferences or competitor landscapes shift. Track model performance each week. Compare predicted churners with actual churners who have had no orders for four weeks. If performance drops, revise feature engineering or hyperparameters.
Example Code Snippet
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

# Assume df has columns: 'user_id', 'feature1', 'feature2', ..., 'churn_label'
X = df.drop(['user_id', 'churn_label'], axis=1)
y = df['churn_label']

# Stratify so the (typically imbalanced) churn ratio is preserved in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

lgb_train = lgb.Dataset(X_train, label=y_train)
lgb_eval = lgb.Dataset(X_test, label=y_test, reference=lgb_train)

params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'learning_rate': 0.05,
    'num_leaves': 31,
    # For heavily imbalanced labels, consider 'is_unbalance': True or 'scale_pos_weight'
}

model = lgb.train(
    params,
    lgb_train,
    valid_sets=[lgb_train, lgb_eval],
    num_boost_round=1000,
    # Recent LightGBM versions take early stopping as a callback
    callbacks=[lgb.early_stopping(stopping_rounds=50)],
)

# Score the hold-out set and convert probabilities to binary labels
y_pred_prob = model.predict(X_test)
threshold = 0.5  # adjust based on the precision-recall trade-off
y_pred = [1 if p >= threshold else 0 for p in y_pred_prob]

precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
print("Precision:", precision)
print("Recall:", recall)
Marketing Intervention
When the model flags a user as high-risk for churning, the Marketing team decides on an intervention, such as a targeted email or discount. Balance the cost of the incentive with the predicted risk. In a more advanced approach, add a Promotion Optimization model that personalizes the incentives.
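One hedged way to frame that balance is an expected-value rule: intervene only when the churn probability times the value saved (discounted by an assumed incentive uplift) exceeds the incentive cost. Every number below is made up for illustration.

def should_intervene(p_churn, retained_value=120.0, incentive_cost=15.0, uplift=0.25):
    # uplift: assumed fraction of flagged churners the incentive actually retains
    expected_saving = p_churn * uplift * retained_value
    return expected_saving > incentive_cost

print(should_intervene(0.8))  # True: 0.8 * 0.25 * 120 = 24 > 15
print(should_intervene(0.3))  # False: 0.3 * 0.25 * 120 = 9 < 15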
Performance Tracking
Implement a weekly monitoring dashboard that records how many predictions were correct. Compare predicted churners with actual outcomes four weeks later. Track metrics and adjust thresholds if churn patterns change.
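A minimal sketch of that weekly check, joining predictions made four weeks ago to the outcomes observed since (tables and column names are illustrative):

import pandas as pd
from sklearn.metrics import precision_score, recall_score

predictions = pd.DataFrame({"user_id": [1, 2, 3, 4], "predicted_churn": [1, 0, 1, 0]})
outcomes = pd.DataFrame({"user_id": [1, 2, 3, 4], "actual_churn": [1, 0, 0, 0]})

scored = predictions.merge(outcomes, on="user_id")
print("Precision:", precision_score(scored["actual_churn"], scored["predicted_churn"]))
print("Recall:", recall_score(scored["actual_churn"], scored["predicted_churn"]))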
What if the interviewer asks the following?
How do you ensure the model generalizes beyond the historical training data?
Retrain on a rolling window of recent data so the model captures evolving user behaviors. Validate with multiple time-based splits. Watch for data drift in feature distributions. If new behaviors or campaigns emerge, incorporate them into the feature set or retrain more frequently. Avoid building the model solely on older data.
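A sketch of the time-based validation idea with scikit-learn's TimeSeriesSplit, assuming rows are already sorted by snapshot date (the data here is a placeholder):

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_sorted = np.arange(100).reshape(-1, 1)  # placeholder features, already time-ordered

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, valid_idx) in enumerate(tscv.split(X_sorted)):
    # Each fold trains on older rows and validates on the newer rows that follow
    print(f"Fold {fold}: train rows 0-{train_idx[-1]}, validate rows {valid_idx[0]}-{valid_idx[-1]}")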
How do you measure cost savings from the churn model?
Compare two groups: one receives interventions based on the model’s predictions, while the other follows a baseline approach (like no interventions or random incentives). Track differences in churn rates, revenue, and cost of incentives. The net saving is revenue gained by retaining customers minus the marketing cost spent on those promotions. Use standard A/B testing methodology.
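A back-of-the-envelope version of that calculation, with all figures invented for illustration:

avg_customer_value = 120.0  # assumed future value of a retained customer

treatment = {"customers": 10_000, "churn_rate": 0.08, "incentive_cost_total": 30_000}
control = {"customers": 10_000, "churn_rate": 0.11}

extra_retained = (control["churn_rate"] - treatment["churn_rate"]) * treatment["customers"]
net_saving = extra_retained * avg_customer_value - treatment["incentive_cost_total"]
print(f"Extra customers retained: {extra_retained:.0f}, net saving: {net_saving:.0f}")  # 300 and 6000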
How do you handle cold-start users who have limited historical data?
Use minimal features that are always present, such as sign-up source or first-week behavior. For advanced solutions, use a hierarchical approach that starts with population-level patterns, then refines as soon as more data arrives for the user. Alternatively, combine user-level features with cluster-level patterns that group similar types of new users.
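A minimal sketch of the cluster-level idea: cluster established users on first-week behavior, then give a new user the historical churn rate of their nearest cluster as a prior. All data below is made up.

import numpy as np
from sklearn.cluster import KMeans

first_week = np.array([[3, 5], [0, 1], [2, 4], [0, 0]], dtype=float)  # e.g. orders, app sessions
churned = np.array([0, 1, 0, 1])  # observed outcomes for those established users

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(first_week)
cluster_churn_rate = np.array([churned[kmeans.labels_ == c].mean() for c in range(2)])

new_user = np.array([[1, 1]], dtype=float)
prior = cluster_churn_rate[kmeans.predict(new_user)[0]]
print("Cold-start churn prior:", prior)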
Why use gradient boosted trees instead of logistic regression?
Gradient boosted trees handle complex non-linear relationships and interactions among features, often with better predictive performance. They automate aspects of feature engineering (like capturing splits that effectively segment churners). Logistic regression is simpler to interpret, but gradient boosted trees often achieve higher accuracy when there are many behavioral features.
How do you tune hyperparameters for gradient boosted trees?
Split the training set into training and validation folds. Use techniques like grid search, random search, or Bayesian optimization. Evaluate models on metrics relevant to your goal (precision, recall, F1 score). For instance, tune num_leaves, learning_rate, and min_data_in_leaf. Stop when performance gains plateau or cross-validation indicates the best hyperparameters.
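A sketch of random search with LightGBM's scikit-learn wrapper, assuming X_train and y_train from the earlier snippet; the parameter ranges are illustrative starting points, not tuned values.

import lightgbm as lgb
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    "num_leaves": [15, 31, 63, 127],
    "learning_rate": [0.01, 0.05, 0.1],
    "min_child_samples": [10, 20, 50, 100],  # scikit-learn-API name for min_data_in_leaf
}
search = RandomizedSearchCV(
    estimator=lgb.LGBMClassifier(n_estimators=500),
    param_distributions=param_distributions,
    n_iter=20,
    scoring="f1",
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)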
What is your approach if the model incorrectly labels many loyal users as churners?
Set a higher probability threshold to reduce false positives or investigate the features causing over-sensitivity. Possibly re-balance classes or refine how you define churn. Evaluate if external factors (like sporadic usage patterns) are incorrectly flagged. Seek domain knowledge from Marketing to refine your definition of a churner.
Why do you rely only on behavioral data and exclude demographic or personal data?
Privacy concerns or data protection regulations may limit usage of personal data. Behavioral data often suffices to infer churn probability, as it directly reflects engagement and purchasing patterns. If we have demographic data and the legal basis to use it, we could improve the model, but we must handle user privacy carefully.
How do you avoid overfitting with 300 features?
Check feature importances or correlation. Remove redundant features. Use regularization methods built into gradient boosted frameworks. Validate extensively with cross-validation across different time windows. Track if training metrics are significantly higher than validation metrics.
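A sketch of the correlation check, assuming X_train is the feature DataFrame from the earlier snippet; the 0.95 cut-off is an illustrative choice.

import numpy as np

corr = X_train.corr().abs()
# Keep only the upper triangle so each feature pair is inspected once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
X_train_reduced = X_train.drop(columns=to_drop)
print(f"Dropped {len(to_drop)} highly correlated features")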
What if a feature drives churn but cannot be influenced?
Ignore it for interventions if it’s not actionable, but still keep it in the model if it boosts predictive accuracy. For marketing decisions, focus on actionable features. For instance, if “location” is correlated with churn but cannot be changed, it still helps the model classify who is likely to churn, but it will not guide a marketing action plan.
How do you present these findings to non-technical stakeholders?
Use a user-friendly dashboard. Show them churn probabilities over time, the top features driving churn, and aggregated savings from marketing interventions. Display actual churners next to predicted churners each week to illustrate success. Provide interactive filters to let them explore different user segments and see top factors for churn in each segment.