ML Case-study Interview Question: Predicting Reliable Rideshare ETAs Using Tree-Based Classification
Case-Study question
You are designing a system for a rideshare company to display accurate pickup ETAs to riders before they request a ride. You need to ensure that the displayed ETA is reliable, meaning there is a high chance the driver will actually arrive within that quoted time window. How would you build a machine learning pipeline to:
Estimate this reliability.
Decide which ETA to show.
Continuously monitor and improve your model’s performance over time?
Provide a high-level approach, then describe how you would handle:
Uncertainty in driver availability.
Marketplace fluctuations (supply-demand shifts).
Model training and inference at scale.
Monitoring and retraining to address performance drift.
Detailed Solution
Overview of the Approach
Start with a classification model that predicts the probability of on-time arrival for a set of possible ETAs. Use tree-based methods (for example, gradient boosting) since they handle complex interactions well, need minimal preprocessing, and can be deployed efficiently in production.
Defining the Reliability Metric
Reliability measures whether the driver actually arrives on or before the predicted ETA, within a small buffer. A simple representation of this core concept is shown below.
R = Probability(actual_time <= predicted_time + threshold)
Here R is the probability of on-time arrival, actual_time is the driver’s true arrival time once the ride is requested, predicted_time is the displayed ETA, and threshold is a tolerance window (for example 1 minute).
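As a quick illustration, the empirical reliability over a slice of historical rides can be computed directly from logged arrival times and quoted ETAs. This is a minimal sketch; the function name and units (minutes) are assumptions:

```python
import numpy as np

def on_time_rate(actual_times, predicted_times, threshold=1.0):
    """Empirical reliability R: the fraction of rides where the driver
    arrived within the quoted ETA plus the tolerance window (minutes)."""
    actual = np.asarray(actual_times, dtype=float)
    predicted = np.asarray(predicted_times, dtype=float)
    return float(np.mean(actual <= predicted + threshold))

# 3 of 4 rides arrive within the quoted ETA plus a 1-minute buffer
print(on_time_rate([4.5, 6.0, 9.2, 3.0], [5, 5, 7, 4]))  # 0.75
```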
Model Inputs
Use features describing driver availability, marketplace signals, historical metrics, and pickup location details. For every possible ETA option, pass these features into the classification model to get a reliability score.
Training Strategy
Generate training labels by comparing historical actual arrival times with known ETAs. Duplicate each ride record for every potential ETA bracket (for example 1 through 10 minutes) to ensure the model learns patterns across all possibilities. This avoids feedback loops and gives the model coverage of every candidate ETA.
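The duplication step can be sketched in pandas as follows, assuming a rides frame with an actual_time column (column names are illustrative):

```python
import pandas as pd

def expand_rides(rides: pd.DataFrame, eta_brackets=range(1, 11),
                 threshold: float = 1.0) -> pd.DataFrame:
    """Duplicate each historical ride once per candidate ETA bracket and
    label whether the driver actually beat that bracket plus the tolerance."""
    expanded = rides.loc[rides.index.repeat(len(eta_brackets))].copy()
    expanded["candidate_eta"] = list(eta_brackets) * len(rides)
    expanded["label"] = (
        expanded["actual_time"] <= expanded["candidate_eta"] + threshold
    ).astype(int)
    return expanded.reset_index(drop=True)

# Two historical rides with true arrival times of 3.4 and 7.8 minutes
rides = pd.DataFrame({"ride_id": [101, 102], "actual_time": [3.4, 7.8]})
train = expand_rides(rides)
print(len(train))  # 20 rows: 2 rides x 10 candidate brackets
```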
Inference and ETA Selection
At prediction time, score every candidate ETA and discard any whose reliability score falls below the Service Level Agreement (SLA) threshold. The system then shows the shortest ETA that meets or exceeds the threshold. This balances speed and accuracy by ensuring only ETAs with acceptable reliability appear.
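The selection rule itself is simple once each candidate ETA has a score. A minimal sketch, using hypothetical reliability scores; in production each score would come from a classifier prediction per candidate ETA:

```python
def select_eta(reliability_by_eta, sla=0.9):
    """Return the shortest candidate ETA whose predicted reliability meets
    the SLA threshold, or None when no candidate qualifies."""
    qualifying = [eta for eta, score in reliability_by_eta.items()
                  if score >= sla]
    return min(qualifying) if qualifying else None

# Hypothetical reliability scores for four candidate ETAs (minutes)
scores = {3: 0.42, 5: 0.78, 7: 0.93, 9: 0.99}
print(select_eta(scores, sla=0.9))  # 7
```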
Example Python Snippet
Below is a minimal illustration of training a gradient boosting classifier. Explanations follow right after.
import xgboost as xgb
import numpy as np
import pandas as pd
# Suppose df has columns: features..., label
# label=1 if actual_time <= predicted_time + threshold else 0
X = df.drop(columns=["label"])
y = df["label"]
dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "max_depth": 6,
    "eta": 0.1,  # learning rate (XGBoost shrinkage, unrelated to ride ETAs)
}
model = xgb.train(params, dtrain, num_boost_round=100)
# Then for inference, we pass each possible ETA's feature vector into the model
This code trains a basic gradient boosting classifier on a dataset with reliability labels. df would include repeated rows for each ride, each row corresponding to a different hypothetical ETA bracket.
Monitoring and Retraining
Monitor AUC over time to watch for degradation. Track changes in supply-demand patterns, driver behavior, or new app updates that could cause drift. If reliability drops, retrain on fresh data. Automated pipelines can periodically trigger new model builds to keep predictions aligned with real-world conditions.
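One concrete way to watch for degradation is a per-day AUC over the prediction log. This is a minimal sketch assuming a log with date, label, and score columns, using the rank-based (Mann-Whitney) form of AUC to avoid extra dependencies:

```python
import numpy as np
import pandas as pd

def auc(labels, scores):
    """Rank-based (Mann-Whitney) AUC: probability that a random positive
    example is scored above a random negative one."""
    labels = np.asarray(labels)
    ranks = pd.Series(scores).rank().to_numpy()
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def daily_auc(log: pd.DataFrame) -> pd.Series:
    """AUC per day; a sustained drop below a chosen floor triggers retraining."""
    return pd.Series({day: auc(g["label"], g["score"])
                      for day, g in log.groupby("date")})

log = pd.DataFrame({
    "date": ["2024-01-01"] * 4 + ["2024-01-02"] * 4,
    "label": [1, 0, 1, 0, 1, 0, 1, 0],
    "score": [0.9, 0.2, 0.8, 0.3, 0.6, 0.7, 0.5, 0.4],
})
print(daily_auc(log))  # day 1: 1.0 (perfect ranking), day 2: 0.5 (random)
```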
Under-the-Hood Details
Tree-based classifiers split on features such as historical acceptance rates, driver distances, regional supply, or time of day. They capture non-linear interactions, for instance how supply constraints interact with local geography. The model outputs a probability score from 0 to 1, indicating the likelihood of on-time arrival.
What if the interviewer asks next:
1) How do you handle driver behavior changes, such as drivers logging off or rejecting requests?
A key idea is to include dynamic features reflecting real-time driver status and distance. At request time, if a driver goes offline, that real-time feature distribution shifts. Continuous retraining on recent data helps the model reflect new patterns of rejections or logoffs. The system can also dynamically adjust the displayed ETA if the driver disappears from the pool right after the ride is requested.
2) Why not use deep learning for this classification task?
Deep learning can handle large-scale tasks involving unstructured data (images or text). For structured data with moderate complexity, tree-based models often match or outperform deep nets and are easier to interpret. They train faster and serve predictions efficiently in real-world pipelines. Maintenance cost is lower, and feature engineering is more straightforward.
3) How do you avoid negative feedback loops from only training on displayed ETAs?
Generate training data by simulating every possible ETA bracket for each ride and labeling them based on the actual outcome. This ensures representation across the entire ETA range, rather than just the historically chosen ETAs. It prevents the model from being biased toward past display decisions.
4) How do you deal with out-of-distribution scenarios, like sudden major traffic disruptions?
Rely on monitoring that flags anomalous changes in reliability scores. If disruptions occur, the reliability model may produce incorrect predictions. Quickly retrain with updated data or apply fallback rules that inflate ETAs in high-uncertainty regions until new training data is available.
5) How do you ensure minimal cancellations even when supply is constrained?
Set a strict reliability SLA so users see realistic ETAs. If the model indicates a high chance of delays, the displayed ETA can reflect that. This might show a longer ETA but reduces rider frustration. Cancellations drop because the user has an accurate expectation of the wait time.
6) Can you explain a detailed method for measuring success and deciding thresholds?
Track AUC to measure the model's raw ranking performance. Track how many rides meet on-time standards. Evaluate user metrics like cancellation rate or rebooking. Then adjust the SLA threshold so that cancellations stay below target while preserving short ETAs whenever the model is confident they are reliable. This is a business tradeoff between user satisfaction and meeting speed demands.
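The threshold choice can be explored offline by sweeping SLA values over held-out rides and watching the tradeoff between displayed ETA length and realized on-time rate. A toy sketch with made-up reliability scores:

```python
def sweep_sla(rides, sla_values, threshold=1.0):
    """Trade-off curve: tighter SLAs lengthen the displayed ETA but raise
    the realized on-time rate. `rides` is a list of
    (reliability_by_eta, actual_time) pairs (toy, hypothetical data)."""
    rows = []
    for sla in sla_values:
        shown, on_time = [], []
        for scores, actual in rides:
            ok = [eta for eta, p in scores.items() if p >= sla]
            eta = min(ok) if ok else max(scores)  # fall back to longest candidate
            shown.append(eta)
            on_time.append(actual <= eta + threshold)
        rows.append((sla, sum(shown) / len(shown), sum(on_time) / len(on_time)))
    return rows

rides = [({3: 0.6, 5: 0.85, 7: 0.97}, 6.5),
         ({3: 0.4, 5: 0.7, 7: 0.95}, 6.5)]
for sla, avg_eta, rate in sweep_sla(rides, [0.8, 0.95]):
    print(sla, avg_eta, rate)
```

Here raising the SLA from 0.8 to 0.95 lengthens the average displayed ETA from 6 to 7 minutes but lifts the realized on-time rate, which is exactly the tradeoff the business threshold controls.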
7) How would you extend this to airports or other unique markets?
Add specialized signals that capture airport flow, queue length, or time-of-day flight arrival patterns. Build dedicated sub-models if one-size-fits-all approaches degrade accuracy. Real-time data from specialized areas is critical, since supply/demand can change rapidly at large venues.
8) How do you handle partial driver location data or missing features?
Tree-based methods cope well with missing values. Implement robust data pipelines to fill missing signals with defaults or average values, and let the model’s splits handle any leftover nulls. Also maintain checks in feature pipelines to ensure partial data does not degrade final predictions.
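A minimal sketch of that fill-with-defaults step (column names are hypothetical); any value left as NaN passes through to the tree model, which learns a default split direction for missing inputs:

```python
import numpy as np
import pandas as pd

def impute(features: pd.DataFrame, defaults: dict) -> pd.DataFrame:
    """Fill missing signals with per-feature defaults (e.g. regional
    averages); remaining NaNs flow through to the model unchanged."""
    return features.fillna(value=defaults)

df = pd.DataFrame({"driver_distance": [1.2, np.nan],
                   "regional_supply": [np.nan, 8.0]})
filled = impute(df, {"driver_distance": 2.5})  # no default for regional_supply
```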
9) What specific complexities arise in real-time systems for reliability prediction?
Complexities include low-latency inference requirements, feature store synchronization, consistent real-time driver location updates, and immediate fallback if features are stale. A well-managed feature store and infrastructure that caches or streams real-time data is crucial for stable predictions.
10) How do you design an online experiment to measure improvements in ETA reliability?
Use an A/B test with a new reliability model powering ETA selections in the treatment group and the existing approach in the control group. Compare metrics like cancellation rate, user feedback, and on-time arrival percentage. If the new system yields better reliability and user satisfaction, roll it out to all users.
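Once the experiment has run, a two-proportion z-test is a standard first check that the treatment's lift on a rate metric (for example, on-time arrival rate) is more than noise. The counts below are purely illustrative:

```python
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in proportions, e.g. on-time
    arrival rates in control (a) vs treatment (b). Returns (z, p_value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical: 88% on-time in control vs 91% in treatment, 10k rides each
z, p = two_proportion_z(8800, 10000, 9100, 10000)
print(z, p)
```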