ML Case-study Interview Question: Combating Inaccurate Store Status: ML Predicts Closures via Driver Reports & Photos
Case-Study question
A restaurant marketplace platform faces a problem of inaccurate store operational status. Drivers arrive at physically closed stores but see them marked as open in the system. This leads to canceled orders, wasted resources, and negative user experiences. When a driver reports a store as closed (accompanied by a photo), the system must decide whether to cancel the order, pause the store, or reassign the order to another driver. Propose a data-driven solution to address these inaccurate reports at scale, minimize canceled orders, and reduce manual review costs. Include details on data collection, feature engineering, model development, and decision-making strategies.
Proposed solution
Problem framing
The goal is to compute the probability that a store is closed given a driver-reported closed-store event. This probability guides actions such as canceling an order, pausing a store, or reassigning to a new driver.
Here DRSC denotes a "driver reports store closed" event. P(StoreClosed | DRSC) is the probability that the store is actually closed given such a report.
Data labeling
Historical reports, merchant responses, and driver outcomes form labeled examples. When a driver reported closure and another successful pickup happened soon after, label the report as likely invalid. If no successful pickups occurred and the merchant was unresponsive, label the report as likely valid closure.
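This labeling heuristic can be sketched as a small function. The 30-minute pickup window and the function name are assumptions for illustration, not values from the source:

```python
from datetime import datetime, timedelta

# Hypothetical labeling heuristic: a report is likely invalid if another pickup
# succeeded shortly after it, and likely a valid closure if no pickups succeeded
# and the merchant never responded. The window length is an assumption.
PICKUP_WINDOW = timedelta(minutes=30)

def label_report(report_time, pickup_times, merchant_responded):
    """Return 1 (likely valid closure), 0 (likely invalid), or None (ambiguous)."""
    later_pickups = [t for t in pickup_times
                     if report_time < t <= report_time + PICKUP_WINDOW]
    if later_pickups:
        return 0          # store clearly operating, so the report is invalid
    if not merchant_responded:
        return 1          # no pickups and a silent merchant: likely closed
    return None           # merchant responded; route to manual review
```

Ambiguous cases (`None`) would feed the manual-verification path discussed later.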
Feature engineering
Each report includes store_id, driver_id, timestamp, and an image. Historical behaviors are extracted from store-level signals (recent successful pickups, previous closure reports), driver-level signals (accuracy history of reports, frequency of prior incorrect reports), and an image-based signal (probability indicating the storefront looks closed). An image classifier processes the uploaded photo to produce a single numeric feature representing the likelihood of seeing a closed sign or dark storefront.
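As a sketch, these signals can be flattened into one numeric vector per report. All field names below are hypothetical, chosen only to illustrate the grouping of store-level, driver-level, and image-based features:

```python
# Illustrative feature assembly for one DRSC event; field names are assumptions.
def build_features(event, store_stats, driver_stats, image_closed_prob):
    """Flatten one closure report into a numeric feature vector."""
    return [
        store_stats["pickups_last_hour"],        # recent successful pickups
        store_stats["closure_reports_last_7d"],  # prior closure reports
        driver_stats["report_accuracy"],         # fraction of past reports confirmed
        driver_stats["false_reports_last_30d"],  # prior incorrect reports
        image_closed_prob,                       # image classifier output in [0, 1]
        event["hour_of_day"],                    # time context from the timestamp
    ]
```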
Model approach
A single LightGBM model consumes these features and outputs the probability of a store being closed. Higher probabilities suggest the store is likely closed. Decision thresholds partition the 0 to 1 probability range into separate actions:
Low probability: Mark the report as invalid. Keep the store open and reassign the order to another driver.
Intermediate probability: Cancel the order. Do not pause the store.
High probability: Cancel the order and pause the store.
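The three bands above reduce to a small decision function. The cutoff values 0.3 and 0.8 are illustrative assumptions; in practice they are tuned against business costs on a validation set:

```python
# Threshold values are illustrative assumptions, not from the source.
T_LOW, T_HIGH = 0.3, 0.8

def decide(p_closed):
    """Map a predicted closure probability to one of the three actions."""
    if p_closed < T_LOW:
        return "reassign_driver"        # report likely invalid; keep store open
    if p_closed < T_HIGH:
        return "cancel_order"           # cancel this order, but keep store open
    return "cancel_and_pause_store"     # high confidence the store is closed
```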
Example training code:
```python
import lightgbm as lgb

# X_train, y_train, X_test are assumed to hold the engineered features and labels
train_data = lgb.Dataset(X_train, label=y_train)
params = {
    "objective": "binary",     # binary target: store closed vs. open
    "learning_rate": 0.05,
    "num_leaves": 31,
}
model = lgb.train(params, train_data, num_boost_round=100)
y_pred = model.predict(X_test)  # predicted probability of closure per report
```
In a production system, the model runs after a closure report is filed, evaluates the probability, and triggers the corresponding action based on predefined thresholds.
Implementation details
The driver app prompts for a photo when a store is suspected closed. Once the image is uploaded, the model ingests the event data and the image-based feature from the classifier. The system updates order status and store availability based on the model's output. The merchant is contacted in parallel to confirm whether the store is actually closed.
Future improvements
A dynamic loss function can incorporate cost-sensitive factors such as time of day, merchant volume, and driver availability. It automatically adjusts thresholds for different scenarios, aiming to reduce expected cancellations and reassignments.
Follow-up question 1
How would you handle noisy labeling when building a binary classifier to identify store closure probability?
A reliable label comes from inferring store status after the closure report. If there are many missed signals (like a driver incorrectly taking a new order at the same store), the label might be noisy. One approach is to collect multiple signals over time. If the merchant is unresponsive, or no orders succeed within a short window, label the store as closed. If subsequent drivers fulfill orders, label it as open. This multi-source approach filters out outliers. Active learning or semi-supervised techniques can also help. For example, a small subset of ambiguous cases can be manually verified to improve labeling quality for model retraining.
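One way to pick the ambiguous cases for manual verification is uncertainty sampling: route reports whose predicted probability lies near 0.5 to human review, up to a labeling budget. A minimal sketch, where the 0.4 to 0.6 band and the budget are assumed values:

```python
# Sketch of ambiguity-based sampling for manual review (an active-learning step);
# the uncertainty band and review budget are assumptions.
def select_for_review(predictions, low=0.4, high=0.6, budget=100):
    """Pick the most uncertain reports (probability nearest 0.5), up to budget."""
    uncertain = [(abs(p - 0.5), i) for i, p in enumerate(predictions)
                 if low <= p <= high]
    uncertain.sort()                       # most uncertain first
    return [i for _, i in uncertain[:budget]]
```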
Follow-up question 2
What if the image classifier mistakenly labels a bright nighttime photo as closed?
The final decision should not hinge on any single feature. It should be a weighted combination of store features, driver features, and image signals. If the image is borderline, the model can rely on other data (recent successful pickups, historical accuracy of the driver, or an official merchant status update). If the image classifier is frequently fooled by nighttime lighting conditions, retrain with more diverse nighttime samples or apply data augmentation to improve robustness.
Follow-up question 3
How do you decide on the thresholds for the low, intermediate, and high probability ranges?
The thresholds stem from the business cost of misclassification. Missing a legitimate closure wastes resources for drivers and upsets customers. Wrongly pausing a store also loses potential revenue. A validation set with known outcomes can be used to simulate decisions at various threshold combinations. Compute the expected cost for each threshold partition and pick the one that optimizes total cost. The approach typically involves grid search on possible threshold splits, measuring metrics like precision and recall along with estimated operational costs.
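The grid search described above can be sketched as follows. The cost values are synthetic assumptions standing in for the real business cost model, and the validation labels would come from the known outcomes:

```python
# Illustrative misclassification costs; real values come from the business.
COST = {"miss_closure": 5.0,     # treated as open, but the store was closed
        "false_pause": 8.0,      # paused a store that was actually open
        "needless_cancel": 2.0}  # canceled an order at an open store

def expected_cost(t_low, t_high, probs, labels):
    """Total cost of applying a (t_low, t_high) split to a validation set."""
    total = 0.0
    for p, closed in zip(probs, labels):
        if p < t_low:                       # report treated as invalid
            total += COST["miss_closure"] if closed else 0.0
        elif p < t_high:                    # cancel the order only
            total += 0.0 if closed else COST["needless_cancel"]
        else:                               # cancel and pause the store
            total += 0.0 if closed else COST["false_pause"]
    return total

def best_thresholds(probs, labels, step=0.05):
    """Grid-search all (t_low, t_high) pairs and return the cheapest split."""
    grid = [round(step * i, 2) for i in range(1, int(1 / step))]
    candidates = [(expected_cost(lo, hi, probs, labels), lo, hi)
                  for lo in grid for hi in grid if lo < hi]
    return min(candidates)
```

Precision and recall per band can be tracked alongside the cost to sanity-check the chosen split.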
Follow-up question 4
How would you approach the dynamic thresholding idea to reduce unnecessary manual reviews?
A dynamic approach involves a cost function that varies with time of day or predicted store volume. If a store typically receives high demand in the next hour, the cost of an incorrect closure is higher, so the threshold for pausing the store is higher. If the store has negligible upcoming demand, the threshold might be lower. Incorporate these factors into the loss function:
cost_of_closing_while_open
cost_of_leaving_open_while_closed
future_traffic_expectation
A model or a direct functional mapping can adjust the threshold logic in real time, always seeking to minimize expected cost.
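As one illustration of this idea, the pause threshold can be made a function of expected near-term demand, so that mistakenly pausing a busy store requires more evidence than pausing a quiet one. The base threshold, the ceiling, and the linear interpolation are all assumptions for the sketch:

```python
# Illustrative dynamic pause threshold; the base value, ceiling, and linear
# scaling are assumptions, not values from the source.
BASE_PAUSE_THRESHOLD = 0.8

def pause_threshold(expected_orders_next_hour, max_orders=50):
    """Raise the bar for pausing a store as its expected demand grows."""
    demand = min(expected_orders_next_hour, max_orders) / max_orders
    # interpolate between the base threshold and 0.95 for the busiest stores
    return BASE_PAUSE_THRESHOLD + (0.95 - BASE_PAUSE_THRESHOLD) * demand
```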
Follow-up question 5
How do you ensure your model stays current with changes in store behavior over time?
Periodic retraining or continuous learning is essential. The system should monitor data drift, such as sudden changes in store operating hours or driver reporting patterns. Maintain an automated pipeline to capture recent labeled data, retrain the model, and run performance checks on validation sets. If performance metrics degrade, trigger a retraining job or adjust hyperparameters. Test new models through A/B experiments to ensure improvement over the baseline.
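Drift on an individual feature can be monitored with a statistic such as the Population Stability Index (PSI). A minimal sketch, where the bin count and the 0.2 alert threshold are common rules of thumb rather than values from the source:

```python
import math

# Minimal drift check on a single feature using the Population Stability Index
# (PSI); the bin count and 0.2 alert threshold are assumed rules of thumb.
def psi(expected, actual, bins=10):
    """PSI between a baseline sample and a recent sample of one feature."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0   # guard against constant samples

    def frac(sample, b):
        left, right = lo + b * width, lo + (b + 1) * width
        n = sum(1 for x in sample
                if left <= x < right or (b == bins - 1 and x == hi))
        return max(n / len(sample), 1e-4)   # floor to avoid log(0)

    return sum((frac(actual, b) - frac(expected, b))
               * math.log(frac(actual, b) / frac(expected, b))
               for b in range(bins))

def needs_retraining(expected, actual, threshold=0.2):
    return psi(expected, actual) > threshold
```

A scheduled job could run this check per feature over recent reports and trigger the retraining pipeline when any feature drifts past the threshold.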