ML Case-study Interview Question: Predicting Grocery Item Availability with Machine Learning and Dynamic Thresholds.
Case-Study question
You are working at a major online grocery marketplace that connects customers with physical retail stores through on-demand shoppers. During a period of unprecedented demand, the company noticed that many items were going out of stock. This led to a plunge in the rate at which items were found in-store, causing frustrated shoppers, canceled orders, and lower customer retention. The marketplace wants a machine learning solution to predict item availability and dynamically balance displaying more items to customers (selection) against the risk of listing items that might not be found (found rate). How would you design a system to predict and improve found rate, while dynamically handling demand surges, real-time inventory fluctuations, and diverse store conditions?
Outline your approach, detailing how to:
Model the probability of an item being in-stock.
Incorporate changing inventory signals in real time.
Avoid retraining models too often by tuning thresholds that separate which items to display and which to hide.
Implement and evaluate success metrics that keep customer satisfaction high.
Detailed solution
A machine learning model estimates the probability that an item is in stock at a given store. The model combines many signals, such as item popularity and historical shopper feedback about whether items were found, and outputs a likelihood score. Any item whose score is above a chosen threshold is shown as “in stock.” Items below that threshold are either hidden or flagged as “low stock.”
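The score-to-label rule described above can be sketched in a few lines. This is a minimal illustration, not the marketplace's actual logic: the function name `display_label`, the second cutoff `hide_threshold`, and both default values are assumptions made for the example.

```python
def display_label(score: float,
                  show_threshold: float = 0.80,
                  hide_threshold: float = 0.40) -> str:
    """Map an availability score in [0, 1] to a storefront label.

    Both thresholds are illustrative defaults (assumptions), not tuned values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score >= show_threshold:
        return "in stock"
    if score >= hide_threshold:
        return "low stock"
    return "hidden"
```

Using two cutoffs instead of one gives the front-end a middle state to warn customers without removing the item entirely.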
Modeling probability of item availability
A model is trained to output the probability that a particular item at a specific store will be found. The training data includes previous orders, timestamps, store location, observed item availability, and item-level features. It also includes real-time shopper reports. The availability score is typically a number from 0 to 1.
When multiple items go into a basket, the expected found rate is the average of these item-level probabilities. This determines how many items in a given basket are likely to be found once the shopper arrives at the store.
$$\text{Expected found rate} = \frac{1}{N} \sum_{i=1}^{N} p_i$$
Here, N is the total number of items in the basket, and p_i is the model-predicted probability that item i is in stock.
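As a small worked example of the basket-level average, the sketch below computes the expected found rate from a list of item probabilities. The function name and the sample scores are made up for illustration.

```python
def expected_found_rate(probabilities):
    """Average the per-item in-stock probabilities p_i over a basket of N items."""
    if not probabilities:
        raise ValueError("basket must contain at least one item")
    return sum(probabilities) / len(probabilities)


# Hypothetical basket of four items with model scores p_i.
basket = [0.95, 0.80, 0.60, 0.90]
rate = expected_found_rate(basket)
# Multiplying the rate by N gives the expected number of items found.
expected_items_found = rate * len(basket)
```

For this basket the average works out to roughly 0.81, so a shopper would be expected to find a little over three of the four items.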
Real-time inventory signals
Shopper app interactions provide critical signals. For instance, when a shopper marks an item as unavailable, that feedback updates the model’s features or feeds into a real-time system that recalibrates the availability score. This rapid feedback loop helps capture sudden inventory changes. The system can adjust how it displays items if real-time data shows a shortage.
Dynamic thresholds
One threshold might apply to normal conditions, but sudden demand surges or disruptions require prompt adjustments. Instead of retraining the entire model each time, a set of dynamic thresholds can shift which items qualify as “high-likelihood available” or “low-likelihood available.” This recalibrates the selection vs found rate trade-off. Different thresholds can also be set by category or retailer, reflecting variability in supply chain stability.
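One way to hold category- or retailer-specific thresholds is a lookup table with fallbacks. The sketch below assumes a hypothetical override map keyed by (retailer, category); the retailer names, categories, and numeric values are placeholders, not real settings.

```python
DEFAULT_THRESHOLD = 0.80  # illustrative global default (assumption)

# Hypothetical overrides. None acts as a wildcard for that position.
THRESHOLD_OVERRIDES = {
    ("retailer_a", "produce"): 0.85,  # volatile category: be stricter
    ("retailer_a", None): 0.75,       # retailer-wide setting
}


def threshold_for(retailer, category,
                  overrides=THRESHOLD_OVERRIDES,
                  default=DEFAULT_THRESHOLD):
    """Return the most specific threshold available, falling back to the default."""
    for key in ((retailer, category), (retailer, None), (None, category)):
        if key in overrides:
            return overrides[key]
    return default
```

Because the thresholds live in configuration rather than in the model, they can be changed during a demand surge without retraining.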
Selection vs found rate trade-off
Exposing more items can drive higher basket sizes but risks frustrating users if found rates drop. Aggressive thresholds hide more items with uncertain availability, preserving a high found rate but hurting item selection. Teams test multiple threshold values against historical data to find the sweet spot. They evaluate user satisfaction metrics, such as the fraction of canceled orders, to confirm the threshold is optimized.
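Testing threshold values against historical data can be sketched as a simple sweep. The example assumes history is available as (score, was_found) pairs; the function names and the selection rule (widest selection that still meets a found-rate target) are illustrative choices, not a prescribed method.

```python
def sweep_thresholds(history, thresholds):
    """For each threshold, compute selection (share of items shown) and found rate.

    history: list of (score, was_found) pairs from past orders.
    """
    total = len(history)
    results = []
    for t in thresholds:
        shown = [found for score, found in history if score >= t]
        selection = len(shown) / total if total else 0.0
        found_rate = sum(shown) / len(shown) if shown else None
        results.append({"threshold": t, "selection": selection,
                        "found_rate": found_rate})
    return results


def pick_threshold(results, min_found_rate):
    """Choose the threshold with the widest selection that meets the found-rate target."""
    eligible = [r for r in results
                if r["found_rate"] is not None and r["found_rate"] >= min_found_rate]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r["selection"])["threshold"]


# Made-up history for illustration only.
history = [(0.95, True), (0.90, True), (0.70, True),
           (0.60, False), (0.40, False), (0.30, True)]
results = sweep_thresholds(history, [0.3, 0.5, 0.7, 0.9])
```

In practice the sweep would run on far more data and would be followed by an online experiment, since historical found rates only cover items that were actually shown and ordered.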
Impact on user retention
A consistently high found rate leads to repeat orders. The system is evaluated by tracking how often users reorder within a certain time window after receiving a basket with minimal out-of-stocks. Experimental designs (e.g., A/B tests) measure the found rate’s impact on repeat purchase rates. The goal is to sustain confidence that items shown will be found.
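For the A/B comparison of repeat purchase rates, one common choice (an assumption here, since the text does not name a test) is a two-proportion z-test. The sketch below uses only the standard library; the function name and any counts passed to it are hypothetical.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in reorder rates between arms A and B.

    Returns (z, p_value). Relies on the normal approximation, so it assumes
    reasonably large samples in both arms.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are degenerate")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A call such as `two_proportion_z_test(400, 1000, 460, 1000)` would compare a control arm against a treatment arm with a stricter threshold; the counts in that call are invented for illustration.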
Engineering infrastructure
The end-to-end pipeline ingests signals from many sources: inventory logs, shopper interactions, store catalogs, and historical item orders. An online feature store updates the model with near-real-time signals. A service then applies dynamic thresholds and surfaces items with final availability scores to the front-end. The same service updates “running low” or “request item” labels, depending on the item’s probability of being in stock.
Example Python snippet for threshold adjustment
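# Illustrative values, not taken from a production system.
base_threshold = 0.80
desired_found_rate = 0.90
current_threshold = base_threshold

def dynamic_threshold_adjustment(recent_found_rate, recent_selection):
    # recent_selection is accepted for future use; this simplified rule
    # reacts to found rate only.
    if recent_found_rate < desired_found_rate:
        # Found rate is below target: raise the bar so fewer risky items are shown.
        return base_threshold + 0.05
    else:
        # Found rate is on target: lower the bar to show more items.
        return base_threshold - 0.05

# Main logic. compute_found_rate_last_day() and compute_selection_size_last_day()
# are assumed to be defined elsewhere.
recent_found_rate_value = compute_found_rate_last_day()
recent_selection_value = compute_selection_size_last_day()
current_threshold = dynamic_threshold_adjustment(recent_found_rate_value, recent_selection_value)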
The logic fetches daily found rate data and selection size. If the found rate is too low, the threshold is raised, restricting items. If the found rate is high, it lowers the threshold to display more items. This is a simplified illustration, but in production, various segments (stores or item categories) might have specialized thresholds.
What if the interviewer asks…
How do you handle new items or items with little data?
Explain how the model uses item and store embeddings or a cold-start component. For a new item, the system looks at store-level features and product-level similarities. Items are grouped into categories or families to inherit baseline probabilities based on comparable products. As the system gets real-time signals from first orders or shopper feedback, it refines the item’s score.
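Inheriting a baseline from comparable products and refining it with early feedback can be illustrated with simple pseudo-count smoothing (a Beta prior). This is a stand-in for the cold-start component, not a description of the actual model; `prior_strength` and the function name are assumptions.

```python
def smoothed_availability(found, not_found, prior_rate, prior_strength=20.0):
    """Blend a category-level baseline with an item's own found/not-found counts.

    prior_rate: baseline found rate inherited from comparable products.
    prior_strength: how many pseudo-observations the baseline is worth
        (illustrative default; larger values trust the category longer).
    """
    alpha = prior_rate * prior_strength + found
    beta = (1.0 - prior_rate) * prior_strength + not_found
    return alpha / (alpha + beta)
```

With no observations the estimate equals the category baseline. As shopper reports accumulate, the item's own history gradually dominates.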
What about extreme volatility or crises?
Explain how thresholds are quickly raised or lowered to reflect an abnormally high out-of-stock scenario. When supply chain disruptions happen, the team switches to a more conservative threshold to preserve found rate. Show how the system learns from each shopper feedback loop, so once the disruption eases, thresholds revert to normal. Parallel data pipelines might track external events like regional weather or reported supply issues, factoring them into the availability model.
Why not demand a perfect found rate for all items?
Demanding a perfect found rate forces the system to hide many items that might actually be in stock, reducing variety and driving customers away. The sweet spot is found through experimentation: the found rate stays high enough that users trust the platform, while item selection remains broad. Include real metrics or experiments proving that zero out-of-stocks is not feasible or beneficial.
How do you evaluate your approach?
Track out-of-stock cancellations, shopper feedback, re-order rates, and net promoter scores. Link changes in threshold strategy to retention metrics over time. If retention drops after a threshold tweak, the system re-adjusts. Also consider store-by-store or city-by-city performance analysis, because thresholds that work in one area might not work in another.
How do you scale the system across many retailers?
Use a feature store that harmonizes incoming signals from various retailer integrations. Keep a flexible pipeline that can swap in new data feeds or thresholds for each retailer. Introduce category- or region-specific thresholds if certain stores have consistent supply chain performance while others experience more volatility.
How do you justify the infrastructure costs?
Compare the cost of real-time data pipelines and threshold logic to the expense of losing customers from repeated out-of-stocks. Show that better found rates often mean higher conversion rates. Use cost-benefit analyses that weigh engineering resources against improved customer lifetime value. If the real-time service is expensive, consider partial solutions where only the most volatile categories get updated in near real time.
What if real-time signals conflict with historical data?
Weigh signals from historical patterns versus sudden changes. Apply a decay factor to older data when new real-time signals arrive. The system can do a weighted average that emphasizes the latest feedback from shoppers while still leveraging the broad historical pattern. Calibrate how quickly the model forgets outdated observations to reduce overreaction.
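The decay-weighted average can be sketched as follows, assuming shopper reports arrive as (timestamp in hours, was_found) pairs. The half-life, the weight given to the historical rate, and the function name are all illustrative parameters.

```python
def blended_availability(observations, now, historical_rate,
                         half_life_hours=6.0, prior_weight=1.0):
    """Weighted average of a historical rate and recent shopper reports.

    observations: iterable of (timestamp_hours, was_found) pairs.
    Each report's weight halves every `half_life_hours`, so stale reports
    fade while the historical rate keeps a constant weight `prior_weight`.
    """
    numerator = historical_rate * prior_weight
    denominator = prior_weight
    for timestamp, was_found in observations:
        age = max(now - timestamp, 0.0)
        weight = 0.5 ** (age / half_life_hours)
        numerator += weight * (1.0 if was_found else 0.0)
        denominator += weight
    return numerator / denominator
```

A shorter half-life makes the score react faster to a single "not found" report, and a larger `prior_weight` damps that reaction. Tuning the two together controls how much the system overreacts to noisy feedback.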
Would you consider any hardware innovation?
Potentially explore shelf sensors and other in-store hardware. If a large enough partner invests in direct store inventory tracking, real-time data would become more reliable. For many retailers, that is still unrealistic, so the short-term approach relies on shopper feedback. Over time, direct integration with retailer point-of-sale systems or shelf sensors might refine availability predictions.