ML Interview Q Series: How would you investigate a model drastically underpricing an item based on inventory, demand, and delivery cost?
Comprehensive Explanation
Understanding why a pricing algorithm is underestimating a product's price requires analyzing the end-to-end pipeline, from raw data ingestion to the final model output. This includes verifying data quality, revisiting feature engineering steps, evaluating model assumptions, and interpreting how the model combines different signals like availability, demand, and logistics costs.
Checking Feature Definitions and Data Quality
One critical step is verifying that each relevant feature is accurate and up-to-date. For example, if logistics cost is incorrectly reported as a lower number, the model might systematically undervalue the product. It is essential to inspect how each feature is computed and ensure the data pipeline doesn’t introduce errors or missing values.
It can be helpful to visualize the distribution of each feature for underpriced items. For instance, if the underpriced product always has a suspiciously low demand value, investigate whether the demand metric is aggregated correctly.
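As a concrete sketch of this audit, the snippet below contrasts feature summaries for underpriced versus normally priced items using pandas; the column names and data are hypothetical.

```python
import pandas as pd

# Hypothetical data: each row is a priced item with its model inputs.
df = pd.DataFrame({
    "demand":        [120, 95, 110, 3, 2, 130],
    "logistic_cost": [8.0, 7.5, 8.2, 1.0, 0.9, 7.8],
    "underpriced":   [False, False, False, True, True, False],
})

# Compare summary statistics of each feature for underpriced vs. normal items.
comparison = df.groupby("underpriced")[["demand", "logistic_cost"]].describe().T
print(comparison)

# A quick red flag: median demand for underpriced items sits far below the
# rest, suggesting the demand metric may be mis-aggregated for those products.
flagged = df[df["underpriced"]]["demand"].median()
normal = df[~df["underpriced"]]["demand"].median()
print(f"median demand (underpriced): {flagged}, (normal): {normal}")
```

If the underpriced group's distributions look implausible (near-zero demand, suspiciously low logistics cost), the next step is to trace those features back through the pipeline rather than debug the model itself.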
Verifying the Pricing Function
In many pricing models, the final price might be modeled as a function of factors like availability, demand level, and logistics cost. A simplified representation could be:

price = alpha + beta_1 * availability + beta_2 * demand + beta_3 * logistic_cost

where alpha is an intercept capturing any baseline pricing offset, and beta_1, beta_2, and beta_3 are the learned coefficients for availability, demand, and logistic_cost respectively.
After examining the formula, check if alpha is biased toward negative or minimal values, or if any coefficient is unexpectedly small. In large-scale systems, each of these terms might be replaced by more complex transformations, but the principle remains the same: confirm that each learned parameter is reasonable.
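As a sanity check of this principle, the sketch below fits the simplified linear form on noise-free synthetic data with numpy and recovers the parameters; the feature names and ground-truth coefficients are illustrative.

```python
import numpy as np

# Fit price = alpha + beta_1*availability + beta_2*demand + beta_3*logistic_cost
# on synthetic data and confirm each learned parameter is recovered.
rng = np.random.default_rng(0)
n = 500
availability = rng.uniform(0, 1, n)
demand = rng.uniform(0, 100, n)
logistic_cost = rng.uniform(1, 20, n)

# Illustrative ground-truth generating process for the sanity check.
price = 5.0 + 2.0 * availability + 0.3 * demand + 1.5 * logistic_cost

X = np.column_stack([np.ones(n), availability, demand, logistic_cost])
coefs, *_ = np.linalg.lstsq(X, price, rcond=None)
alpha, beta_1, beta_2, beta_3 = coefs
print(f"alpha={alpha:.2f}, beta_1={beta_1:.2f}, "
      f"beta_2={beta_2:.2f}, beta_3={beta_3:.2f}")

# Red flags to look for on a real model: alpha near zero or negative, or
# beta_3 (logistic_cost) close to zero when shipping is known to be expensive.
```

On a production model the same inspection applies to whatever replaces these coefficients (feature attributions, learned embeddings), but a linear probe like this is a quick first check.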
Investigating the Optimization Objective
Depending on how the model is trained—whether it’s minimizing mean squared error, maximizing revenue, or optimizing for conversion probability—misalignment between the objective function and the actual business goal can cause underpricing. If the model is rewarded for achieving more sales (regardless of margin), it may undervalue the product. Ensuring that the loss function or objective aligns with profit-based metrics is key.
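One way to encode such a profit-aware objective is an asymmetric loss that penalizes underpricing more heavily than overpricing; the 3x penalty factor below is an illustrative choice, not a standard value.

```python
import numpy as np

# Asymmetric, profit-aware loss sketch: underpredicting the price
# (pred < actual) is weighted more heavily than overpredicting, so the
# optimizer is discouraged from sacrificing margin for volume.
def profit_aware_loss(pred, actual, under_penalty=3.0):
    err = pred - actual
    weights = np.where(err < 0, under_penalty, 1.0)  # heavier weight on underpricing
    return float(np.mean(weights * err ** 2))

pred = np.array([90.0, 105.0])
actual = np.array([100.0, 100.0])
# Symmetric MSE would score these errors as (100 + 25) / 2 = 62.5;
# the asymmetric loss penalizes the underpriced item 3x: (300 + 25) / 2 = 162.5.
print(profit_aware_loss(pred, actual))
```

The same idea extends to gradient-boosted or neural models that accept custom objectives, provided the asymmetry factor is validated against actual margin targets.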
Validating Assumptions and Distribution Shifts
Sometimes, the demand estimates may be outdated, or consumer behavior may have changed, making the original assumptions invalid. If the model was trained on historical data that no longer reflects real-world conditions, it might continue to underprice. Confirm that the training data still represents current market dynamics, and check for distribution shifts: for instance, the demand for this product may have spiked in the last few weeks due to a trend not captured in older data.
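A lightweight way to quantify such a shift is the Population Stability Index (PSI); the sketch below implements it with numpy on synthetic demand data, using the common rule-of-thumb threshold of 0.25 for a significant shift.

```python
import numpy as np

# Population Stability Index (PSI) sketch for detecting demand drift between
# the training window and recent data. Bucket edges come from training data.
def psi(train, recent, bins=10):
    edges = np.quantile(train, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf            # catch out-of-range values
    p = np.histogram(train, edges)[0] / len(train)
    q = np.histogram(recent, edges)[0] / len(recent)
    p, q = np.clip(p, 1e-6, None), np.clip(q, 1e-6, None)
    return float(np.sum((p - q) * np.log(p / q)))

rng = np.random.default_rng(42)
train_demand = rng.normal(50, 10, 5000)
spiked_demand = rng.normal(80, 10, 5000)  # demand spiked after training

print(f"PSI (no shift):   {psi(train_demand, train_demand):.3f}")
print(f"PSI (with spike): {psi(train_demand, spiked_demand):.3f}")
```

Running such a check per feature on a schedule turns "the training data no longer reflects the market" from a hunch into a measurable alert.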
Exploring Model Interpretability Methods
Tools such as SHAP or LIME can highlight how each feature influences the final pricing decision. If the logistic cost feature has almost no impact, or the model interprets availability as extremely high when it’s actually low, you get clues about which specific inputs are leading to erroneous predictions. This helps pinpoint if the underpricing is due to a single faulty feature or an interaction among multiple features.
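When SHAP or LIME is not available, a simple permutation-importance check can serve as a stand-in; the sketch below, run against a toy model that ignores shipping cost, shows how a near-zero importance score exposes the ignored feature. The model and feature names are hypothetical.

```python
import numpy as np

# Permutation importance: shuffle one feature at a time and measure how much
# predictions move. A feature whose shuffling changes nothing is being ignored.
def permutation_importance(predict, X, n_repeats=20, seed=0):
    rng = np.random.default_rng(seed)
    base = predict(X)
    scores = []
    for j in range(X.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])                    # break feature j's signal
            deltas.append(np.mean((predict(Xp) - base) ** 2))
        scores.append(float(np.mean(deltas)))
    return scores

# Toy model that (buggily) ignores logistic_cost, column 2.
def buggy_price_model(X):
    return 10 + 2 * X[:, 0] + 0.5 * X[:, 1] + 0.0 * X[:, 2]

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 3))  # availability, demand, logistic_cost
imp = permutation_importance(buggy_price_model, X)
print(dict(zip(["availability", "demand", "logistic_cost"], imp)))
```

Here the zero score on logistic_cost is the tell: shipping expenses have no influence on the price, which is exactly the kind of clue interpretability tooling surfaces on a real model.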
Re-Examining Business Constraints and External Data
Pricing often includes constraints, such as setting a minimum margin or ensuring the product price remains consistent with marketplace competition. If these constraints are missing or incorrectly implemented, the algorithm might systematically undercut the price. If competitive pricing data or real-time market signals are not considered, or are incorrectly weighted, the model might not reflect true market conditions.
Monitoring and Alerting
Deploy mechanisms that flag abnormal pricing outputs before they become widespread. By setting up monitoring thresholds—like checking if the price for any product drops below a certain margin—one can catch anomalies and initiate an immediate diagnostic process to prevent revenue loss.
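A minimal version of such a guardrail might look like the following; the 10% minimum-margin threshold and the catalog fields are illustrative assumptions.

```python
# Minimal price-floor monitor: flag any product whose proposed price falls
# below cost plus a minimum margin before the price goes live.
MIN_MARGIN = 0.10  # illustrative business rule

def flag_underpriced(products):
    alerts = []
    for p in products:
        floor = p["unit_cost"] * (1 + MIN_MARGIN)
        if p["proposed_price"] < floor:
            alerts.append({**p, "price_floor": round(floor, 2)})
    return alerts

catalog = [
    {"sku": "A1", "unit_cost": 20.0, "proposed_price": 30.0},
    {"sku": "B2", "unit_cost": 50.0, "proposed_price": 48.0},  # below cost!
]
for alert in flag_underpriced(catalog):
    print(f"ALERT: {alert['sku']} priced {alert['proposed_price']} "
          f"below floor {alert['price_floor']}")
```

In production the same check would feed an alerting system (and possibly block publication) rather than just print, but the core logic is this comparison against a floor.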
Follow-up Questions
How would you incorporate domain knowledge regarding shipping costs into this model?
One effective approach is to ensure that the shipping cost feature is carefully engineered to reflect real variations in delivery expenses. For instance, shipping cost might differ drastically by geographical region or shipping speed (standard versus expedited). By creating separate features (like average shipping distance, shipping method, etc.) rather than a single scalar, you can more accurately capture how logistics expenses scale. Ensuring domain experts review these feature definitions is also critical because they can highlight hidden factors such as seasonal surcharges or complex packaging rules that might inflate or reduce cost.
In practice, you might build separate components for each sub-cost, then integrate them into a combined “logistics cost index.” This higher-fidelity measure can lead to a more accurate final pricing recommendation because it reduces the chance of a single low or inaccurate shipping-cost figure driving the model toward underpricing.
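A minimal sketch of such an index, with illustrative component names and no claim about the right weighting, might be:

```python
# Combined "logistics cost index" built from auditable sub-cost components.
# Component names and values are illustrative; in practice domain experts
# would define and validate each one.
def logistics_cost_index(shipping, handling, packaging, seasonal_surcharge=0.0):
    return shipping + handling + packaging + seasonal_surcharge

# A single stale scalar (e.g., last year's flat shipping fee) would understate
# the true cost; the decomposed version surfaces each driver separately.
cost = logistics_cost_index(shipping=6.5, handling=1.2, packaging=0.8,
                            seasonal_surcharge=1.5)
print(cost)
```

The value of the decomposition is less the arithmetic than the auditability: when the index looks too low, each component can be traced to its own data source.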
How can you handle new or rarely sold products when the historical data is sparse?
When the product is new or sales are infrequent, the model might lack sufficient examples to learn robust pricing patterns. Approaches include using hierarchical models that share statistical strength across similar product categories, or employing transfer learning from more data-rich categories to bootstrap the pricing estimates. Another strategy is to set a fallback rule-based price derived from margin targets or competitor benchmarks until sufficient data accumulates. This prevents the model from issuing drastically incorrect prices due to limited training examples.
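The fallback idea can be sketched as follows; the sales-count threshold, target margin, and competitor-anchoring rule are all illustrative assumptions.

```python
# Fallback pricing for new or rarely sold products: trust the model only when
# enough sales history exists; otherwise fall back to a rule-based price from
# cost, a target margin, and a competitor benchmark.
MIN_SALES_FOR_MODEL = 30  # illustrative threshold

def price_with_fallback(model_price, n_historic_sales, unit_cost,
                        competitor_price, target_margin=0.25):
    if n_historic_sales >= MIN_SALES_FOR_MODEL:
        return model_price
    margin_price = unit_cost * (1 + target_margin)
    # Anchor near the competitor but never below the margin target.
    return max(margin_price, 0.95 * competitor_price)

# New product with 3 sales: the model's (likely unreliable) output is ignored
# in favor of the rule-based price.
print(price_with_fallback(model_price=12.0, n_historic_sales=3,
                          unit_cost=40.0, competitor_price=60.0))
```

As sales accumulate past the threshold, pricing hands over to the model automatically, which avoids a manual cutover step.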
What measures would you take to confirm that the underpricing issue has been resolved?
One step is to conduct an A/B test comparing the revised pricing strategy against the old one, evaluating performance metrics like profit margin, conversion rate, and overall revenue. If the new prices yield improved profit margin without sacrificing too many sales, it indicates a correction of the underpricing. It’s also important to monitor any shifts over time to ensure that the fix remains valid under changing market conditions. Additionally, you can inspect post-deployment diagnostic metrics (e.g., how well predicted prices match realized sales data) to verify the model’s reliability.
If the objective is profit maximization but the model is trained on historical cost/demand data, how do you align these objectives in practice?
Aligning the model’s training procedure with true profit maximization can be challenging because the real profit depends on complex factors beyond just cost and demand. One strategy is to simulate or approximate profit outcomes using historical data and incorporate that directly into the training objective. For example, a custom loss function that penalizes predicted prices below a certain profit margin can push the model to avoid underpricing. Regular check-ins with finance and product teams help validate whether these simulated metrics match real-world profit behavior, ensuring that the trained model’s objective and the business objective coincide as closely as possible.
Additional Follow-up Questions
What if there are significant outliers in the cost or demand data that skew the model’s pricing?
Outliers in cost or demand data can drastically pull the model parameters in an unintended direction, especially if the model is sensitive to extreme values. For instance, a single warehouse with an unusually high handling cost can inflate logistic_cost for certain training examples, skewing the learned cost coefficient and distorting predicted prices for products in other regions. Similarly, a sudden short-term surge in demand might cause the model to overestimate future interest.
One way to handle this is by examining data distributions at every stage. You can apply robust transformations or truncate values beyond certain thresholds if they don’t reflect realistic conditions. Another approach involves outlier detection methods like isolation forests to identify and filter irregular data points or weigh them differently in the training process. A key edge case to watch out for: sometimes an “outlier” may actually be a harbinger of a permanent market shift (e.g., unexpected but lasting surge in demand). Blindly removing such data could lead to ignoring real and emerging trends.
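A simple robust transformation of the kind described is winsorization, i.e., clipping values to percentile bounds; the sketch below uses numpy, with illustrative cutoffs.

```python
import numpy as np

# Winsorization sketch: clip cost values to the 1st/99th percentiles so a
# single anomalous warehouse cost can't drag the fitted parameters.
# The percentile cutoffs are an illustrative choice; validate with domain input.
def winsorize(values, lower_pct=1, upper_pct=99):
    lo, hi = np.percentile(values, [lower_pct, upper_pct])
    return np.clip(values, lo, hi)

rng = np.random.default_rng(7)
costs = rng.normal(10, 2, 1000)
costs[0] = 500.0  # one warehouse reports an absurd handling cost

clipped = winsorize(costs)
print(f"raw max: {costs.max():.1f}, winsorized max: {clipped.max():.1f}")
# Caveat from the text: before clipping, check whether the "outlier" is in
# fact a persistent market shift rather than a data error.
```

The caveat in the comment matters operationally: winsorization should be paired with an alert when clipping rates rise, so a genuine regime change is investigated rather than silently truncated.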
How would you deal with brand perception or intangible aspects that are hard to quantify numerically?
Certain products command a premium because of brand equity or perceived luxury. If the pricing model only uses features like demand, availability, and logistic_cost, it may fail to capture intangible factors such as brand loyalty, perceived quality, or exclusivity. A pitfall emerges when a product’s brand value is not accounted for: the model may repeatedly underprice it, wrongly concluding that it’s similar to lower-value items.
To address this, you could incorporate carefully engineered features that serve as proxies for brand perception—like customer sentiment scores derived from social media mentions or average product ratings and reviews. Alternatively, cluster products into tiers or categories that reflect brand positioning, and allow the model to learn different baseline price offsets for each cluster. In real-world deployments, watch out for new brands that quickly gain traction—historical data might underestimate their intangible value.
How might rapidly changing competitor prices lead the model to underprice?
If your pricing strategy partially hinges on competitor pricing data and your competitors engage in frequent, dynamic changes—like flash sales—there is a risk of chasing a “race to the bottom.” The model could overreact to transient competitor discounts, leading to a perpetual undervaluation of your products. This becomes especially problematic if your model lacks a mechanism to place a floor on margins or incorporate a brand’s market position.
A robust solution is to introduce time decay in competitor price observations so that ephemeral discount events have limited impact. Another measure is employing business rules that enforce a minimum viable margin. Additionally, you can segment competitor price data by brand equivalence or product similarity to avoid blindly matching prices from lower-quality or lesser-known brands. Watch for edge cases like seasonal competitor promotions that spike demand but only last a few days—your model might incorrectly internalize those temporary drops as long-term trends.
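A sketch of time-decayed competitor price aggregation, assuming an illustrative 14-day half-life:

```python
import numpy as np

# Exponentially time-decayed weighting of competitor price observations, so a
# brief flash sale has limited pull on the reference price relative to simply
# matching the latest observed price. The 14-day half-life is illustrative.
def decayed_reference_price(prices, ages_days, half_life=14.0):
    w = 0.5 ** (np.asarray(ages_days) / half_life)
    return float(np.sum(w * np.asarray(prices)) / np.sum(w))

# Competitor listed at ~100 for weeks, with a 2-day-old flash sale at 60.
prices = [100, 100, 100, 60]
ages = [28, 21, 7, 2]
print(f"{decayed_reference_price(prices, ages):.2f}")
# Naively matching the latest competitor price (60) would chase the discount;
# the decayed average keeps the reference near the prevailing level (~84).
```

Combined with a hard margin floor, this keeps ephemeral discounts from triggering the race-to-the-bottom behavior described above.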
If the product’s supplier landscape changes frequently, how do you ensure accurate cost estimates?
Shifting supplier contracts, raw material volatility, or global events (e.g., shipping delays) can cause logistic_cost to fluctuate rapidly. If your data pipeline relies on static or outdated cost assumptions, underpricing becomes a real risk. A mismatch can occur when your model is unaware that shipping costs have climbed, so it keeps recommending prices that assume cheaper supplier conditions.
One strategy is to maintain a real-time or near-real-time cost feed that updates your model inputs regularly. If the model is retrained less frequently, you might implement an adjustment factor in the final predicted price that accounts for known short-term cost fluctuations. Potential pitfalls include sudden disruptions (like a natural disaster) that rapidly change supply routes, so you need fail-safe checks in place (e.g., substituting the maximum logistic_cost observed in the last 24 hours) to override the model's outdated assumptions until it can be retrained or recalibrated.
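The adjustment-factor idea can be sketched as a simple cost pass-through applied on top of the stale model output; the 100% pass-through rate is an illustrative assumption.

```python
# Short-term cost adjustment on top of a stale model price: if live
# logistic_cost exceeds the cost assumed at training time, pass the
# difference through to the final price until the model is retrained.
def adjusted_price(model_price, trained_cost, live_cost, pass_through=1.0):
    return model_price + pass_through * max(0.0, live_cost - trained_cost)

# Model trained when shipping cost was 5.0; it has since jumped to 9.0.
print(adjusted_price(model_price=42.0, trained_cost=5.0, live_cost=9.0))
```

This is deliberately one-sided (it only corrects upward) so it acts as a stopgap against underpricing; a retrained model remains the proper fix.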
How do you incorporate seasonality or cyclical demand patterns without causing underpricing in off-seasons?
Demand for many products ebbs and flows based on season, special events, or cyclical consumer behavior. If your model’s primary data window doesn’t capture these fluctuations correctly, it might predict lower prices during off-seasons without adjusting back upward when demand rises. This can lead to significant revenue loss during the product’s peak season.
A common approach is to engineer time-based features, such as the month of the year, holiday indicators, or day-of-week demand cycles. More advanced techniques use time-series models that explicitly account for seasonal trends. Potential pitfalls include partial seasonality shifts (like an unseasonably warm winter) and multi-year cyclical patterns (e.g., products that peak every two years). Always verify that the seasonal features reflect the actual consumption cycle; misaligned seasonal data can worsen predictions rather than improve them.
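A minimal sketch of such time-based feature engineering with pandas, using a hypothetical (and deliberately incomplete) holiday list:

```python
import numpy as np
import pandas as pd

# Illustrative holiday set; a real pipeline would use a maintained calendar.
HOLIDAYS = {"2023-12-25", "2023-11-24"}

def add_seasonal_features(df, date_col="date"):
    out = df.copy()
    dt = pd.to_datetime(out[date_col])
    out["month"] = dt.dt.month
    out["day_of_week"] = dt.dt.dayofweek          # 0 = Monday
    out["is_holiday"] = dt.dt.strftime("%Y-%m-%d").isin(HOLIDAYS)
    # Cyclical encoding so December (12) sits next to January (1).
    out["month_sin"] = np.sin(2 * np.pi * out["month"] / 12)
    out["month_cos"] = np.cos(2 * np.pi * out["month"] / 12)
    return out

df = pd.DataFrame({"date": ["2023-12-25", "2023-06-15"], "units_sold": [40, 12]})
print(add_seasonal_features(df)[["month", "day_of_week", "is_holiday"]])
```

The sine/cosine encoding is worth the extra two columns: a raw month number makes December and January look maximally far apart, which distorts any distance-based or linear treatment of seasonality.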
How do you handle personalized pricing without causing fairness or ethical concerns?
Personalized pricing tailors costs to individual customers based on their browsing behavior, purchase history, or demographic data. While this can optimize revenue, it raises ethical and fairness questions if certain groups consistently receive inflated prices. Additionally, legal restrictions on price discrimination may limit your ability to vary prices too broadly.
To mitigate pitfalls, set clear guidelines on what personalization factors are allowable (e.g., loyalty status, product interest) and avoid sensitive attributes that could create bias. A practical approach is to define price ranges or discounts that are permissible for each customer segment, rather than letting the model generate arbitrary figures. You also need a robust compliance check to ensure the algorithm doesn’t inadvertently discriminate. Edge cases include situations where multiple users share the same IP address or device, leading the model to apply a single profile’s pricing to many distinct customers.
In cases where the product is sold with optional add-ons, how do you prevent underpricing the bundle?
Many products come bundled with related accessories or warranty services. If your model focuses only on the base product price, it may not capture the value of optional add-ons. As a result, the model might aggressively lower the base product price in an effort to drive sales, ignoring the potential revenue from upsells.
One solution is to model the total revenue per transaction instead of just focusing on the base product. For instance, you could treat the entire set of products and add-ons as a bundle, then predict the optimal price for the package. Alternatively, you can create a two-stage model: first predict the base product price, then a separate model for add-ons. Watch out for edge cases where a discounted product plus a high-margin add-on yields higher total profit, making it acceptable to underprice the base item. Balancing these complexities requires clarity on which metric—total revenue or base product margin—dominates your pricing objective.
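The trade-off described can be made concrete with a toy expected-profit comparison; all prices, costs, conversion rates, and attach rates below are illustrative.

```python
# Toy illustration of the edge case from the text: a discounted base item plus
# a high-margin add-on can beat a full-price base item on total profit.
def expected_profit(base_price, base_cost, conversion,
                    addon_price=0.0, addon_cost=0.0, attach_rate=0.0):
    per_sale = (base_price - base_cost) + attach_rate * (addon_price - addon_cost)
    return conversion * per_sale

full_price = expected_profit(base_price=100, base_cost=70, conversion=0.05)
discounted = expected_profit(base_price=85, base_cost=70, conversion=0.09,
                             addon_price=40, addon_cost=10, attach_rate=0.5)
print(full_price, discounted)  # per-visitor expected profit
```

Numbers like these are why the pricing objective must be pinned down explicitly: optimizing base-product margin and optimizing total transaction profit can recommend opposite prices for the same item.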
How do you manage partial inventory or pre-order scenarios?
If a product is partially out of stock or available on backorder, standard availability signals might be misleading. The model could interpret low immediate availability as a reason to increase price (assuming scarcity), yet for a backordered product, the delayed shipping could reduce consumer interest. Balancing these effects is tricky. If the model isn’t aware that “out of stock” for a short time doesn’t necessarily mean higher demand, it could produce inflated or deflated prices.
You might introduce a feature indicating whether the product is backorderable, along with estimates of how long customers will wait for shipment. The pricing logic should reflect potential cancellations or lost sales if the wait time is too long. Edge cases arise when a product is unexpectedly restocked sooner than anticipated—your model might hold on to an artificially high price, making sales drop. A well-designed system includes real-time updates to availability data and explicit logic to prevent “stock shock” from unduly affecting prices.
What if your model is highly accurate for most products but drastically wrong for a niche subset?
It’s common for a model to perform well on the majority of items while failing on a small category of niche or specialized products. These products might have unique cost structures, demand patterns, or brand aspects that the model hasn’t captured. This can lead to large pricing errors—often underpricing or overpricing—within that niche.
One solution is to segment your product catalog into clusters, ensuring that niche categories are modeled separately. You can either create specialized sub-models for these clusters or build hierarchical models that account for broad-level behaviors and then refine for category-level nuances. Pitfalls arise if the niche segment is too small to train a separate model robustly. In such cases, it might be more effective to apply a rule-based adjustment or a combined approach where the global model’s prediction is corrected by expert domain rules for that niche. Always examine prediction error metrics per product category to identify these pockets of poor performance early.
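A per-category error audit of the kind described can be sketched in a few lines of pandas; the categories and numbers are illustrative.

```python
import pandas as pd

# Per-category error audit: mean absolute percentage error by product category
# surfaces niche segments where a globally accurate model fails.
df = pd.DataFrame({
    "category":  ["mainstream"] * 4 + ["niche"] * 2,
    "actual":    [100, 120, 80, 95, 400, 350],
    "predicted": [ 98, 123, 79, 96, 250, 210],   # niche items badly underpriced
})
df["ape"] = (df["predicted"] - df["actual"]).abs() / df["actual"]
mape_by_cat = df.groupby("category")["ape"].mean()
print(mape_by_cat)
# The overall average error would look acceptable; the per-category breakdown
# exposes the niche segment where the model is drastically wrong.
```

Tracking this breakdown over time (not just at launch) is what catches a niche segment drifting into poor performance early.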
How do you balance the long-term brand impact versus short-term profitability in your pricing algorithm?
A purely revenue-optimizing model may propose aggressive price cuts to move inventory quickly. While this can boost immediate sales, it risks eroding a premium brand’s perception over time. Alternatively, consistently inflating prices for short-term gains might alienate loyal customers and reduce lifetime value.
To address this, you can incorporate a longer-term metric into your training or evaluation process, such as projected customer lifetime value or brand loyalty index. The model could be penalized for frequent drastic price changes. A potential pitfall is that long-term metrics are often harder to quantify accurately, requiring advanced forecasting or heuristics. Also, brand-damaging effects might take months or years to materialize, so your model’s short-term success metrics might hide a slow erosion in brand equity. Close collaboration with marketing and product strategy teams is essential to ensure that short-term price optimizations don’t undercut the brand’s long-term positioning.