ML Case-study Interview Question: Bayesian Thompson Sampling for Product Ranking: Addressing Bias and Sparsity
Case-Study question
A major online retailer maintains a massive catalog with millions of products. The goal is to rank these products so that first-time visitors see items with the broadest overall appeal. Historical product performance data is distorted by position bias: items shown in top positions get more orders regardless of their intrinsic appeal. Many products have limited or zero ordering history, creating more uncertainty in estimates. The retailer wants a system that:
Identifies each product’s intrinsic popularity, adjusting for position biases and other factors (device type, filtering, etc.).
Combines multiple signals (click, cart-add, order) to reduce reliance on sparse order data.
Dynamically updates estimates and adapts to new data without retraining on the entire historical dataset each time.
Uses an explore-exploit ranking strategy to uncover potential top-performers while still prioritizing best-known products.
Propose a detailed approach to solve this. Outline your methodology, explain the algorithms, demonstrate how you would handle position bias, data sparsity, and incorporate Bayesian updating. Finally, propose how to rank items in real-time for every new visitor.
Proposed Detailed Solution
Modeling Intrinsic Appeal
Start with a logistic regression model that combines product-specific effects and position-specific effects. Represent the probability that a customer orders product i at position j as:
P(order∣product i, position j) = sigmoid(beta_i + alpha_j)
On the right side, beta_i is the product-specific effect capturing the item’s intrinsic appeal, and alpha_j is the position effect capturing how visibility influences the likelihood of an order.
Historical click and order patterns drive estimation. Observed order rates at different positions separate product-specific appeal from position bias. In practice, add more factors: device type, filters, and so on.
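As an illustrative sketch (fully synthetic data, a plain gradient-ascent fit rather than a production solver), the following jointly estimates product effects and position effects, then ranks by the debiased product effect alone:

```python
import numpy as np

# Sketch: fit P(order | i, j) = sigmoid(beta_i + alpha_j) on synthetic logs.
# Product 0 is intrinsically best; position 0 carries a visibility boost.
rng = np.random.default_rng(0)
n_products, n_positions, n = 3, 2, 20_000
true_beta = np.array([-1.0, -2.0, -3.0])
true_alpha = np.array([1.0, 0.0])

prods = rng.integers(0, n_products, n)           # which product was shown
poss = rng.integers(0, n_positions, n)           # at which position
p_true = 1 / (1 + np.exp(-(true_beta[prods] + true_alpha[poss])))
y = rng.random(n) < p_true                       # order / no order

# Full-batch gradient ascent on the log-likelihood.
beta = np.zeros(n_products)
alpha = np.zeros(n_positions)
lr = 0.5
for _ in range(2000):
    resid = y - 1 / (1 + np.exp(-(beta[prods] + alpha[poss])))
    beta += lr * np.bincount(prods, weights=resid, minlength=n_products) / n
    alpha += lr * np.bincount(poss, weights=resid, minlength=n_positions) / n

# Identifiability: beta and alpha share an additive degeneracy, so pin the
# last position effect to zero (shifting the constant into beta).
shift = alpha[-1]
alpha -= shift
beta += shift

# Debiased ranking uses beta alone, ignoring where items happened to appear.
ranking = np.argsort(-beta)
```

Even though position 0 inflates raw order counts, the fitted beta recovers the intrinsic ordering, and the fitted alpha exposes how large the position bias is.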
Incorporating Multiple Signals
Many products have scant order data. Add intermediate actions (click, add-to-cart) to refine estimates:
P(order∣shown) = P(click∣shown) × P(add to cart∣click) × P(order∣add to cart)
This chain rule separates the steps leading to an order. Each part uses a smaller logistic regression that updates based on user behavior. If cart-add or order rates are unreliable for a new product, clicks alone still guide partial ranking decisions.
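The funnel decomposition above can be sketched with smoothed per-stage rates; the counts and the add-one style `smoothing` pseudo-count below are hypothetical:

```python
# Hypothetical per-product funnel counts (shown -> click -> cart -> order).
funnel = {
    "established": dict(shown=10_000, clicks=800, carts=240, orders=120),
    "new_item": dict(shown=500, clicks=60, carts=10, orders=1),
}

def chained_order_rate(c, smoothing=1.0):
    """P(order|shown) = P(click|shown) * P(cart|click) * P(order|cart),
    with light smoothing so sparse stages don't collapse to zero."""
    p_click = (c["clicks"] + smoothing) / (c["shown"] + 2 * smoothing)
    p_cart = (c["carts"] + smoothing) / (c["clicks"] + 2 * smoothing)
    p_order = (c["orders"] + smoothing) / (c["carts"] + 2 * smoothing)
    return p_click * p_cart * p_order

rates = {name: chained_order_rate(c) for name, c in funnel.items()}
```

The new item gets a usable, if uncertain, estimate driven mostly by its click stage, instead of a degenerate zero from its single order.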
Bayesian Estimation and Updating
Bayesian estimation uses prior distributions over product effects. This stabilizes estimates for items with scarce data. For example, no item should jump to the very top for a single successful impression. Each day or hour, incorporate new user actions to form a posterior distribution for each product. Maintain an autoregressive “forgetting” scheme so the model remains flexible to shifts in product popularity and seasonality, without discarding older data entirely.
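One way to realize the forgetting scheme is a discounted Beta-Bernoulli posterior per product: each update decays the old pseudo-counts toward the prior before folding in fresh evidence. The prior, discount factor, and traffic numbers below are illustrative:

```python
# Sketch: discounted conjugate updating for one product's order rate.
class ForgettingPosterior:
    def __init__(self, prior_a=1.0, prior_b=20.0, forget=0.95):
        self.prior_a, self.prior_b = prior_a, prior_b
        self.a, self.b = prior_a, prior_b          # Beta(a, b) over order rate
        self.forget = forget

    def update(self, orders, impressions):
        # Decay old pseudo-counts toward the prior, then add the new batch.
        self.a = self.forget * self.a + (1 - self.forget) * self.prior_a + orders
        self.b = (self.forget * self.b + (1 - self.forget) * self.prior_b
                  + (impressions - orders))

    def mean(self):
        return self.a / (self.a + self.b)

post = ForgettingPosterior()
for _ in range(30):                  # popular for a while: ~10% order rate
    post.update(orders=10, impressions=100)
hot = post.mean()
for _ in range(30):                  # then demand collapses
    post.update(orders=0, impressions=100)
cold = post.mean()
```

Because the decay caps the effective sample size, the posterior tracks the collapse within a few dozen batches rather than being anchored by months of stale history.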
Ranking and Thompson Sampling
Purely exploiting the highest posterior means might overlook hidden gems. Thompson sampling balances exploitation and exploration. Sample from each product’s posterior. Rank items by their sampled values each time a user arrives. This produces slight variations in item positions, gathering new feedback to refine the model.
Engineered this way, the system shows high-appeal products on top most of the time but also tests alternatives. Over time, the posterior distributions tighten, and truly appealing products stand out.
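A minimal per-visitor ranking loop, assuming Beta posteriors over order rates (the counts below are hypothetical):

```python
import numpy as np

# Sketch: Thompson sampling over three products. Each page load draws one
# sample per product from its posterior and sorts by the draws.
rng = np.random.default_rng(7)
posteriors = {                       # product -> Beta(orders + 1, misses + 1)
    "A": (50 + 1, 950 + 1),          # well-observed, ~5% order rate
    "B": (40 + 1, 960 + 1),          # well-observed, ~4%
    "C": (1 + 1, 49 + 1),            # new item: little data, wide posterior
}

def rank_once():
    draws = {p: rng.beta(a, b) for p, (a, b) in posteriors.items()}
    return sorted(draws, key=draws.get, reverse=True)

# Over many visitors, A usually leads, but C earns real exposure because its
# wide posterior sometimes samples high.
top_counts = {p: 0 for p in posteriors}
for _ in range(5000):
    top_counts[rank_once()[0]] += 1
```

The exploration is automatic and proportional to uncertainty: as C accumulates impressions its posterior tightens, and it either earns the top slot on merit or stops being sampled there.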
Practical Implementation Details
Use Python for quick iteration. Libraries such as pystan (or other probabilistic frameworks) streamline posterior estimation. Store logistic regression coefficients and covariance matrices. Update them incrementally each day or in real-time. Rerank on page-load or query time by sampling from the posterior distributions, then sorting accordingly. Collect new impressions, clicks, and orders, which become tomorrow’s input.
Leverage data pipelines that track user actions at scale. For big catalogs, index product-level data in a scalable store. Use a distributed environment to handle real-time queries efficiently.
Follow-up Question 1
How would you handle the risk that your Bayesian model might not converge quickly enough for products with extremely volatile or trending popularity?
The main challenge is that older data can mislead estimates for items experiencing sharp shifts in popularity. Slower-moving priors can delay correct updates.
Answer and Explanation
Shorten the forgetting horizon to adjust more aggressively. Increase the weight of recent data in the posterior update. Place more flexible priors on the product effects if certain categories have rapidly changing trends. Monitor items with unusual variance or a sudden surge in signals, and boost their exploration rate to gather more data quickly. In a production system, keep a dynamic schedule that speeds up updates for high-volatility items. For instance, if a product experiences massive changes in click-through rates, the system triggers accelerated Bayesian updates for that product alone.
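A sketch of the volatility trigger: when the recent click-through rate deviates sharply from the long-run rate, the per-product forgetting factor drops so the posterior reweights toward fresh data. All thresholds and factor values are illustrative:

```python
# Sketch: choose a forgetting factor per product based on CTR volatility.
def adaptive_forget(longrun_ctr, recent_ctr,
                    base_forget=0.98, fast_forget=0.80, tol=0.5):
    """Return the forgetting factor for this product's next update."""
    if longrun_ctr == 0:
        return fast_forget                 # no history yet: stay nimble
    rel_change = abs(recent_ctr - longrun_ctr) / longrun_ctr
    return fast_forget if rel_change > tol else base_forget

stable = adaptive_forget(longrun_ctr=0.05, recent_ctr=0.052)   # tiny drift
trending = adaptive_forget(longrun_ctr=0.05, recent_ctr=0.15)  # 3x surge
```

The trending product gets the aggressive factor (effectively a much shorter memory), while stable items keep the slow, low-variance schedule.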
Follow-up Question 2
How would you detect that a new product, under the current approach, is showing consistently high add-to-cart rates but suspiciously low conversion to actual orders?
Some products might appear to have strong appeal but get abandoned at checkout. This can skew the model if not addressed.
Answer and Explanation
Compare cart conversion rates (orders / cart-adds) for each item against the average for similar categories or subcategories. If a product has an unusually low ratio, recheck factors like price changes, shipping details, or issues discovered at checkout. Introduce a penalty factor that lowers the final ranking for items with low cart-to-order conversion. Adjust the logistic regression to weigh the final ordering step more heavily. If we see consistently weak transitions from cart to purchase, feed negative updates into that product’s posterior distribution so its overall order probability is reduced accordingly.
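One simple detector, sketched as a one-sided two-proportion z-style test against the category rate (the counts, category rate, and critical value are illustrative):

```python
import math

# Sketch: flag a product whose cart-to-order conversion is significantly
# below its category average.
def low_conversion_flag(orders, carts, category_rate, z_crit=2.0):
    """True if orders/carts sits well below category_rate, given the sample size."""
    if carts == 0:
        return False                       # nothing to judge yet
    rate = orders / carts
    se = math.sqrt(category_rate * (1 - category_rate) / carts)
    z = (rate - category_rate) / se
    return z < -z_crit

suspicious = low_conversion_flag(orders=2, carts=100, category_rate=0.40)
healthy = low_conversion_flag(orders=38, carts=100, category_rate=0.40)
```

Flagged items can then receive the ranking penalty or a manual review of price, shipping, and checkout friction.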
Follow-up Question 3
What if you want to add extra context, like detailed product attributes or shopper attributes, into the model?
The business might want to personalize or highlight new items.
Answer and Explanation
Incorporate product metadata (brand, style, price range) in a feature vector. For user-level personalization, add shopper features (geolocation, browsing history). Switch to a hierarchical Bayesian model or a factorization machine approach. If you suspect nonlinearities, kernel-based methods or neural networks could capture interactions. Adjust the logistic regression to expand the predictor space with cross-terms. Rely on partial pooling: products sharing certain attributes will have correlated priors, helping new items that lack historical data.
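Partial pooling can be sketched as shrinkage of each product's raw rate toward a brand-level rate, where the hypothetical pseudo-count `k` plays the role of prior strength (all numbers are made up):

```python
# Sketch: posterior-mean style shrinkage toward a brand-level order rate.
def pooled_rate(orders, impressions, brand_rate, k=200):
    """Estimate = (orders + k * brand_rate) / (impressions + k).
    Small samples lean on the brand prior; large samples dominate it."""
    return (orders + k * brand_rate) / (impressions + k)

brand_rate = 0.05
new_item = pooled_rate(orders=1, impressions=10, brand_rate=brand_rate)
veteran = pooled_rate(orders=500, impressions=10_000, brand_rate=brand_rate)
```

The new item's noisy raw rate (1/10 = 10%) is pulled most of the way back to the brand's 5%, while the veteran's estimate is essentially its own data; a full hierarchical model learns `k` and the group means jointly instead of fixing them.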
Follow-up Question 4
How would you verify that Thompson sampling is truly beneficial versus a simpler epsilon-greedy strategy in this ranking context?
The company needs evidence that the explore-exploit method yields tangible gains.
Answer and Explanation
Run controlled A/B experiments comparing Thompson sampling with an epsilon-greedy baseline. Track average order rates, click-through rates, and coverage of the catalog. Check how often rare items get shown and how many eventually emerge as top sellers. If Thompson sampling outperforms by discovering better products faster and maintaining strong overall performance, then you have empirical proof. Look at engagement metrics and the pace at which new items find traction. If the difference is negligible, revisit hyperparameters and data scale. Sometimes simpler strategies may suffice in smaller catalogs or stable environments.
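An offline toy of the measurement harness, assuming Bernoulli order feedback on a ten-item "catalog" (the true rates are invented; a real comparison would be a live A/B test tracking the same cumulative-order metric):

```python
import numpy as np

# Sketch: compare cumulative orders under Thompson sampling vs epsilon-greedy.
rng = np.random.default_rng(1)
true_rates = np.array([0.02, 0.03, 0.03, 0.04, 0.04,
                       0.05, 0.05, 0.06, 0.07, 0.10])
n_arms, steps = len(true_rates), 20_000

def run_thompson():
    a, b = np.ones(n_arms), np.ones(n_arms)    # Beta(1, 1) priors
    total = 0
    for _ in range(steps):
        arm = int(np.argmax(rng.beta(a, b)))   # sample, pick the best draw
        reward = rng.random() < true_rates[arm]
        a[arm] += reward
        b[arm] += 1 - reward
        total += reward
    return total

def run_epsilon_greedy(eps=0.1):
    wins, pulls = np.zeros(n_arms), np.zeros(n_arms)
    total = 0
    for _ in range(steps):
        if rng.random() < eps:                 # explore uniformly
            arm = int(rng.integers(n_arms))
        else:                                  # exploit the empirical best
            arm = int(np.argmax(wins / np.maximum(pulls, 1)))
        reward = rng.random() < true_rates[arm]
        wins[arm] += reward
        pulls[arm] += 1
        total += reward
    return total

ts_orders = run_thompson()
eg_orders = run_epsilon_greedy()
```

In a live test the same comparison would also track catalog coverage and how quickly newly listed items accumulate enough exposure to be judged.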
Follow-up Question 5
How do you ensure efficient real-time ranking while updating potentially millions of posterior parameters?
The engineering challenge is scale: millions of posterior parameters must stay fresh while ranking remains fast at request time.
Answer and Explanation
Cache partial model outputs. Maintain a separate high-speed service that stores the latest posterior parameters. When a user arrives, sample product effects from a fast random generator pre-seeded with the updated means and variances. Sort them for the page. Use approximate sampling methods or low-rank matrix factorizations to reduce overhead. For large catalogs, maintain a stream processing architecture. Break updates into incremental batches so the main service only merges small parameter changes. This keeps daily or hourly updates feasible without blocking real-time recommendations.
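A vectorised serving sketch, assuming the service caches only a Gaussian approximation (mean and standard deviation on the logit scale) per product; all sizes and parameter values below are hypothetical:

```python
import numpy as np

# Sketch: one posterior sample per product across a million-item catalog,
# followed by top-k selection, entirely vectorised.
rng = np.random.default_rng(3)
n_products = 1_000_000
means = rng.normal(-3.0, 0.5, n_products)      # cached posterior means
stds = rng.uniform(0.01, 0.5, n_products)      # cached posterior std devs

def rank_top_k(k=50):
    draws = means + stds * rng.standard_normal(n_products)
    # argpartition finds the top k without fully sorting a million draws
    top = np.argpartition(-draws, k)[:k]
    return top[np.argsort(-draws[top])]        # sort only the shortlist

page = rank_top_k(50)
```

Because only means and variances are stored, the batch pipeline can ship small incremental parameter deltas, and each request pays for one vectorised draw plus a partial sort rather than any model evaluation.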