ML Case-study Interview Question: Personalized Recipe Carousels Using Embeddings and Contextual Bandits
Case-Study Question
A major online platform has a huge catalog of recipes. They want to build a recommendation system that shows the best recipe options for different users. Each user has specific tastes and dietary constraints. The editorial team handpicks certain spotlight recipes. They also want algorithmic recommendation carousels for "Most Popular This Week," "Recipes Similar to the One Being Viewed," "We Think You'll Love," "Seasonal Recipes in Your Region," and "Diet-Specific Recipes (vegetarian, vegan, dairy-free, gluten-free)." Propose a full solution for how to build, test, and refine these recommendation carousels. Show the models, features, and practical steps required for implementing this system at scale. Explain how you would incorporate user feedback signals, handle personalization, maintain freshness of recommendations, handle new recipes, and account for changing user behavior over time. Provide all technical details a Senior Data Scientist would need.
Detailed Solution
Different carousels serve different needs. The editorial team curates only a limited set of highlights; the rest depend on algorithmic approaches that combine user-level data with recipe-level features. A multi-step process is needed: define pools of candidate recipes, rank the items in each pool algorithmically, and feed the ranked results into the carousels.
Candidate Pool Creation
A pool is a subset of recipes eligible for each carousel. An editorially curated pool might have recipes manually picked by the editorial team. An algorithmic pool can be formed by querying the recipe database by popularity, freshness, ingredient tags, or seasonality. Examples: recipes with the highest page views (for "Most Popular This Week"), recipes with certain diet tags (for diet-specific recommendations), and recipes with top seasonal ingredients (for "In Season Near You").
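As a minimal sketch of pool construction, assume each recipe is a dict with hypothetical fields weekly_views, diet_tags, and ingredients; a real system would issue these as database queries instead of in-memory filters:

def build_pools(recipes, diet_tag, seasonal_ingredients, top_k=100):
    # "Most Popular This Week" pool: top recipes by weekly page views.
    most_popular = sorted(recipes, key=lambda r: r["weekly_views"], reverse=True)[:top_k]
    # Diet-specific pool: recipes carrying the requested diet tag.
    diet_pool = [r for r in recipes if diet_tag in r["diet_tags"]]
    # Seasonal pool: recipes that use at least one in-season ingredient.
    seasonal_pool = [r for r in recipes if set(r["ingredients"]) & set(seasonal_ingredients)]
    return most_popular, diet_pool, seasonal_pool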
Ranking via Embeddings for Similarity
Represent recipes in vector form. Process titles, ingredients, cooking steps, and descriptions to generate embeddings. Use a pretrained sentence transformer or a text embedding model.
To measure how close two recipes are, compute cosine similarity on their embeddings. Higher scores mean closer similarity. Show similar recipes under a "Recipes Similar to the One Being Viewed" ribbon. Adjust the ranking to blend popularity with similarity, so results are both relevant and well-liked.
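A minimal sketch of this step using the sentence-transformers library; the model name, recipe field names, and the blending weight alpha are illustrative assumptions, not fixed choices:

import numpy as np
from sentence_transformers import SentenceTransformer

# Example pretrained model; any text embedding model could be swapped in.
model = SentenceTransformer("all-MiniLM-L6-v2")

def embed_recipe(recipe):
    # Concatenate title, ingredients, and steps into one document before encoding.
    text = " ".join([recipe["title"], recipe["ingredients_text"], recipe["steps_text"]])
    return model.encode(text)

def blended_score(candidate_emb, viewed_emb, popularity, alpha=0.7):
    # popularity is assumed pre-normalized to [0, 1]; alpha trades similarity
    # off against popularity so results are both relevant and well-liked.
    sim = float(np.dot(candidate_emb, viewed_emb) /
                (np.linalg.norm(candidate_emb) * np.linalg.norm(viewed_emb)))
    return alpha * sim + (1 - alpha) * popularity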
Bandit Model for Personalized Recommendations
Use a contextual multi-armed bandit model to rank candidate recipes within each carousel. The bandit predicts a reward (like a user clicking a recipe) and learns by balancing exploration of new items with exploitation of known high performers. Include contextual features like:
Similarity to a user's saved recipes.
Diet preferences inferred from tags in a user's previously saved items.
Seasonality scores for ingredients in a user's region.
Below is a simple representation of a bandit reward model with linear features:

r(x) = w_1*x_1 + w_2*x_2 + ... + w_n*x_n

Where:
r(x) is the predicted reward (probability of a click).
w_i are learnable weights for each feature.
x_i are the contextual features (similarity scores, popularity, diet alignment, seasonality).
The bandit updates these weights over time to improve its predictions.
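One way to realize this is an epsilon-greedy bandit with an online logistic update over the linear features. The sketch below is illustrative rather than the only valid design; LinUCB or Thompson sampling are common alternatives:

import numpy as np

class LinearBandit:
    # Illustrative epsilon-greedy contextual bandit with a logistic reward model.
    def __init__(self, n_features, epsilon=0.1, lr=0.01):
        self.w = np.zeros(n_features)
        self.epsilon = epsilon
        self.lr = lr

    def predict(self, x):
        # Squash the linear score w . x into a click probability.
        return 1.0 / (1.0 + np.exp(-np.dot(self.w, x)))

    def select(self, candidates):
        # candidates: list of (recipe_id, feature_vector) pairs.
        if np.random.rand() < self.epsilon:
            idx = np.random.randint(len(candidates))  # explore
        else:
            idx = int(np.argmax([self.predict(x) for _, x in candidates]))  # exploit
        return candidates[idx]

    def update(self, x, clicked):
        # One step of online logistic regression on the observed reward (0 or 1).
        error = clicked - self.predict(x)
        self.w += self.lr * error * x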
Personalization Vector
Build a user vector by averaging embeddings of recipes they have saved or engaged with. For user U, if they have saved N recipes, each represented by embedding E_i, define:
user_vector_U = (E_1 + E_2 + ... + E_N) / N
Compute the cosine similarity between user_vector_U and each candidate recipe vector to produce a personalization feature. Include this feature in the bandit model so it learns how a user's history affects their click likelihood.
Handling Diet Preferences
Tag recipes with diet labels: vegetarian, vegan, dairy-free, gluten-free. Count the user's saved recipe tags. Form a diet vector such as [vegetarian_count, vegan_count, gluten_free_count, dairy_free_count]. Normalize it to get a distribution of the user's diet interests. Similarly represent a candidate recipe's diet tags in a vector, then compute cosine similarity. Feed that similarity as a contextual feature into the bandit.
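A minimal sketch of the diet-alignment feature; the tag list and helper names are illustrative:

import numpy as np

DIET_TAGS = ["vegetarian", "vegan", "gluten_free", "dairy_free"]

def diet_vector(tag_counts):
    # tag_counts: dict mapping diet tag -> count over the user's saved recipes.
    v = np.array([tag_counts.get(t, 0) for t in DIET_TAGS], dtype=float)
    total = v.sum()
    return v / total if total > 0 else v

def diet_alignment(user_tag_counts, recipe_tags):
    # Cosine similarity between the user's diet distribution and the
    # recipe's binary diet-tag vector.
    u = diet_vector(user_tag_counts)
    r = np.array([1.0 if t in recipe_tags else 0.0 for t in DIET_TAGS])
    denom = np.linalg.norm(u) * np.linalg.norm(r)
    return float(np.dot(u, r) / denom) if denom > 0 else 0.0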
Seasonality
Generate a seasonality score by comparing regional produce availability to recipe ingredients. Score each recipe based on how many seasonal ingredients it has at a given time and location. Build a candidate pool of seasonal recipes and then run a bandit model on top to rank by predicted engagement.
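One simple scoring choice is the fraction of a recipe's ingredients that are currently in season for the user's region; the helper below is a hypothetical sketch of that idea:

def seasonality_score(recipe_ingredients, seasonal_ingredients):
    # seasonal_ingredients: set of produce in season for the user's region and date.
    if not recipe_ingredients:
        return 0.0
    in_season = sum(1 for ing in recipe_ingredients if ing in seasonal_ingredients)
    return in_season / len(recipe_ingredients)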
Handling Freshness and Exploration
Incorporate an "uncertainty" term that allows the bandit to explore recipes with less historical data. Over time, the bandit updates what items it recommends. Also implement a forgetting factor that de-emphasizes older data when user trends shift.
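A sketch of both ideas, assuming daily click and impression counts per recipe; the UCB-style bonus and the decay factor gamma are illustrative parameter choices:

import numpy as np

def decayed_ctr(clicks, impressions, gamma=0.98):
    # clicks/impressions: lists ordered oldest -> newest, one entry per day.
    # gamma < 1 down-weights older days so recent behavior dominates.
    weights = np.array([gamma ** (len(clicks) - 1 - i) for i in range(len(clicks))])
    c = float(np.dot(weights, clicks))
    n = float(np.dot(weights, impressions))
    return c / n if n > 0 else 0.0

def ucb_score(mean_reward, n_shown, total_shown, c=1.0):
    # Items with little data get a larger uncertainty bonus, encouraging exploration.
    return mean_reward + c * np.sqrt(np.log(total_shown + 1) / (n_shown + 1))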
Data Pipelines and Infrastructure
Store embeddings in a vector database for fast nearest-neighbor lookups. Maintain a standard data warehouse to log user events (saves, page views, clicks). Regularly retrain embeddings to reflect new recipe text and produce updates in the vector database. Keep a real-time inference service that updates bandit predictions with the latest user features.
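A minimal sketch of the nearest-neighbor lookup using FAISS as one example vector store; the index type and embedding dimension are assumptions that depend on the chosen embedding model:

import numpy as np
import faiss

dim = 384  # must match the embedding model's output dimension
index = faiss.IndexFlatIP(dim)  # inner product; L2-normalize vectors to get cosine

def add_recipes(embeddings):
    # embeddings: list of vectors, added in recipe-id order so FAISS positions
    # map back to recipe ids.
    vecs = np.asarray(embeddings, dtype="float32")
    faiss.normalize_L2(vecs)
    index.add(vecs)

def nearest_neighbors(query_vec, k=10):
    q = np.asarray([query_vec], dtype="float32")
    faiss.normalize_L2(q)
    scores, ids = index.search(q, k)
    return list(zip(ids[0], scores[0]))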
Example Python Snippet
Below is a rough sketch of how you might generate user vectors and compute recipe similarities:
import numpy as np

# Suppose you have recipe_embeddings dict: {recipe_id: embedding_vector}
# Suppose you have user_saved_recipes dict: {user_id: [list_of_recipe_ids_saved]}

def compute_user_vector(user_id, recipe_embeddings, user_saved_recipes):
    # Average the embeddings of the user's saved recipes into one profile vector.
    saved_ids = user_saved_recipes.get(user_id, [])
    if not saved_ids:
        return None
    embed_list = [recipe_embeddings[rid] for rid in saved_ids if rid in recipe_embeddings]
    if not embed_list:
        return None
    return np.mean(embed_list, axis=0)

def cosine_similarity(vec1, vec2):
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    if norm1 == 0 or norm2 == 0:
        return 0.0  # guard against zero vectors to avoid division by zero
    return np.dot(vec1, vec2) / (norm1 * norm2)

def personalized_score(user_vector, candidate_vector):
    # Cold-start users (no saved recipes yet) get a neutral score of 0.
    if user_vector is None:
        return 0.0
    return cosine_similarity(user_vector, candidate_vector)

# Then feed that score into a bandit model that decides ordering
Follow-Up Questions and In-Depth Answers
How do you optimize for diversity instead of just showing popular recipes?
Train the bandit on a modified reward function that boosts items if they align with user preferences and are not shown frequently. Randomize results to maintain novelty. Penalize items that appear too often so the model explores a wider space.
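For example, a simple impression-based penalty on the ranking score; the penalty weight is an illustrative tuning knob:

def diversity_adjusted_score(base_score, times_shown, penalty=0.05):
    # Down-weight items the user has already seen many times in this carousel
    # so the bandit spreads exposure across a wider space of recipes.
    return base_score - penalty * times_shown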
How do you handle cold-start users with no interaction history?
Show popular recipes initially or rely on contextual signals like location or time of year. Over time, collect engagement data. Start building a user vector as soon as they save or rate a recipe. Use short-term session data (recent clicks or hover events) to bootstrap preferences.
How do you keep recommendations updated if user preferences shift?
Apply a forgetting mechanism so older data has diminishing weight. Retrain or fine-tune embeddings periodically. Maintain a rolling window on user interactions so the bandit model captures changing interests. Refresh data pipelines frequently to inject new user engagement signals.
How do you tackle bandit hyperparameter tuning and ensure stable performance?
Perform A/B tests comparing different bandit parameter sets. Monitor click-through rate, conversion rate (saves per click), and recipe diversity. Tune the exploration rate (epsilon or variance terms) based on performance. Track differences across user segments. Retrain periodically. Store metrics in a logging system for anomaly detection.
How do you incorporate editorial oversight?
Allow editors to override with manually curated content for special collections or events. Restrict certain items from the bandit if they must be highlighted or withheld. Blend editorial picks with algorithmic suggestions. Provide an admin interface so editorial teams can see recommended items and give feedback.
How do you handle scalability with a growing recipe catalog?
Use embeddings in a vector store that supports efficient similarity searches. Distribute bandit computations across multiple servers. Stream new recipe data into the pipeline. Cache recommendations for popular user segments when real-time computation is not critical. Rely on horizontally scalable infrastructure with load balancing.
How do you evaluate the success of the system?
Measure clicks, saves, user satisfaction surveys, and time on site. Compare personalized carousels to non-personalized baselines. Measure coverage (how many different recipes are shown) and how often users try new recipes. Track retention metrics to confirm that returning users engage more deeply with recommended items.