ML Case-study Interview Question: Fine-Tuning GenAI & RAG: Enhancing Food Platform Catalogs, Search, and Support Automation.
Case-Study Question
A large-scale online platform handling millions of food orders daily wants to integrate generative AI for catalog enrichment, review summarization, neural search, and automation. They have set up a dedicated task force to understand how to deploy these models to improve user experience, reduce operational costs, and strengthen brand engagement. They are cautious about potential pitfalls like model hallucination, latency constraints, governance challenges, data security, and privacy. You are tasked with proposing a systematic approach to evaluate and implement generative AI for these applications. How would you frame your solution strategy, ensure risk mitigation, and deliver meaningful outcomes for the company’s stakeholders?
Proposed In-Depth Solution
The company formed a specialized generative AI task force to identify high-impact projects. They evaluated approaches for image generation, text enrichment, review summarization, and real-time neural search, along with automation of support channels. They used a demand-risk framework to decide which use cases to pursue first: high-demand, low-risk problems got immediate priority.
They started with content generation for the food catalog. They aimed to fill the gaps in dish imagery by generating realistic images. They tested three primary methods: text-to-image generation, image-to-image adaptation, and image blending. Text-to-image pipelines struggled with Indian dishes because prompts often failed to generate realistic visuals. Image-to-image showed some gains but still lacked consistency. They had more success with an image blending pipeline that combined a foreground of the dish with a background matching the restaurant’s style. To handle Indian dish nuances, they fine-tuned Stable Diffusion using Low-Rank Adaptation (LoRA). This specialized approach made the pipeline more reliable at generating region-specific dishes with correct appearance and positioning. They also built an in-house out-painting pipeline to correct image aspect ratios without distortion.
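As an illustration of the blending idea, the sketch below composites a segmented dish foreground onto a restaurant-style background with Pillow. The file names, the availability of a segmentation mask, and the layout choices are assumptions for the example, not the company's actual pipeline.

```python
from PIL import Image

def blend_dish_onto_background(dish_path, mask_path, background_path, out_size=(1024, 768)):
    """Composite a segmented dish foreground onto a brand-style background."""
    background = Image.open(background_path).convert("RGB").resize(out_size)
    dish = Image.open(dish_path).convert("RGB")
    mask = Image.open(mask_path).convert("L")  # white = dish pixels, black = discard

    # Scale the dish to roughly half the canvas width, preserving its aspect ratio.
    scale = (out_size[0] // 2) / dish.width
    new_size = (int(dish.width * scale), int(dish.height * scale))
    dish, mask = dish.resize(new_size), mask.resize(new_size)

    # Paste the dish near the canvas centre, using the mask as the alpha channel.
    offset = ((out_size[0] - new_size[0]) // 2, (out_size[1] - new_size[1]) // 2)
    background.paste(dish, offset, mask)
    return background

# blend_dish_onto_background("paneer_tikka.png", "paneer_tikka_mask.png", "restaurant_bg.jpg").save("menu_card.jpg")
```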
They used a generative text pipeline to create short, helpful item descriptions. They gathered extra metadata such as dish taxonomy and fed these inputs to a Large Language Model. A human checker validated the outputs for accuracy. They also tested generating a condensed review summary widget, which used a Large Language Model to extract key highlights from many user reviews. Their A/B tests on a small restaurant population showed fewer cancellations and claims, indicating better user alignment with dish expectations.
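A minimal sketch of the description step is shown below: catalog metadata is rendered into a grounded prompt so the model has explicit facts to work from. The field names and the call_llm placeholder are illustrative, not the platform's real schema or client.

```python
def build_description_prompt(dish):
    """Assemble a grounded prompt from catalog metadata so the model sticks to known facts."""
    facts = "\n".join(f"- {key}: {value}" for key, value in dish.items() if value)
    return (
        "Write a two-sentence menu description for the dish below. "
        "Use only the facts listed; do not invent ingredients, claims, or prices.\n"
        f"{facts}"
    )

dish = {
    "name": "Paneer Tikka",          # illustrative metadata fields
    "cuisine": "North Indian",
    "taxonomy": "Starter > Grilled > Vegetarian",
    "key_ingredients": "paneer, bell pepper, yogurt marinade",
}
prompt = build_description_prompt(dish)
# draft = call_llm(prompt)           # placeholder LLM call; the draft then goes to a human checker
```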
They tackled user decision fatigue by automatically compiling short promotional videos from brand images. They used an internal pipeline that cleans extraneous text from source images and stitches them into short, compelling clips. Their experiments found that thirty-second videos resonated best with users, driving engagement on the menu page.
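A rough sketch of the stitching step, assuming the images have already been cleaned of extraneous text, could use the moviepy 1.x API as below; the file names, durations, and frame rate are placeholders.

```python
from moviepy.editor import ImageClip, concatenate_videoclips  # moviepy 1.x import path

def build_promo_clip(image_paths, total_seconds=30, out_path="promo.mp4"):
    """Stitch cleaned brand images into a short promotional video."""
    per_image = total_seconds / len(image_paths)
    clips = [ImageClip(path).set_duration(per_image) for path in image_paths]
    video = concatenate_videoclips(clips, method="compose")
    video.write_videofile(out_path, fps=24)

# build_promo_clip(["dish1.png", "dish2.png", "dish3.png"])  # thirty-second clip across three frames
```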
They explored neural search to let users find the right dish or cuisine in natural language. They built an embedding-based pipeline to capture query intent and dish context. These embeddings aimed to handle multiple-intent queries that traditional keyword search missed. They deployed a prototype model and observed its strengths but identified areas for improvement in real-time latency.
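The retrieval core of such a pipeline can be sketched with an off-the-shelf sentence encoder, as below. The model name, the tiny in-memory catalog, and the brute-force similarity scan stand in for the production encoder and vector database.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative encoder, not the production model

catalog = ["Paneer Tikka Pizza", "Masala Dosa", "Schezwan Fried Rice", "Butter Chicken"]
catalog_vecs = model.encode(catalog, normalize_embeddings=True)

def search(query, top_k=3):
    """Return the catalog items whose embeddings are closest to the query embedding."""
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = catalog_vecs @ query_vec             # cosine similarity, since vectors are normalized
    ranked = np.argsort(-scores)[:top_k]
    return [(catalog[i], float(scores[i])) for i in ranked]

# search("something cheesy and spicy")
```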
They also automated restaurant partner support. They created a Retrieval-Augmented Generation (RAG) pipeline that reads the partner’s question in plain text and retrieves the most relevant Standard Operating Procedure, then passes it to a Large Language Model for an answer. This eliminated the need for manual FAQ navigation. Their pilot release for some partners significantly improved self-serve rates. They plan to scale it to more vendors.
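The retrieve-then-generate loop can be sketched as below: embed the Standard Operating Procedure passages once, find the one closest to the partner's question, and place it in the prompt. The SOP snippets, the encoder, and the call_llm placeholder are assumptions for illustration.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative encoder

sops = [
    "To update menu prices, open the partner dashboard and edit the item card.",
    "Refunds for cancelled orders are settled in the weekly payout cycle.",
]
sop_vecs = encoder.encode(sops, normalize_embeddings=True)

def answer_partner_query(question):
    """Retrieve the most relevant SOP passage and ground the model's answer in it."""
    q_vec = encoder.encode([question], normalize_embeddings=True)[0]
    best = int(np.argmax(sop_vecs @ q_vec))
    prompt = (
        "Answer the partner's question using only the procedure below. "
        "If the procedure does not cover it, say so.\n"
        f"Procedure: {sops[best]}\nQuestion: {question}"
    )
    # return call_llm(prompt)   # placeholder for the actual Large Language Model call
    return prompt

# answer_partner_query("When will I get money back for a cancelled order?")
```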
They recognized that shipping multiple generative AI projects required a stable back-end. They built a middle layer that connects internal Data Science workflows with external services. It centralizes governance, logging, and versioning. It also optimizes latency and ensures that no confidential data is accidentally exposed.
They learned that setting realistic stakeholder expectations was crucial because a prototype that impresses in a hackathon may still fail in production if it ignores data fidelity or latency constraints. They found that large external models often worked better for offline tasks, but for real-time tasks, customized or distilled models were a better fit. They devoted serious effort to controlling hallucinations by heavily curating training data and injecting real-time constraints and guardrails. They also learned that operationalizing generative AI at scale requires iterative improvement and patience.
They plan to continue focusing on catalog use cases that proved fruitful. They will improve the neural search pipeline for mainline usage. They will refine their support bot, exploring ways to increase adoption and coverage. They aim to expand generative video and text content generation to more brands on their platform, relying on the middle layer to handle governance, performance, and security.
Potential Follow-Up Question 1
How would you handle the hallucination issue when using a Large Language Model to generate restaurant or dish descriptions?
Answer and Explanation
Hallucination emerges when a model invents details that are not grounded in its context prompt or known facts. The company’s approach combined curated domain data with strong prompt engineering. For domain-specific tasks, they restricted the model’s reference space by passing relevant factual content through retrieval. They used a configuration module that associated each dish with its known taxonomy, brand context, and consistent examples of acceptable descriptions. They also instituted a human oversight step for new or critical items. They set up constraints in the system prompt, reminding the model to stick to known facts and not produce unverified content. They used an acceptance threshold where any suspicious output was flagged for further human review or regeneration.
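One way to implement the acceptance threshold is a post-generation check that flags descriptions naming ingredients outside the dish's known metadata, as in the sketch below. The metadata fields and the ingredient lexicon are hypothetical.

```python
def flag_for_review(description, dish, ingredient_lexicon):
    """Flag a generated description if it names ingredients outside the dish's known metadata."""
    allowed = set(dish["key_ingredients"]) | set(dish["name"].lower().split())
    mentioned = [word for word in ingredient_lexicon if word in description.lower()]
    unverified = [word for word in mentioned if word not in allowed]
    return unverified            # non-empty list -> route to human review or regenerate

dish = {"name": "Paneer Tikka", "key_ingredients": {"paneer", "yogurt", "bell pepper"}}
lexicon = {"paneer", "chicken", "yogurt", "bell pepper", "prawn"}   # illustrative ingredient lexicon
# flag_for_review("Succulent chicken marinated in yogurt", dish, lexicon) -> ["chicken"]
```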
Potential Follow-Up Question 2
What strategies did they use for optimizing inference time, especially for real-time tasks like neural search?
Answer and Explanation
They tested smaller or fine-tuned variants of large models. They replaced a generic large language model with a specialized in-house model that better handled the limited domain of dish queries. This approach allowed them to reduce model size and complexity. They stored dish embeddings in a high-performance vector database for quick lookups. They streamlined data pipelines so that query embedding generation and nearest-neighbor searches occurred within sub-second times. They also performed thorough model instrumentation to track query time and to rapidly spot bottlenecks in the embedding layers, I/O routines, or network calls. They used batch processing where possible and introduced caching strategies for repeated queries.
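A small piece of the caching strategy can be sketched with an in-process LRU cache over query embeddings, as below; the encoder and cache size are placeholders, and a production system would more likely use a shared cache in front of the vector database.

```python
from functools import lru_cache

from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in for the specialized in-house model

@lru_cache(maxsize=50_000)
def cached_query_embedding(query: str):
    """Embed a query once and reuse the result for repeated, identical searches."""
    # A tuple keeps the cached value hashable and immutable.
    return tuple(encoder.encode([query], normalize_embeddings=True)[0])

# The first call pays the model cost; later identical queries hit the in-memory cache.
# vec = cached_query_embedding("spicy paneer pizza")
```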
Potential Follow-Up Question 3
Why did they use image blending and LoRA fine-tuning for Indian dishes instead of standard text-to-image generation?
Answer and Explanation
They found that general text-to-image models were prone to generating visually inconsistent Indian dishes. They tried prompting them with specific dish keywords, but the output was often either too generic or incorrectly styled. Image blending let them isolate the dish from a reliable source image and place it onto a relevant background, maintaining a cohesive style. They also improved generation fidelity by fine-tuning Stable Diffusion with Low-Rank Adaptation on relevant Indian dish datasets. This training step made the model more sensitive to the shape, texture, and plating details of the target dish categories. That reduced the guesswork in text prompting and created uniform, high-quality outputs at scale.
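A minimal sketch of the LoRA setup, assuming the Hugging Face diffusers and peft libraries, is shown below. The base checkpoint, target modules, and ranks are typical defaults rather than the company's actual configuration, and the diffusion training loop itself is omitted.

```python
import torch
from diffusers import StableDiffusionPipeline
from peft import LoraConfig, get_peft_model

# Load a base Stable Diffusion checkpoint (illustrative model id).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# Attach low-rank adapters to the UNet's attention projections; only these small
# matrices are trained on the curated Indian-dish image and caption dataset.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
    lora_dropout=0.05,
)
pipe.unet = get_peft_model(pipe.unet, lora_config)
pipe.unet.print_trainable_parameters()   # only a small fraction of UNet weights become trainable

# The standard noise-prediction training loop over the dish dataset would follow here.
```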
Potential Follow-Up Question 4
How did they address data security and privacy concerns when using external Large Language Model providers?
Answer and Explanation
They masked or anonymized personal data before forwarding anything to the external Large Language Model APIs. They had an internal governance team define strict policies about which fields could leave their servers. They used a middle layer that sanitized input queries, removed sensitive user details, and monitored calls for compliance. They negotiated a data usage agreement that prohibited the external provider from training on or storing their data. They enforced real-time logging of requests, with each request scanned for potential data leaks, and they implemented clear usage constraints that kept sensitive topics and user-specific data from being passed to the external service.
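A simplified sketch of the sanitization step is shown below: obvious personal identifiers are masked by regular expressions before a request leaves the middle layer. The patterns are illustrative; a production system would combine many more detectors.

```python
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{10}\b"),              # illustrative 10-digit phone pattern
}

def sanitize_for_external_llm(text):
    """Replace obvious personal identifiers before a request leaves the middle layer."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

# sanitize_for_external_llm("Call 9876543210 or mail a.kumar@example.com about order 42")
# -> "Call <PHONE> or mail <EMAIL> about order 42"
```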
Potential Follow-Up Question 5
What if the neural search pipeline fails to return accurate results for multi-intent queries like "spicy paneer pizza near me with extra cheese"?
Answer and Explanation
They engineered an approach that separated the query into sub-intents. One sub-intent targets "spicy paneer pizza," another sub-intent captures "near me," and the last handles "extra cheese." They used advanced tokenization that preserves context for each sub-intent. They generated embeddings for each sub-intent, then combined them through a weighted similarity measure or a hierarchical retrieval approach that first finds relevant categories (pizza) and then filters by additional properties (paneer, spice level, location proximity, extra cheese). They also established a fallback scenario that displays the best approximate match if the system cannot find exact sub-intent coverage. They kept track of user engagement and adjusted weighting parameters as they gathered more training data.
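The weighted-similarity idea can be sketched as below: each sub-intent is embedded separately and a catalog item is scored as a weighted sum of its similarities to the sub-intents. The encoder, the weights, and the treatment of location are assumptions for illustration.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative encoder

def multi_intent_score(sub_intents, weights, item_text):
    """Score a catalog item as a weighted sum of its similarity to each sub-intent."""
    item_vec = encoder.encode([item_text], normalize_embeddings=True)[0]
    intent_vecs = encoder.encode(sub_intents, normalize_embeddings=True)
    sims = intent_vecs @ item_vec                     # cosine similarity per sub-intent
    return float(np.dot(weights, sims))

sub_intents = ["spicy paneer pizza", "extra cheese"]  # "near me" would be a separate location filter
weights = [0.7, 0.3]                                  # illustrative weights, tuned from engagement data
# multi_intent_score(sub_intents, weights, "Paneer Tikka Pizza with double cheese")
```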
Potential Follow-Up Question 6
How would you measure success or return on investment for generative AI projects focusing on catalog content creation?
Answer and Explanation
They measured changes in user behavior that included higher click-through rates on newly generated images, shorter user journey times due to better descriptions, and improved funnel metrics. They measured operational cost savings, such as reduced manual effort to create content. They tracked the rate of cancellations and user dissatisfaction, aiming for decreases that suggested users were more informed about their orders. They also included intangible brand-lift metrics, such as consistent visual storytelling that attracted more restaurant partners who appreciated the improved presentation.
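The behavioral metrics reduce to simple relative-lift arithmetic between control and treatment cohorts, as in the sketch below; the rates are placeholder numbers, not results from the case study.

```python
def relative_lift(control_rate, treatment_rate):
    """Percentage change of the treatment cohort relative to the control cohort."""
    return (treatment_rate - control_rate) / control_rate * 100

# Illustrative A/B readout for restaurants that received generated imagery and descriptions.
ctr_lift = relative_lift(control_rate=0.042, treatment_rate=0.047)             # ~ +11.9% click-through
cancellation_change = relative_lift(control_rate=0.031, treatment_rate=0.027)  # ~ -12.9% cancellations
```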
Potential Follow-Up Question 7
Would you keep using a third-party Large Language Model or build an in-house one?
Answer and Explanation
They started with external providers because it let them move fast. They found that offline tasks, like generating item descriptions, benefited from these powerful general-purpose models. Real-time or domain-specific tasks faced latency and cost constraints. They explored building smaller in-house models adapted to their data. This gave them more control over the inference infrastructure and data usage. They planned a hybrid strategy: third-party Large Language Models for offline tasks demanding very high quality and in-house custom models for real-time tasks requiring sub-second responses or specialized domain knowledge. This approach balanced cost, control, and quality without sacrificing user experience.