ML Case-study Interview Question: Fixing E-commerce Search Queries with Language Model Expansion & Rectification
Case-Study Question
You are given a live-shopping e-commerce platform. Users submit many misspelled or abbreviated queries (for example, “jewlery” instead of “jewelry” or “lv” for “louis vuitton”), which leads to low recall. Management wants a robust query expansion solution to improve search relevance, reduce user confusion, and drive higher conversions. Propose a strategy to handle these malformed or incomplete queries at scale. Then detail how you would measure success and handle edge cases where expansions might introduce noise or incorrect matches.
Detailed Solution
Problem Overview
Users frequently enter queries containing misspellings or acronyms. Failing to match these queries to relevant items leads to missed revenue opportunities. A language model-based rectification system is beneficial for correcting these errors and expanding acronyms.
Data Logging and Analysis
Collect all user queries, associated filters, and subsequent actions in search sessions. Each query is stored with the final tab (products, shows, etc.) that the user visits, which helps identify which tokens align with high-engagement outcomes. Text normalization (lowercasing and stripping punctuation) and tokenization (splitting on whitespace) transform raw queries into consistent tokens.
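A minimal normalization and tokenization sketch; the regex-based punctuation stripping here is one reasonable choice, not a prescribed one:

import re

def normalize_and_tokenize(query):
    # Lowercase, strip punctuation, and split on whitespace
    cleaned = re.sub(r"[^\w\s]", "", query.lower())
    return cleaned.split()

print(normalize_and_tokenize("Jewlery, LV bags!"))  # ['jewlery', 'lv', 'bags']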
Generating Corrections and Expansions
Schedule a process to extract frequent tokens and feed them to a Generative Pre-trained Transformer. The model outputs potential corrections and expansions. Store results in a key-value system, mapping original tokens to expansions or rectifications with confidence scores.
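A sketch of this offline batch step, assuming a hypothetical call_llm helper that wraps whatever model endpoint is available; the prompt wording and the JSON response shape are illustrative assumptions:

import json

def build_expansion_map(frequent_tokens, call_llm):
    # call_llm is a hypothetical stand-in for the model client
    expansion_map = {}
    for token in frequent_tokens:
        prompt = (
            f"The shopping query token '{token}' may be a misspelling or an acronym. "
            'Reply with JSON: {"expansion": "...", "confidence": 0.0}'
        )
        expansion_map[token] = json.loads(call_llm(prompt))
    return expansion_map

# The resulting map is written to a key-value store for query-time lookups.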
Serving Expanded Queries
When a user submits a query, split it into tokens. Look up each token in the expansion cache. Combine user tokens with possible expansions based on confidence scores. Form an augmented query expression used to retrieve more relevant results. Show content matching both the original and expanded tokens.
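For example, one way to form the augmented expression is to OR each original token with its confident expansion; this sketch assumes a Lucene-style query string, and the exact syntax depends on the search engine:

def build_augmented_query(tokens, expansion_map, threshold=0.7):
    clauses = []
    for t in tokens:
        entry = expansion_map.get(t)
        if entry and entry["confidence"] > threshold:
            # Match either the original token or its expansion
            clauses.append(f'({t} OR "{entry["expansion"]}")')
        else:
            clauses.append(t)
    return " ".join(clauses)

# build_augmented_query(["jewlery"], {"jewlery": {"expansion": "jewelry", "confidence": 0.95}})
# -> '(jewlery OR "jewelry")'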
Ongoing Improvements
Index-time expansions and n-gram rectifications ensure that longer phrases and their acronyms match in both directions (e.g., “san diego comic con” matching “sdcc”). The strategy can extend to synonyms and brand-specific expansions. Another improvement is extracting attributes from product data with the language model to better map user queries to structured filters.
Measuring Success
Precision is the fraction of returned items that are actually relevant. Recall is the fraction of relevant items returned. F1 is the harmonic mean of precision and recall. Higher recall ensures fewer missed items, while high precision avoids irrelevant items. Tracking changes in user engagement metrics (sessions that lead to purchases or deeper interactions) also signals success.
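These metrics are straightforward to compute from labeled relevance judgments; a minimal sketch:

def precision_recall_f1(returned_items, relevant_items):
    returned, relevant = set(returned_items), set(relevant_items)
    true_positives = len(returned & relevant)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1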
Implementation Example
def expand_query_tokens(tokens, expansion_map):
    expanded_tokens = []
    for t in tokens:
        # Replace the token only when a confident expansion exists
        if t in expansion_map and expansion_map[t]["confidence"] > 0.7:
            expanded_tokens.append(expansion_map[t]["expansion"])
        else:
            expanded_tokens.append(t)
    return expanded_tokens

# Illustrative cache contents; in production this comes from the key-value store
expansion_map = {
    "jewlery": {"expansion": "jewelry", "confidence": 0.95},
}

user_query = "jewlery"
tokens = user_query.lower().split()
exp_tokens = expand_query_tokens(tokens, expansion_map)
# Then pass exp_tokens to the search engine for retrieval
This Python function looks up each token in the expansion_map and uses expansions above a confidence threshold of 0.7.
How would you handle partial matches for multi-word phrases?
Training the language model to rectify or expand single tokens works, but partial matches within multi-word phrases require index-time expansions or offline pre-processing that produces multiple expansions. This captures tokens like “san diego comic con” which might also be typed as “sdcc.” Creating a standard mapping from full phrase to its acronym ensures bidirectional matching. Storing both forms at index time helps retrieve relevant items.
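A sketch of the index-time step, assuming documents carry a searchable text field; the phrase-to-acronym map itself would be produced by the offline language model pass:

PHRASE_TO_ACRONYM = {
    "san diego comic con": "sdcc",
    "louis vuitton": "lv",
}

def index_time_expansions(doc_text):
    # Store both the full phrase and its acronym so either query form matches
    text = doc_text.lower()
    extra_terms = []
    for phrase, acronym in PHRASE_TO_ACRONYM.items():
        if phrase in text:
            extra_terms.append(acronym)
        elif acronym in text.split():
            extra_terms.append(phrase)
    return extra_terms

# index_time_expansions("San Diego Comic Con exclusive figure") -> ['sdcc']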
How do you keep latency low with large language models?
The model runs offline or on a regular schedule. Fresh expansions are stored in a fast key-value store. The query-time request is limited to a cache lookup plus standard search indexing. This design maintains near real-time response times while leveraging advanced language model knowledge.
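A query-time lookup sketch, assuming Redis as the key-value store (any low-latency cache works) with expansions stored as JSON strings; the connection details are hypothetical:

import json
import redis

r = redis.Redis(host="localhost", port=6379)  # hypothetical deployment details

def lookup_expansion(token):
    # Query-time cost is a single cache read per token
    raw = r.get(f"expansion:{token}")
    return json.loads(raw) if raw else None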
How do you handle conflicts when expansions produce wrong matches?
Some tokens have multiple expansions. For instance, “ms” might expand to “microsoft,” “milliseconds,” or “mschs.” Store confidence scores and track user engagement signals to refine expansions over time. The system can automatically demote expansions with low click-through or negative signals (like quick user backtracks).
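A sketch of engagement-driven demotion, assuming per-expansion click and impression counts are aggregated from the logs; the threshold values are illustrative:

def demote_weak_expansions(candidates, min_impressions=100, min_ctr=0.02):
    # candidates: list of {"expansion": str, "clicks": int, "impressions": int}
    kept = []
    for c in candidates:
        if c["impressions"] < min_impressions:
            kept.append(c)  # not enough data yet; keep under observation
        elif c["clicks"] / c["impressions"] >= min_ctr:
            kept.append(c)
    return kept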
How would you evaluate relevance beyond F1?
User journey metrics matter, such as time on page, clicks to item detail pages, and checkout conversion. Expansions that produce meaningful engagement improvements signal success. Live A/B tests compare expansion-based search against a baseline. A lift in conversions or session durations indicates that expansions are improving the user experience.
How do you handle brand-specific queries?
Enable brand-based expansions with specialized brand dictionaries. A language model remains a good general approach, but adding curated brand maps captures domain-specific expansions. The brand map can override expansions if a known brand name is detected.
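A sketch of the override logic, assuming a small curated brand dictionary takes precedence over the model-generated map; the entries are illustrative:

BRAND_MAP = {"lv": "louis vuitton", "ck": "calvin klein"}  # curated entries

def resolve_token(token, expansion_map):
    # Curated brand expansions override model-generated ones
    if token in BRAND_MAP:
        return BRAND_MAP[token]
    entry = expansion_map.get(token)
    if entry and entry["confidence"] > 0.7:
        return entry["expansion"]
    return token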
How do you prevent misinterpretation of short tokens?
Short tokens, such as “lv,” can be ambiguous. Rely on user interaction data to confirm which expansions make sense. If expanding “lv” to “louis vuitton” drives more conversions than the alternatives, keep it active. Gather feedback from subsequent queries and user actions to refine expansions dynamically.
How do you adapt to changes in trending queries?
Scheduled reprocessing of query logs ensures new tokens or acronyms are captured as trends emerge. A daily or weekly batch run is enough for most e-commerce sites. Frequent re-training or prompting updates are critical if user behavior shifts quickly, such as new fashion abbreviations during seasonal events.
How do you deal with user privacy?
Token-level analytics keep data usage abstract. The approach focuses on query text rather than personal information. Logs do not store sensitive details, and only aggregated usage metrics guide expansions. Compliance with data protection regulations remains essential by ensuring no private user data is exposed to the model.