
"Semantic Retrieval at Walmart"

The podcast on this paper is generated with Google's Illuminate.

Walmart's hybrid search combines neural and traditional retrieval to crack the tail query challenge.

Walmart developed a hybrid search system that combines a traditional inverted index with neural retrieval to handle millions of daily product searches. It particularly improves tail query performance through efficient embedding-based semantic matching and practical deployment optimizations.

-----

https://arxiv.org/abs/2412.04637

🔍 Original Problem:

→ E-commerce product search faces unique challenges compared to web search, especially for tail queries with specific intent

→ Traditional text matching methods struggle with vocabulary mismatches and synonyms

→ Pure neural retrieval systems are limited by embedding size constraints and latency requirements

-----

🛠️ Solution in this Paper:

→ The system uses a two-tower BERT architecture to generate embeddings for queries and products (a rough sketch follows this list)

→ A novel negative sampling strategy combines product category matching and token matching to improve model training

→ Linear projection reduces embedding dimension from 768 to 256 while maintaining performance

→ The architecture merges results from both the inverted index and neural retrieval before final ranking
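
Below is a minimal sketch of the two-tower setup with the 768→256 linear projection. The base checkpoint ("bert-base-uncased"), [CLS] pooling, and cosine scoring are illustrative assumptions, not details given in this summary.

```python
# Hypothetical two-tower retrieval model: separate BERT encoders for queries and
# products, each with a linear projection from 768 to 256 dimensions for cheaper
# ANN search and storage. Pooling and similarity choices below are assumptions.
import torch
import torch.nn.functional as F
from torch import nn
from transformers import AutoModel, AutoTokenizer

class Tower(nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased", out_dim: int = 256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        # Linear projection: 768-d BERT output -> 256-d retrieval embedding.
        self.proj = nn.Linear(self.encoder.config.hidden_size, out_dim)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]           # [CLS] pooling (assumption)
        return F.normalize(self.proj(cls), dim=-1)  # unit-norm embeddings

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
query_tower, product_tower = Tower(), Tower()

q = tokenizer(["red running shoes size 9"], return_tensors="pt", padding=True)
p = tokenizer(["Men's Revolution Running Shoe, Red, Size 9"], return_tensors="pt", padding=True)

with torch.no_grad():
    q_emb = query_tower(q["input_ids"], q["attention_mask"])    # shape (1, 256)
    p_emb = product_tower(p["input_ids"], p["attention_mask"])  # shape (1, 256)
    score = (q_emb * p_emb).sum(-1)  # cosine similarity, since both are normalized
```

A second, equally hypothetical sketch shows merging candidates from the inverted index and the neural (ANN) retriever before final ranking; the interleave-and-dedupe policy is an assumption, since the paper's exact merge logic isn't reproduced here.

```python
# Hypothetical merge of lexical (inverted-index) and neural (ANN) candidates
# before the final ranker; interleaving and deduplication are illustrative choices.
from itertools import zip_longest

def merge_candidates(lexical_hits, neural_hits, limit=200):
    """Each argument is a list of (product_id, score) tuples, best first."""
    seen, merged = set(), []
    # Interleave the two sources so neither retriever dominates the candidate pool.
    for pair in zip_longest(lexical_hits, neural_hits):
        for hit in pair:
            if hit is None:
                continue
            pid, score = hit
            if pid not in seen:
                seen.add(pid)
                merged.append((pid, score))
    return merged[:limit]
```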

-----

💡 Key Insights:

→ Product titles provide the strongest retrieval signal, more so than product descriptions

→ Freezing token embeddings during training improves model generalization (sketched after this list)

→ Hard negative sampling boosts category recall by 20.47%
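
Two small sketches of the insights above, assuming a Hugging Face BERT encoder: the parameter path for the embedding table follows that library's naming, and the hard-negative rule (same category, partial token overlap with the query) is only a guess at the flavor of "category matching + token matching", not the paper's exact procedure.

```python
from transformers import AutoModel

encoder = AutoModel.from_pretrained("bert-base-uncased")

# (1) Freeze the word-piece token embedding table so it is not updated during
# fine-tuning, while the rest of the encoder keeps training.
for param in encoder.embeddings.word_embeddings.parameters():
    param.requires_grad = False

# (2) Hypothetical hard-negative filter: same category as the positive product,
# but only partial token overlap with the query.
def is_hard_negative(query: str, candidate_title: str,
                     candidate_category: str, positive_category: str) -> bool:
    q_tokens = set(query.lower().split())
    t_tokens = set(candidate_title.lower().split())
    overlap = len(q_tokens & t_tokens)
    return candidate_category == positive_category and 0 < overlap < len(q_tokens)
```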

-----

📊 Results:

→ NDCG@10 improved by 2.84% for tail queries

→ Add-to-cart rate increased by 0.54%

→ Maintained a low latency of 13 ms for the ANN service
