Walmart's hybrid search combines neural and traditional retrieval to crack the tail query challenge.
Walmart developed a hybrid search system that combines a traditional inverted index with neural retrieval to handle millions of daily product searches. It particularly improves tail query performance through efficient embedding-based semantic matching and practical deployment optimizations.
-----
https://arxiv.org/abs/2412.04637
🔍 Original Problem:
→ E-commerce product search faces unique challenges compared to web search, especially for tail queries with specific intent
→ Traditional text matching methods struggle with vocabulary mismatches and synonyms
→ Pure neural retrieval systems are limited by embedding size constraints and latency requirements
-----
🛠️ Solution in this Paper:
→ The system uses a two-tower BERT architecture to generate embeddings for queries and products
→ A novel negative sampling strategy combines product category matching and token matching to improve model training
→ A linear projection reduces the embedding dimension from 768 to 256 while maintaining performance
→ The architecture merges results from both the inverted index and neural retrieval before final ranking
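
A minimal sketch of the projection and merge steps above, in plain Python. Everything here is illustrative: the function names, the toy 4-to-2 projection matrix (standing in for 768 to 256), the brute-force cosine search (standing in for the ANN service), and the order-preserving union used for merging are assumptions, not the paper's implementation.

```python
import math

def project(vec, weights):
    """Linear projection: multiply a vector by an (out_dim x in_dim) weight matrix."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def neural_retrieve(query_vec, product_vecs, k):
    """Brute-force stand-in for ANN search: top-k product ids by cosine similarity."""
    ranked = sorted(product_vecs,
                    key=lambda pid: cosine(query_vec, product_vecs[pid]),
                    reverse=True)
    return ranked[:k]

def merge_candidates(lexical_ids, neural_ids):
    """Union of both candidate lists (first-seen order) to hand to a final ranker."""
    seen, merged = set(), []
    for pid in list(lexical_ids) + list(neural_ids):
        if pid not in seen:
            seen.add(pid)
            merged.append(pid)
    return merged

# Toy projection that keeps the first two of four dimensions (hypothetical weights).
W = [[1, 0, 0, 0],
     [0, 1, 0, 0]]

# Hypothetical "tower" outputs for three products and one query.
raw_products = {"p1": [1, 0, 5, 5], "p2": [0, 1, 0, 0], "p3": [1, 1, 2, 2]}
products = {pid: project(vec, W) for pid, vec in raw_products.items()}
query = project([1, 0.1, 9, 9], W)

neural_hits = neural_retrieve(query, products, k=2)
lexical_hits = ["p2", "p1"]  # pretend output of the inverted index
candidates = merge_candidates(lexical_hits, neural_hits)
```

In this sketch the merged list simply deduplicates the two result sets; how the production system weighs or interleaves the two sources before ranking is not specified in this summary.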
-----
💡 Key Insights:
→ Product titles provide more retrieval signal than descriptions
→ Freezing token embeddings during training improves model generalization
→ Hard negative sampling boosts category recall by 20.47%
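
One way to picture hard negative sampling that combines category matching and token matching, as a rough sketch. The scoring rule (category match plus number of shared tokens), the `hard_negatives` helper, and the toy catalog are assumptions made for illustration; the paper's actual sampling procedure may differ.

```python
def tokens(text):
    """Lowercased whitespace tokens as a set."""
    return set(text.lower().split())

def hard_negatives(query, positive_ids, catalog, max_negatives=2):
    """Pick non-relevant products that still look similar to the query.

    catalog maps product id -> {"title": str, "category": str}.
    Assumption: a product qualifies if it shares a category with a positive
    or shares at least one token with the query.
    """
    query_tokens = tokens(query)
    positive_categories = {catalog[pid]["category"] for pid in positive_ids}
    scored = []
    for pid, item in catalog.items():
        if pid in positive_ids:
            continue  # never sample a relevant product as a negative
        same_category = item["category"] in positive_categories
        overlap = len(query_tokens & tokens(item["title"]))
        if same_category or overlap:
            scored.append((int(same_category) + overlap, pid))
    # Hardest (highest-scoring) first; ties broken by id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [pid for _, pid in scored[:max_negatives]]

# Hypothetical catalog and query.
catalog = {
    "a": {"title": "red running shoes men", "category": "shoes"},
    "b": {"title": "red running socks", "category": "socks"},
    "c": {"title": "blue hiking boots", "category": "shoes"},
    "d": {"title": "garden hose", "category": "garden"},
}
negatives = hard_negatives("red running shoes", {"a"}, catalog)
```

Here the socks and the boots are selected because they are easy to confuse with the positive, while the garden hose is skipped as too easy a negative to teach the model much.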
-----
📊 Results:
→ NDCG@10 improved by 2.84% for tail queries
→ Add-to-cart rate increased by 0.54%
→ The approximate nearest neighbor (ANN) service maintained a low latency of 13 ms
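
For readers unfamiliar with the metric, here is a small sketch of how NDCG@10 can be computed for one query. It assumes linear gain and an ideal ordering built from the same result list, which is a common simplification; the paper's exact variant is not stated in this summary.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0

# Relevance labels of returned products, in ranked order (hypothetical values).
perfect = ndcg_at_k([3, 2, 1])    # already in ideal order
swapped = ndcg_at_k([1, 3, 2])    # best product ranked second
```

A ranking that places the most relevant products first scores 1.0, and any misordering within the top 10 lowers the score.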