DisRanker distills an LLM's web-ranking ability into a BERT-sized model, making web search smarter without the LLM's computational burden.
Basically: teach BERT to rank like an LLM, but 10x faster.
https://arxiv.org/abs/2411.04539
Original Problem 🤔:
LLMs show great potential as zero-shot relevance rankers for web search, but their high computational cost makes them impractical to deploy directly in real-world search engines.
-----
Solution in this Paper 🛠️:
→ DisRanker transfers the LLM's ranking expertise to a smaller BERT model through a three-stage process (sketched below)
→ Stage 1: Domain-specific Continued Pre-Training on clickstream data, where the LLM learns to generate clicked titles and summaries from queries
→ Stage 2: Supervised Fine-Tuning of the LLM with a rank loss, using the end-of-sequence token to represent each query-document pair
→ Stage 3: Knowledge Distillation into BERT via a hybrid Point-MSE and Margin-MSE loss
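A minimal sketch of the Stage 1 objective, assuming a standard causal-LM loss on the clicked title and summary conditioned on the query. The model name, prompt template, and label masking are illustrative assumptions, not the paper's exact setup:

```python
# Stage 1 sketch: continued pre-training on clickstream data.
# Assumption: next-token loss is computed only on the clicked title + summary,
# with the query prompt masked out; "gpt2" is a placeholder for the LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def cpt_loss(query: str, clicked_title: str, summary: str) -> torch.Tensor:
    """Causal-LM loss on the clicked title + summary, conditioned on the query."""
    prompt = f"Query: {query}\n"
    target = f"Title: {clicked_title}\nSummary: {summary}"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + target, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore loss on the query prompt
    return model(full_ids, labels=labels).loss

loss = cpt_loss("best hiking boots", "Top 10 Hiking Boots of 2024", "A review of trail footwear")
loss.backward()
```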
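For Stage 2, a hedged sketch of scoring a query-document pair from the final end-of-sequence token's hidden state. The linear head, input template, and pairwise logistic loss are stand-ins for the paper's rank loss, not its exact formulation:

```python
# Stage 2 sketch: fine-tune the LLM as a pointwise scorer with a pairwise rank loss.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
encoder = AutoModel.from_pretrained("gpt2")
score_head = torch.nn.Linear(encoder.config.hidden_size, 1)

def score(query: str, doc: str) -> torch.Tensor:
    """Relevance score read off the last (end-of-sequence) token, not [CLS]."""
    text = f"{query} {tokenizer.eos_token} {doc} {tokenizer.eos_token}"
    ids = tokenizer(text, return_tensors="pt").input_ids
    hidden = encoder(ids).last_hidden_state        # [1, seq_len, hidden]
    return score_head(hidden[:, -1, :]).squeeze()  # hidden state of the EOS token

def pairwise_rank_loss(query: str, pos_doc: str, neg_doc: str) -> torch.Tensor:
    """RankNet-style loss: push the positive's score above the negative's."""
    return F.softplus(score(query, neg_doc) - score(query, pos_doc))
```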
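And for Stage 3, a sketch of the hybrid distillation objective: Point-MSE anchors the student's scores to the teacher's absolute scores, while Margin-MSE matches the positive-negative score gap that determines ranking order. The `alpha` weighting is an assumed hyperparameter:

```python
# Stage 3 sketch: hybrid Point-MSE + Margin-MSE distillation loss.
import torch
import torch.nn.functional as F

def hybrid_distill_loss(
    student_pos: torch.Tensor, student_neg: torch.Tensor,
    teacher_pos: torch.Tensor, teacher_neg: torch.Tensor,
    alpha: float = 0.5,  # assumed mixing weight, not from the paper
) -> torch.Tensor:
    # Point-MSE: match the teacher's absolute scores on each document.
    point_mse = F.mse_loss(student_pos, teacher_pos) + F.mse_loss(student_neg, teacher_neg)
    # Margin-MSE: match the teacher's score gap between positive and negative.
    margin_mse = F.mse_loss(student_pos - student_neg, teacher_pos - teacher_neg)
    return alpha * point_mse + (1 - alpha) * margin_mse
```

Per the paper's insight, combining the two terms prevents the student from overfitting to exact teacher scores while still preserving the teacher's ranking order.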
-----
Key Insights 💡:
→ LLMs can effectively learn ranking through domain-specific pre-training
→ The end-of-sequence token represents query-document relationships better than the traditional [CLS] token
→ The hybrid loss function prevents overfitting while maintaining ranking order
→ The student model achieves comparable performance with 70x fewer parameters
-----
Results 📊:
→ Improved PNR (Positive-Negative Ratio; sketch below) from 3.514 to 3.643
→ Increased NDCG@5 from 0.8709 to 0.8793
→ Online A/B tests showed +0.47% PageCTR, +0.58% UserCTR, and +1.2% dwell time improvements
→ Latency reduced from ~100ms (LLM) to ~10ms (BERT-6)
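For reference, PNR is commonly computed as the ratio of concordant to discordant (label, score) pairs per query; that definition is an assumption here, not quoted from the paper:

```python
# Hedged sketch of the PNR metric under the common pair-ratio definition.
from itertools import combinations

def pnr(labels: list[float], scores: list[float]) -> float:
    """Ratio of concordant to discordant pairs for one query's ranked list."""
    concordant = discordant = 0
    for (l_i, s_i), (l_j, s_j) in combinations(zip(labels, scores), 2):
        if l_i == l_j or s_i == s_j:
            continue  # ties contribute to neither count
        if (l_i - l_j) * (s_i - s_j) > 0:
            concordant += 1
        else:
            discordant += 1
    return concordant / discordant if discordant else float("inf")

# Example: one mis-ordered pair out of three -> PNR of 2.0
print(pnr([2, 1, 0], [0.9, 0.1, 0.5]))
```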