Skip the whole essay, just read the first word - works for LLMs too!
Single-token decoding makes LLM reranking up to 42% faster without losing accuracy.
→ FIRST (Faster Improved Listwise Reranking with Single Token Decoding) speeds up document reranking by using just the first token's logits instead of full sequences
https://arxiv.org/abs/2411.05508
🎯 Original Problem:
LLMs excel at reranking documents, but listwise reranking is costly because the model must autoregressively generate the entire ordered sequence of candidate identifiers. Standard language modeling objectives also treat every output position equally, so they fail to prioritize ranking accuracy for the top candidates.
-----
🔧 Solution in this Paper:
→ FIRST (Faster Improved Listwise Reranking with Single Token Decoding) uses only the first token's logits to determine rank order, eliminating full sequence generation (see the sketch after this list)
→ Incorporates weighted pairwise learning-to-rank loss that prioritizes top candidate accuracy
→ Combines ranking loss with language modeling loss using joint optimization
→ Uses a sliding window of size 20 with step size 10 to rerank candidate lists too long for a single prompt
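To make the mechanics concrete, here is a minimal sketch of the ideas above in Python, assuming a generic Hugging Face causal LM. The checkpoint name, prompt template, letter identifiers, and the exact pairwise weighting are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of FIRST-style single-token reranking, assuming a generic
# Hugging Face causal LM. Checkpoint, prompt template, identifier scheme, and
# pairwise weighting are illustrative placeholders, not the paper's code.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def first_token_scores(query: str, passages: list[str]) -> torch.Tensor:
    """Score all candidates with ONE forward pass: the next-token logits of
    the identifier tokens already induce a ranking, so no decoding is needed.
    (At training time, call this without torch.no_grad().)"""
    labels = [chr(ord("A") + i) for i in range(len(passages))]  # A, B, C, ...
    prompt = f"Rank the passages by relevance to the query: {query}\n"
    for label, passage in zip(labels, passages):
        prompt += f"Passage {label}: {passage}\n"
    prompt += "The most relevant passage is Passage"

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_logits = model(**inputs).logits[0, -1]  # logits after the prompt

    # Each identifier must map to a distinct single token for this to work.
    ids = [tokenizer.encode(f" {l}", add_special_tokens=False)[0] for l in labels]
    return next_logits[torch.tensor(ids)]  # higher logit = ranked earlier


def weighted_pairwise_loss(scores: torch.Tensor,
                           gold_ranks: torch.Tensor) -> torch.Tensor:
    """RankNet-style pairwise loss that up-weights pairs involving top-ranked
    candidates; the 1/log2 weighting is an assumption, not the paper's form."""
    loss = scores.new_zeros(())
    n = scores.numel()
    for i in range(n):
        for j in range(n):
            if gold_ranks[i] < gold_ranks[j]:  # candidate i should outscore j
                w = 1.0 / math.log2(
                    2.0 + min(gold_ranks[i].item(), gold_ranks[j].item()))
                loss = loss + w * torch.log1p(torch.exp(scores[j] - scores[i]))
    return loss


def rerank_sliding_window(query: str, passages: list[str],
                          window: int = 20, step: int = 10) -> list[str]:
    """Slide a size-20 window with step 10 from the tail of the candidate
    list toward the head, so strong candidates bubble up to the top."""
    order = list(range(len(passages)))
    start = max(len(passages) - window, 0)
    while True:
        idx = order[start:start + window]
        scores = first_token_scores(query, [passages[i] for i in idx])
        ranked = torch.argsort(scores, descending=True).tolist()
        order[start:start + window] = [idx[k] for k in ranked]
        if start == 0:
            break
        start = max(start - step, 0)
    return [passages[i] for i in order]
```

At training time the paper optimizes the ranking loss jointly with the usual language modeling loss (e.g. total = lm_loss + ranking_loss); at inference only the single forward pass in first_token_scores runs, which is where the latency savings come from.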
-----
💡 Key Insights:
→ Standard language modeling training implicitly improves zero-shot single-token reranking
→ LLM pre-training may paradoxically hinder subsequent FIRST fine-tuning
→ FirstMistral achieved the highest effectiveness among the tested models
→ Stronger first-stage retrieval yields better post-reranking results, though with diminishing returns
-----
📊 Results:
→ Achieves 21%-42% latency improvements across models and benchmarks
→ FirstMistral achieved the highest nDCG@10 scores on 8 of 11 benchmark datasets
→ Maintains effectiveness comparable to full-sequence generation approaches
→ Shows consistent performance across diverse domains and retrievers