"An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking"

The podcast on this paper was generated with Google's Illuminate.

Skip the whole essay and just read the first word - it works for LLMs too!

Single-token decoding makes LLM reranking up to 42% faster without losing accuracy.

→ FIRST (Faster Improved Listwise Reranking with Single Token Decoding) speeds up document reranking by using just the first token's logits instead of full sequences

https://arxiv.org/abs/2411.05508

🎯 Original Problem:

LLMs excel at listwise reranking but incur high inference costs because they must autoregressively generate a complete permutation of candidate identifiers. Standard language modeling objectives also treat every position in that output sequence equally, so they fail to prioritize ranking accuracy for the top candidates.
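
To see where the cost goes, here is a back-of-the-envelope count of autoregressive decode steps per window; the per-identifier token count is an illustrative assumption, not a figure from the paper:

```python
# Back-of-the-envelope decode-step count per window (illustrative numbers only)
window = 20         # candidates scored per prompt
tokens_per_id = 3   # e.g. "[12]" plus a " > " separator, tokenizer-dependent
full_sequence_steps = window * tokens_per_id   # ~60 autoregressive steps
first_token_steps = 1                          # FIRST reads the first step's logits and stops
print(full_sequence_steps, first_token_steps)  # 60 vs 1
```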

-----

🔧 Solution in this Paper:

→ FIRST (Faster Improved Listwise Reranking with Single Token Decoding) uses only the first token's logits to determine rank order, eliminating full sequence generation (see the inference sketch after this list)

→ Incorporates a weighted pairwise learning-to-rank loss that prioritizes accuracy on the top candidates

→ Combines the ranking loss with the language modeling loss via joint optimization (see the loss sketch after this list)

→ Uses a sliding window of size 20 with step size 10 to process candidate lists longer than a single prompt
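
A minimal sketch of the inference side, combining the single-token scoring and the sliding window: rank each window of candidates by the logits the model assigns to their identifiers at the first decoding step, then slide the window from the back of the list to the front. The model name, prompt template, and single-letter identifiers here are illustrative assumptions, not the paper's exact setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "your-org/first-style-reranker"  # placeholder; substitute a released FIRST checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16, device_map="auto")
model.eval()

def rank_window(query: str, passages: list[str]) -> list[str]:
    """Order passages by the logit each identifier gets as the FIRST generated token."""
    labels = [chr(ord("A") + i) for i in range(len(passages))]  # single-token identifiers
    prompt = f"Rank the following passages by relevance to the query.\nQuery: {query}\n"
    prompt += "".join(f"[{l}] {p}\n" for l, p in zip(labels, passages))
    prompt += "The ranking, most relevant first, is: ["
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]  # one forward pass, no decoding loop
    # note: identifier tokenization is tokenizer-dependent; verify these ids
    # match how the labels are actually tokenized inside the prompt
    label_ids = [tokenizer.encode(l, add_special_tokens=False)[0] for l in labels]
    scores = next_token_logits[label_ids]
    order = scores.argsort(descending=True).tolist()
    return [passages[i] for i in order]

def sliding_rerank(query: str, passages: list[str], window: int = 20, step: int = 10) -> list[str]:
    """Back-to-front sliding-window pass (window 20, step 10), as in the paper."""
    docs = list(passages)
    start = max(0, len(docs) - window)
    while True:
        docs[start:start + window] = rank_window(query, docs[start:start + window])
        if start == 0:
            break
        start = max(0, start - step)
    return docs
```

Because each window needs only one forward pass instead of a full generation loop, the decode-time savings grow with the length of the ranking sequence that no longer has to be generated.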

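And a sketch of the training objective, assuming a RankNet-style pairwise term; the pair weight (reciprocal rank of the more relevant candidate) and the mixing coefficient `alpha` are illustrative choices rather than the paper's exact hyperparameters:

```python
import torch
import torch.nn.functional as F

def weighted_pairwise_loss(id_logits: torch.Tensor, gold_order: list[int]) -> torch.Tensor:
    """id_logits: first-token logits for each candidate identifier, shape (num_candidates,).
    gold_order: candidate indices sorted from most to least relevant."""
    loss = id_logits.new_zeros(())
    for rank, i in enumerate(gold_order):
        weight = 1.0 / (rank + 1)  # emphasize pairs involving top-ranked candidates
        for j in gold_order[rank + 1:]:
            # RankNet-style term: candidate i should outscore every candidate j below it
            loss = loss + weight * F.softplus(id_logits[j] - id_logits[i])
    return loss

def joint_loss(lm_loss: torch.Tensor, id_logits: torch.Tensor,
               gold_order: list[int], alpha: float = 1.0) -> torch.Tensor:
    """Joint optimization: language-modeling loss plus the weighted ranking loss."""
    return lm_loss + alpha * weighted_pairwise_loss(id_logits, gold_order)
```

With `alpha = 0` this reduces to plain language-modeling fine-tuning, which is the baseline the ranking term is meant to improve on.
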
-----

💡 Key Insights:

→ Language-model training implicitly improves zero-shot single-token reranking

→ LLM pre-training may paradoxically hinder subsequent FIRST fine-tuning

→ FirstMistral achieved the highest effectiveness among the models tested

→ Stronger initial retrieval leads to better post-reranking effectiveness, though with diminishing returns

-----

📊 Results:

→ Achieves 21%-42% latency improvements across models and benchmarks

→ FirstMistral achieved the highest nDCG@10 scores on 8 of 11 benchmark datasets

→ Maintains effectiveness comparable to full-sequence generation approaches

→ Shows consistent performance across diverse domains and retrievers
