This paper shows how LLMs can retrieve knowledge directly while generating text, with no separate retriever needed.
RetroLLM unifies retrieval and generation into a single process, enabling LLMs to directly generate evidence from knowledge sources while reducing deployment costs and token usage.
-----
https://arxiv.org/abs/2412.11919
🤔 Original Problem:
Traditional Retrieval-Augmented Generation (RAG) systems require separate retrievers, consume excessive tokens, and lack joint optimization between retrieval and generation components.
-----
🔧 Solution in this Paper:
→ RetroLLM integrates retrieval directly into the generation process through hierarchical FM-Index constraints (a toy sketch follows this list)
→ The system first generates corpus-constrained clues to identify relevant document subsets
→ It employs forward-looking constrained decoding that considers future sequence relevance when generating evidence
→ The model autonomously decides how much evidence to retrieve and when to generate the final response
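To make the mechanics concrete, here is a minimal sketch (not the authors' code) of corpus-constrained decoding: a plain substring table stands in for the FM-Index, a keyword-overlap heuristic stands in for the LLM's token scores, and `ToyCorpusIndex`, `constrained_generate`, and the `clue` seed are illustrative names, not the paper's API.

```python
# Minimal sketch of corpus-constrained (FM-Index style) decoding: at every step
# the model may only emit a token that extends a string actually present in the
# corpus, so the generated "evidence" is verbatim corpus text. A plain substring
# table stands in for the FM-Index; the scoring heuristic stands in for the LLM.

from collections import defaultdict

class ToyCorpusIndex:
    """Stand-in for an FM-Index: answers 'which tokens may follow this prefix?'"""
    def __init__(self, corpus_sentences):
        self.next_tokens = defaultdict(set)
        for sent in corpus_sentences:
            toks = sent.lower().split()
            for i in range(len(toks)):            # every substring start...
                for j in range(i, len(toks)):     # ...and every substring end
                    self.next_tokens[tuple(toks[i:j])].add(toks[j])

    def allowed(self, prefix):
        return self.next_tokens.get(tuple(prefix), set())


def constrained_generate(index, score_fn, clue=(), max_len=12):
    """Greedy decoding restricted to token sequences that occur in the corpus.

    `clue` mimics the paper's first stage: a corpus-grounded hint that narrows
    where evidence generation starts. `score_fn(prefix, token)` plays the role
    of the LLM's token score.
    """
    evidence = list(clue)
    for _ in range(max_len):
        candidates = index.allowed(evidence)
        if not candidates:                        # no corpus continuation: stop
            break
        evidence.append(max(sorted(candidates), key=lambda t: score_fn(evidence, t)))
    return " ".join(evidence)


if __name__ == "__main__":
    corpus = [
        "RetroLLM unifies retrieval and generation in one model",
        "dense retrievers need a separate index and encoder",
    ]
    index = ToyCorpusIndex(corpus)
    question = "how does retrollm unify retrieval and generation"
    # Toy score: prefer tokens that also appear in the question.
    score = lambda prefix, tok: 1.0 if tok in question.split() else 0.0
    print(constrained_generate(index, score, clue=["retrollm"]))
    # -> "retrollm unifies retrieval and generation in one model"
```

Because every emitted token must extend a corpus substring, the "retrieved" evidence is verbatim text from the knowledge source rather than a paraphrase of it.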
-----
💡 Key Insights:
→ Unified frameworks eliminate the need for separate retrievers and enable true joint optimization
→ Hierarchical constraints help reduce false pruning during evidence generation
→ Forward-looking strategies improve evidence accuracy by considering future relevance
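A rough sketch of that forward-looking idea, assuming a toy window lookup and a term-overlap relevance proxy in place of the paper's FM-Index machinery and learned scoring; `corpus_windows_after` and `forward_looking_score` are hypothetical helpers, not the paper's functions.

```python
# Toy illustration of forward-looking scoring: a candidate next token is judged
# by the most relevant short future window it can lead to in the corpus, not
# just by the token itself.

def corpus_windows_after(corpus, prefix, width=4):
    """All corpus continuations of `prefix`, truncated to `width` tokens."""
    out = []
    for sent in corpus:
        toks = sent.lower().split()
        for i in range(len(toks) - len(prefix) + 1):
            if toks[i:i + len(prefix)] == prefix:
                out.append(toks[i + len(prefix): i + len(prefix) + width])
    return out

def forward_looking_score(corpus, prefix, token, question, width=4):
    """Best question-term coverage among future windows reachable through `token`."""
    q = set(question.lower().split())
    futures = corpus_windows_after(corpus, prefix + [token], width)
    if not futures:
        return 0.0
    return max(len(q & set([token] + fut)) / len(q) for fut in futures)

if __name__ == "__main__":
    corpus = ["the fm index stores the corpus in compressed suffix form",
              "the model generates evidence spans token by token"]
    question = "how is evidence generated token by token"
    print(forward_looking_score(corpus, ["the"], "model", question))  # higher
    print(forward_looking_score(corpus, ["the"], "fm", question))     # 0.0
```

Here "model" is preferred over "fm" not for its own sake but because of the relevant evidence span it leads into, which is the intuition behind avoiding premature pruning of useful sequences.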
-----
📊 Results:
→ Uses 2.1x fewer tokens than traditional RAG methods
→ Achieves superior R@1 scores compared to dense retrievers
→ Outperforms existing methods across both in-domain and out-of-domain tasks on 5 QA datasets
→ Uses only 3.29 passages on average versus 5 for baselines