
"EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation"

A podcast on this paper was generated with Google's Illuminate.

Intelligent sentence filtering: the key to faster, more accurate RAG.

EXIT improves RAG systems by intelligently compressing retrieved documents while preserving essential information, making question-answering faster and more accurate.

-----

https://arxiv.org/abs/2412.12559

🤔 Original Problem:

RAG systems struggle when retrievers fail to rank relevant documents well. Adding more retrieved documents to compensate hurts both speed and accuracy, because LLMs handle long contexts poorly and are easily distracted by irrelevant passages.

-----

🔧 Solution in this Paper:

→ EXIT splits retrieved documents into sentences and evaluates each one's relevance to the query

→ It uses parallel binary classification to determine if sentences contain answer-critical information

→ The system considers full document context when scoring each sentence, not just the sentence itself

→ Selected sentences are recombined in their original order to maintain coherence

→ The compression adapts dynamically based on query complexity and retrieval quality
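The pipeline above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the real EXIT classifier is a fine-tuned model that judges each (query, sentence, full-document) triple, while here a trivial word-overlap test stands in for it so the sketch runs on its own. The parallel map mirrors the paper's parallel binary classification, and step 3 recombines kept sentences in their original order.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def is_relevant(query: str, sentence: str, context: str) -> bool:
    # Stand-in for EXIT's binary classifier. The actual system scores
    # each sentence with a trained model that also sees the full
    # document context; here we approximate with word overlap.
    q_words = set(re.findall(r"\w+", query.lower()))
    s_words = set(re.findall(r"\w+", sentence.lower()))
    return len(q_words & s_words) >= 2

def compress(query: str, document: str) -> str:
    # 1) Split the retrieved document into sentences.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    # 2) Classify every sentence in parallel, passing the whole
    #    document as context for each decision.
    with ThreadPoolExecutor() as pool:
        keep = list(pool.map(
            lambda s: is_relevant(query, s, document), sentences))
    # 3) Recombine the kept sentences in their original order
    #    to preserve coherence.
    return " ".join(s for s, k in zip(sentences, keep) if k)

doc = ("EXIT was proposed in 2024. It compresses retrieved passages. "
       "The weather was sunny that day. "
       "Sentence-level filtering speeds up RAG question answering.")
print(compress("How does EXIT speed up RAG question answering?", doc))
```

Because each sentence is classified independently, the amount of text kept varies with the query and the retrieval quality rather than being fixed in advance, which is what makes the compression adaptive.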

-----

💡 Key Insights:

→ Context-aware sentence selection outperforms both abstractive and extractive baselines

→ Parallel processing enables fast compression without sacrificing accuracy

→ Preserving sentence order maintains document coherence

→ The framework works as a plug-and-play module for any RAG pipeline

-----

📊 Results:

→ Reduces processing time from several seconds to ~1 second

→ Achieves 86.8% token reduction while improving answer quality

→ Improves Exact Match (EM) scores by 3.7 points with a 70B LLM

→ Maintains high accuracy across both single-hop and multi-hop QA tasks
