Small models can outperform giants by learning to ignore irrelevant noise
The paper introduces ICR2, a new benchmark that tests LLMs' ability to retrieve and reason with long contexts while handling confounding information, making evaluation more realistic.
-----
https://arxiv.org/abs/2501.08248
Original Problem 🔍:
→ Current benchmarks like LOFT overestimate LLMs' performance by using oversimplified contexts that lack challenging confounding information
→ LLMs struggle to retrieve and reason accurately in realistic settings where confounding passages appear relevant but are actually misleading
-----
Solution in this Paper 🛠️:
→ ICR2 benchmark uses strong retrievers to select challenging confounding passages, creating more realistic test conditions
→ Introduces retrieve-then-generate fine-tuning, where the model first identifies the relevant passages and then generates the answer (see the first sketch after this list)
→ Employs retrieval-attention-probing, which uses attention heads during decoding to filter noisy context (see the second sketch after this list)
→ Implements joint training of dedicated retrieval and generation heads
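
A minimal sketch of what a retrieve-then-generate training pair could look like: the target first cites the relevant passage IDs, then gives the answer, so the model learns an explicit retrieval step before generation. Function names, the prompt template, and the citation format here are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical retrieve-then-generate training pair construction.

def build_prompt(question, passages):
    """Lay out a confounder-heavy context with passage IDs the model can cite."""
    ctx = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages))
    return f"{ctx}\n\nQuestion: {question}\n"

def build_target(gold_ids, answer):
    """Target text: cite the relevant passages first, then answer."""
    cited = ", ".join(f"[{i}]" for i in gold_ids)
    return f"Relevant passages: {cited}\nAnswer: {answer}"

# Example: one gold passage buried among confounders.
passages = [
    "Paris hosted the 1900 Summer Olympics.",        # confounder
    "The 2024 Summer Olympics were held in Paris.",  # gold
    "Los Angeles will host the 2028 Games.",         # confounder
]
prompt = build_prompt("Where were the 2024 Summer Olympics held?", passages)
target = build_target(gold_ids=[1], answer="Paris")
print(prompt + target)
```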
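
And a minimal sketch of the retrieval-attention-probing idea: score each passage by the attention mass the question tokens place on its span under a probed head, then keep only the top-scoring passages as the filtered context. The attention matrix here is a toy, and names like `passage_spans` and `top_k` are assumptions for illustration.

```python
import numpy as np

def probe_passages(attn, passage_spans, top_k=2):
    """Rank passages by attention mass from question tokens to each span,
    keeping the top-k as the filtered context."""
    scores = {pid: attn[:, start:end].sum() for pid, (start, end) in passage_spans.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores

# Toy example: 4 question tokens attending over 12 context tokens,
# split into three 4-token passages.
rng = np.random.default_rng(0)
attn = rng.random((4, 12))
attn[:, 4:8] += 1.0  # pretend the probed head focuses on passage 1
passage_spans = {0: (0, 4), 1: (4, 8), 2: (8, 12)}

kept, scores = probe_passages(attn, passage_spans, top_k=1)
print("kept passages:", kept)  # -> [1]
```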
-----
Key Insights 💡:
→ LLMs are highly sensitive to confounding information in context
→ Explicit retrieval steps improve performance compared to end-to-end approaches
→ Attention heads can effectively identify relevant passages
-----
Results 📊:
→ Best approach (Mistral-7B): +17 points on LOFT, +13 points on ICR2 vs vanilla RAG
→ Outperforms GPT-4-Turbo despite being much smaller
→ Achieves 51% improvement in exact match rates compared to baseline