"RARe: Retrieval Augmented Retrieval with In-Context Examples"

This podcast episode on the paper was generated with Google's Illuminate.

RARe (Retrieval Augmented Retrieval with In-Context Examples), proposed in this paper, teaches retrieval models to learn from in-context examples, just as LLMs do.

📚 https://arxiv.org/abs/2410.20088

Original Problem 🤔:

In-context learning works well for LLMs but has been largely unexplored for retrieval models. Naively prepending examples to queries at inference time does not help retrievers, leaving a gap in leveraging example-based learning for information retrieval.

-----

Solution in this Paper 🛠️:

→ Introduces RARe, which finetunes pre-trained models with semantically similar query-document pairs prepended to the query as in-context examples

→ Uses BM25 to find relevant example pairs during training and inference

→ Augments queries with task instructions and retrieved examples in a specific format (see the sketch after this list)

→ Works with both decoder-only LLMs and existing retriever models
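
To make the query-augmentation step concrete, here is a minimal Python sketch. The template and field names (`Instruct:`, `Example Query:`, etc.) are illustrative assumptions, not the paper's exact prompt format:

```python
# Sketch of RARe-style query augmentation. The template below is an
# assumption for illustration; the paper defines its own exact format.

def augment_query(task_instruction, query, examples):
    """Prepend a task instruction and retrieved (query, document)
    example pairs to the target query before it is embedded."""
    parts = [f"Instruct: {task_instruction}"]
    for ex_query, ex_doc in examples:
        parts.append(f"Example Query: {ex_query}")
        parts.append(f"Example Document: {ex_doc}")
    parts.append(f"Query: {query}")
    return "\n".join(parts)

# Hypothetical usage with two retrieved example pairs:
augmented = augment_query(
    "Retrieve passages that answer the question.",
    "What causes auroras?",
    [("Why is the sky blue?", "Sunlight scatters off air molecules..."),
     ("What makes lightning?", "A discharge of static electricity...")],
)
print(augmented)
```

The augmented string then replaces the raw query as encoder input, both during finetuning and at inference.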

-----

Key Insights 💡:

→ Retrieved (semantically similar) examples work better than randomly chosen ones (a BM25 selection sketch follows this list)

→ Performance improves with more examples (up to 10 tested)

→ Including negative examples in prompts doesn't improve performance

→ The relative latency overhead of example retrieval diminishes for larger corpus sizes, where the search itself dominates
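
On selecting examples: the paper uses BM25 to find query-document pairs whose queries resemble the target query. Here is a minimal sketch using the rank_bm25 package as a stand-in; the example pool and the paper's exact BM25 setup are assumptions:

```python
# Sketch: picking in-context example pairs whose queries are most
# similar (by BM25) to the target query. rank_bm25 is a stand-in
# implementation; the example pool below is hypothetical.
from rank_bm25 import BM25Okapi

# Hypothetical pool of training (query, positive document) pairs.
example_pool = [
    ("what is photosynthesis", "Photosynthesis converts light energy..."),
    ("who wrote hamlet", "Hamlet is a tragedy by William Shakespeare..."),
    ("how do vaccines work", "Vaccines train the immune system..."),
]

bm25 = BM25Okapi([q.split() for q, _ in example_pool])

def top_k_examples(target_query, k=2):
    """Return the k example pairs whose queries best match target_query."""
    scores = bm25.get_scores(target_query.split())
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [example_pool[i] for i in ranked[:k]]

examples = top_k_examples("how does the immune system fight viruses")
```

The retrieved pairs then feed the augmentation format shown earlier.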

-----

Results 📊:

→ Achieves +2.72% nDCG gains on reasoning-oriented retrieval tasks

→ Shows a +1.41% nDCG@10 improvement on standard retrieval benchmarks (nDCG is sketched after this list)

→ Demonstrates stronger out-of-domain generalization compared to baseline models

→ Maintains consistent improvements across different model architectures
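
For readers unfamiliar with the metric, here is a minimal sketch of the standard nDCG@10 computation (a generic definition, not code from the paper):

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (best possible) ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Example: graded relevance labels of the top-5 retrieved documents.
print(ndcg_at_k([3, 2, 0, 1, 0], k=10))  # ~0.985
```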
