Retrieval Preference Optimization (RPO) enhances LLM robustness to multi-source knowledge by integrating retrieval evaluation within the generation process via reinforcement learning.
It uses retrieval relevance to guide knowledge selection, improving answer accuracy without additional components or processing steps at inference.
-----
Paper - https://arxiv.org/abs/2501.13726
Original Problem 🤔:
→ LLMs struggle to evaluate the accuracy of retrieved information, leading to conflicts between retrieved and parametric knowledge during generation.
→ Existing methods that evaluate retrieval quality separately add computational overhead or restrict the flow of information to the generator.
-----
Solution in this Paper 💡:
→ RPO introduces an implicit retrieval relevance representation into the reward model.
→ This integrates retrieval evaluation and generation into a single model.
→ RPO simulates knowledge conflict by generating answers with and without retrieval, then keeping only the instances where the two answers contradict each other.
→ Retrieval relevance is incorporated into the reward model to adaptively reward the better answer based on retrieval quality (see the sketch after this list).
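A minimal sketch of these two steps, in Python. The helper names, the scalar relevance score, and the DPO-style margin are illustrative assumptions, not the paper's exact formulation; the point is to show how conflict-based preference pairs and a relevance-weighted reward could fit together.

```python
# Sketch only: names and the reward form are assumptions for illustration.
from dataclasses import dataclass


@dataclass
class PreferencePair:
    question: str
    chosen: str       # answer preferred given retrieval quality
    rejected: str     # contradictory answer to be down-weighted
    relevance: float  # assumed scalar retrieval-relevance score in [0, 1]


def build_conflict_pairs(examples, relevance_threshold=0.5):
    """Simulate knowledge conflict: keep only instances where the answer
    generated with retrieval contradicts the one generated without it,
    then choose the preferred side based on retrieval relevance."""
    pairs = []
    for ex in examples:
        ans_with = ex["answer_with_retrieval"]        # hypothetical pre-generated outputs
        ans_without = ex["answer_without_retrieval"]
        if ans_with == ans_without:                   # exact match as a crude agreement proxy
            continue                                  # no conflict -> no training signal
        relevant = ex["relevance"] >= relevance_threshold
        chosen, rejected = (ans_with, ans_without) if relevant else (ans_without, ans_with)
        pairs.append(PreferencePair(ex["question"], chosen, rejected, ex["relevance"]))
    return pairs


def adaptive_preference_reward(logp_chosen, logp_rejected, relevance, beta=0.1):
    """DPO-style preference margin scaled by retrieval relevance, so the model
    is rewarded for leaning on retrieved knowledge only when retrieval looks
    reliable (the scaling form is an assumption)."""
    margin = beta * (logp_chosen - logp_rejected)
    weight = relevance if relevance >= 0.5 else 1.0 - relevance  # confidence in chosen side
    return weight * margin


# Toy usage with stubbed log-probabilities
examples = [
    {"question": "Who wrote Hamlet?",
     "answer_with_retrieval": "William Shakespeare",
     "answer_without_retrieval": "Christopher Marlowe",
     "relevance": 0.9},
]
for pair in build_conflict_pairs(examples):
    print(pair.chosen, adaptive_preference_reward(-1.2, -2.5, pair.relevance))
```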
-----
Key Insights from this Paper ✍️:
→ Aligning retrieval evaluation with generation improves LLM robustness to multi-source knowledge.
→ Implicitly representing retrieval relevance streamlines the process and reduces overhead.
→ Simulating knowledge conflict enhances the model's ability to resolve discrepancies.
-----
Results 📊:
→ RPO outperforms RAG by 4-10% in accuracy on PopQA, Natural Questions, TriviaQA, and RGB datasets, without extra components.
→ Outperforms other adaptive RAG methods across benchmarks.
→ Improves knowledge selection, especially in instances with conflicting information.