Toward General Instruction-Following Alignment for Retrieval-Augmented Generation
Using automated verification and atomic instructions, VIF-RAG enables LLMs to follow complex rules during knowledge retrieval.
Combines atomic instructions with verification code to ensure LLMs follow rules while accessing external knowledge
Original Problem 🔍:
Existing instruction-following (IF) alignment methods for LLMs are less effective in Retrieval-Augmented Generation (RAG) scenarios, because retrieval introduces diverse external knowledge.
Solution in this Paper 🛠️:
• VIF-RAG: Automated, scalable, verifiable synthetic pipeline for IF alignment in RAG
• Starts with fewer than 100 manually crafted atomic instructions and applies combination rules to build complex instructions
• Uses supervised models to rewrite instructions and to generate verification code, which is run by a Python executor
• Integrates instructions with RAG and general data, scaling to >100K high-quality samples
• Introduces FollowRAG Benchmark: 3K test samples, 22 constraint types, 4 QA benchmarks
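The step from atomic instructions to complex ones can be illustrated with a minimal sketch. The summary does not spell out the paper's combination rules, so the rule used here (never combine two instructions from the same conflict group) is an assumption, and `AtomicInstruction`, `ATOMIC`, and `compose` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class AtomicInstruction:
    text: str
    group: str  # hypothetical conflict group; same-group instructions contradict each other


# Hypothetical seed set; the paper uses fewer than 100 hand-written atomic instructions.
ATOMIC = [
    AtomicInstruction("Answer in fewer than 50 words.", "length"),
    AtomicInstruction("Answer in at least 200 words.", "length"),
    AtomicInstruction("Write the answer in all lowercase letters.", "case"),
    AtomicInstruction("End the answer with the word 'done'.", "ending"),
]


def compose(atomic, k):
    """Return every k-instruction combination whose members come from distinct groups."""
    composed = []
    for combo in combinations(atomic, k):
        groups = {inst.group for inst in combo}
        if len(groups) == k:  # assumed rule: no two instructions share a conflict group
            composed.append(" ".join(inst.text for inst in combo))
    return composed


pairs = compose(ATOMIC, 2)
triples = compose(ATOMIC, 3)
```

Here the two length constraints are never paired, so contradictory prompts such as "fewer than 50 words" plus "at least 200 words" are filtered out before any data is generated.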
Key Insights from this Paper 💡:
• First framework addressing IF alignment in RAG scenarios
• Automated verification at each step ensures high-quality data synthesis
• Seamless integration with various RAG benchmarks for comprehensive evaluation
• Balances IF alignment with preservation of LLM's foundational abilities
Results 📊:
• VIF-RAG outperforms all baselines in FollowRAG across multiple configurations
• More than 10% improvement in average accuracy over baselines
• Performance remains stable as the number of instructions increases (up to 4)
• Effectively preserves other foundational capabilities of LLMs
🧠 How does VIF-RAG generate high-quality instruction data?
VIF-RAG starts by manually crafting a minimal set of atomic instructions (<100) and developing combination rules to synthesize and verify complex instructions.
It then uses supervised models for instruction rewriting while simultaneously generating code to automate verification via a Python executor. Finally, it integrates these instructions with extensive RAG and general data samples, scaling up to over 100K high-quality samples through automated processes.
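The executor-based verification step can be sketched roughly as follows. This is a simplified illustration under assumptions, not the paper's implementation: `VERIFIER_SOURCE` stands in for verification code that a model would generate, and `load_verifier` and `passes` are hypothetical helpers. It also calls `exec` directly, whereas a real pipeline would presumably run generated code in a sandbox.

```python
# Stand-in for model-generated verification code for the instruction
# "Answer in fewer than 10 words, all in lowercase." (hypothetical example).
VERIFIER_SOURCE = '''
def verify(response):
    words = response.split()
    return len(words) < 10 and response == response.lower()
'''


def load_verifier(source):
    """Execute the generated source and return its `verify` function."""
    namespace = {}
    exec(source, namespace)  # assumption: no sandboxing, for brevity only
    return namespace["verify"]


def passes(source, response):
    """Return True only if the verifier loads, runs, and accepts the response."""
    try:
        return bool(load_verifier(source)(response))
    except Exception:
        # Broken or crashing verification code counts as a failed check.
        return False


samples = [
    {"response": "paris is the capital of france."},
    {"response": "Paris is the capital of France."},
]

# Keep only the samples whose responses satisfy the executable check.
kept = [s for s in samples if passes(VERIFIER_SOURCE, s["response"])]
```

Treating any exception as a failure means that malformed generated code discards the sample rather than letting unverified data through, which matches the goal of keeping only verifiable samples.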