Retrieval-augmented LLMs hallucinate in long-form question answering (LFQA).
This work introduces Retrieval Heads-Induced Optimization (RHIO) to improve contextual faithfulness. RHIO uses retrieval heads to generate realistic unfaithful training examples and teaches LLMs to distinguish between faithful and unfaithful generations.
-----
Paper - https://arxiv.org/abs/2501.13573
Original Problem 🙁:
→ Retrieval-augmented large language models (LLMs) often generate unfaithful responses in LFQA, eroding user trust.
-----
Solution in this Paper 💡:
→ RHIO augments unfaithful training data by masking retrieval heads, the attention heads chiefly responsible for copying information from the retrieved context (see the first sketch after this list).
→ RHIO fine-tunes the LLM with special control tokens ([POS], [NEG]) so it learns to discriminate between faithful and unfaithful responses.
→ At inference, RHIO applies contrastive decoding to amplify the difference between the outputs induced by the two control tokens, further improving faithfulness (see the second sketch after this list).
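A minimal sketch of the masking idea, assuming a Llama-style HuggingFace model. The (layer, head) pairs below are hypothetical placeholders; the paper identifies real retrieval heads with its own detection procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"    # assumption: any Llama-style causal LM
RETRIEVAL_HEADS = [(14, 7), (18, 23), (21, 5)]  # hypothetical (layer, head) pairs

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)

def mask_retrieval_heads(model, layer_head_pairs):
    """Remove each selected head's contribution by zeroing its slice of the
    attention output projection (o_proj), so the head can no longer pass
    context information forward."""
    head_dim = model.config.hidden_size // model.config.num_attention_heads
    with torch.no_grad():
        for layer, head in layer_head_pairs:
            o_proj = model.model.layers[layer].self_attn.o_proj
            o_proj.weight[:, head * head_dim:(head + 1) * head_dim] = 0.0

mask_retrieval_heads(model, RETRIEVAL_HEADS)

# Generating with the masked model tends to ignore the retrieved passages,
# yielding a realistic "unfaithful" answer to pair with the faithful one.
prompt = "Context: <retrieved passages>\nQuestion: <question>\nAnswer:"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```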
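And a minimal sketch of the control-token setup at inference, assuming a model already fine-tuned with [POS]/[NEG] prefixes (faithful targets behind [POS], masked-head negatives behind [NEG]). The checkpoint name, prompt format, alpha weight, and greedy decoding loop are illustrative simplifications, not the paper's exact formulation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

FT_MODEL = "your-org/rhio-style-finetuned-7b"  # hypothetical fine-tuned checkpoint
tok = AutoTokenizer.from_pretrained(FT_MODEL)
model = AutoModelForCausalLM.from_pretrained(FT_MODEL, torch_dtype=torch.bfloat16)

@torch.no_grad()
def contrastive_generate(context, question, alpha=1.0, max_new_tokens=256):
    """Greedy decoding over logits where the [NEG]-conditioned distribution is
    subtracted from the [POS]-conditioned one, amplifying faithful behaviour."""
    base = f"Context: {context}\nQuestion: {question}\nAnswer:"
    pos_ids = tok("[POS] " + base, return_tensors="pt").input_ids
    neg_ids = tok("[NEG] " + base, return_tensors="pt").input_ids
    generated = []
    for _ in range(max_new_tokens):
        pos_logits = model(pos_ids).logits[:, -1, :]
        neg_logits = model(neg_ids).logits[:, -1, :]
        # Boost tokens the faithful branch prefers over the unfaithful one.
        adjusted = pos_logits + alpha * (pos_logits - neg_logits)
        next_id = adjusted.argmax(dim=-1, keepdim=True)
        if next_id.item() == tok.eos_token_id:
            break
        generated.append(next_id.item())
        pos_ids = torch.cat([pos_ids, next_id], dim=-1)
        neg_ids = torch.cat([neg_ids, next_id], dim=-1)
    return tok.decode(generated, skip_special_tokens=True)

print(contrastive_generate("<retrieved passages>", "<question>"))
```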
-----
Key Insights from this Paper 🤔:
→ Retrieval heads in LLMs are crucial for maintaining contextual faithfulness in LFQA.
→ Masking retrieval heads produces realistic unfaithful examples, mimicking model-intrinsic errors.
→ Explicitly teaching LLMs to distinguish faithful and unfaithful generations enhances contextual faithfulness.
-----
Results ✅:
→ RHIO improves faithfulness on GroundBench, with gains of 12.84% and 12.59% in 7B and 13B models, respectively.
→ RHIO even outperforms GPT-4o.