This paper identifies key drivers of character hallucination in LLMs and proposes Narrator Mode as a novel solution.
📚 https://arxiv.org/pdf/2409.16727
Original Problem 🔍:
LLM-based role-playing systems suffer from character hallucination, where models generate responses inconsistent with predefined character roles.
-----
Solution in this Paper 💡:
• Introduces RoleBreak framework identifying query sparsity and role-query conflict as key drivers of hallucination
• Constructs RoleBreakEval dataset to evaluate hallucination mitigation techniques
• Proposes Narrator Mode defense strategy generating supplemental narrative context
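The Narrator Mode idea could be sketched as a two-stage prompting pipeline: a narrator pass first generates supplemental context bridging the character profile and the query, and the role-play pass then answers grounded in that context. The function names and prompt wording below are illustrative, not the paper's exact implementation; `llm` is a stub so the example runs without an API key.

```python
def llm(prompt: str) -> str:
    # Stub standing in for a real LLM call; swap in an actual client here.
    return f"[generated from: {prompt[:40]}...]"

def narrator_mode_reply(character_card: str, user_query: str) -> str:
    """Sketch of a two-stage Narrator Mode pipeline:
    (1) a narrator pass writes supplemental narrative context that
        reconciles the character profile with the incoming query;
    (2) the in-character pass answers, grounded in that context."""
    narration = llm(
        "As an omniscient narrator, write brief scene context that "
        f"reconciles this character profile:\n{character_card}\n"
        f"with the user's query:\n{user_query}"
    )
    return llm(
        f"{character_card}\n"
        f"Narrative context: {narration}\n"
        f"Stay in character and answer: {user_query}"
    )
```

The key design point is that the out-of-character query is never handled by rejection; instead, the narrator supplies story-level context so the character can address it coherently.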
-----
Key Insights from this Paper 💡:
• Character hallucination viewed as "jailbreak" attack on role-playing systems
• Even enhanced models remain vulnerable to RoleBreak attacks
• Rejection-based strategies have limited generalization for handling hallucinations
• Narrator Mode outperforms traditional approaches in reducing hallucinations and improving coherence
-----
Results 📊:
• Narrator Mode reduces hallucination rate to 0.36 (vs 0.48 for GPT-3.5)
• Improves role fidelity to 0.71 (vs 0.65 for GPT-3.5)
• Enhances query fidelity to 0.89 (vs 0.83 for GPT-3.5)
• Increases story coherence score to 4.21 (vs 4.13 for GPT-3.5) while maintaining character consistency