Making LLMs admit when they're unsure by probing the same fact with diverse related questions.
DIVERSEAGENTENTROPY quantifies LLM uncertainty by evaluating consistency across diverse questions rather than self-consistency on a single query, enabling more accurate hallucination detection.
https://arxiv.org/abs/2412.09572
🤔 Original Problem:
Existing methods for quantifying LLM uncertainty rely on self-consistency over repeated samples of a single query, which can be misleading: a model may be consistently wrong.
-----
🔍 Solution in this Paper:
→ DIVERSEAGENTENTROPY introduces a multi-agent interaction approach to estimate LLM uncertainty.
→ It generates diverse questions related to the original query, creating multiple agents from the same base model.
→ Agents engage in controlled one-on-one interactions to refine their answers.
→ The method calculates weighted entropy based on agents' final answers and their consistency during interactions.
→ An abstention policy withholds the response when uncertainty is high (sketched below).
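A minimal sketch of the final scoring step, assuming agents' final answers and per-agent consistency weights have already been collected from the interactions. The function names, the exact-string grouping of answers, and the 0.5 abstention threshold are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import defaultdict

def weighted_entropy(final_answers, consistency_weights):
    # Pool weight mass per distinct answer string (the paper checks answer
    # equivalence more carefully; exact string matching is a simplification).
    totals = defaultdict(float)
    for ans, w in zip(final_answers, consistency_weights):
        totals[ans] += w
    z = sum(totals.values())
    probs = [v / z for v in totals.values()]
    # Entropy of the weight-normalized answer distribution:
    # low entropy = agents converge, high entropy = they disagree.
    return -sum(p * math.log(p) for p in probs if p > 0)

def answer_or_abstain(final_answers, consistency_weights, threshold=0.5):
    # Withhold the answer when weighted entropy exceeds the (hypothetical)
    # threshold, i.e. the agents fail to converge on one answer.
    if weighted_entropy(final_answers, consistency_weights) > threshold:
        return None
    # Otherwise return the answer carrying the most consistency weight.
    totals = defaultdict(float)
    for ans, w in zip(final_answers, consistency_weights):
        totals[ans] += w
    return max(totals, key=totals.get)

# Toy example: three agents, two converge on "Paris" with high consistency.
print(answer_or_abstain(["Paris", "Paris", "Lyon"], [0.9, 0.8, 0.4]))
```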
-----
💡 Key Insights from this Paper:
→ Consistency across diverse questions is a better indicator of model certainty than self-consistency on a single query
→ Multi-agent interaction allows models to self-correct and improve answer accuracy
→ Models often fail to consistently retrieve correct answers across different contexts, even when they possess the knowledge
-----
📊 Results:
→ Outperforms existing self-consistency methods with higher AUROC scores
→ Achieves 2.5% improvement in accuracy on known questions
→ Shows better calibration compared to other uncertainty estimation approaches