Using first token confidence turns RAG into a more reliable question answerer.
This paper proposes a first-token-probability-guided RAG framework that uses confidence scores to optimize hyperparameters, reducing hallucinations and improving retrieval quality on telecom MCQA tasks.
https://arxiv.org/abs/2501.06468
Original Problem 🤔:
→ Traditional RAG systems struggle with MCQA tasks due to poor retrieval quality and hallucinations.
→ Existing methods, especially with smaller models, often fail to match the generated answer to the correct option.
→ Current approaches lack clear confidence metrics in their decision-making process.
Solution in this Paper 💡:
→ The framework starts by retrieving relevant chunks from telecom documents.
→ It generates a single token (the option letter) as the answer instead of a full text response.
→ The probabilities of all options are normalized to create confidence scores.
→ These scores guide dynamic context adjustments and hyperparameter optimization.
→ The system iteratively optimizes chunk numbers and window sizes based on confidence levels.
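The core scoring step above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the log-probability values and the `option_confidences` helper are hypothetical, standing in for the first-token log-probs an LLM API would return.

```python
import math

def option_confidences(first_token_logprobs, options=("A", "B", "C", "D")):
    """Softmax the first-token log-probabilities restricted to the option
    letters, yielding a normalized confidence score per option."""
    exps = {o: math.exp(first_token_logprobs[o]) for o in options}
    total = sum(exps.values())
    return {o: exps[o] / total for o in options}

# Hypothetical first-token log-probs from an LLM answering an MCQ.
logprobs = {"A": -0.2, "B": -2.5, "C": -3.0, "D": -4.0}
conf = option_confidences(logprobs)
best = max(conf, key=conf.get)  # the predicted option
```

Because the scores sum to 1, the top option's confidence is directly comparable across questions, which is what makes it usable as an optimization signal.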
Key Insights 🔍:
→ First token probabilities strongly correlate with prediction accuracy
→ Higher confidence scores indicate better answer reliability
→ Combining multiple embedding models improves overall performance
→ Dynamic context adjustment reduces hallucination risks
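The dynamic context adjustment insight can be sketched as a simple retry loop: if the top-option confidence is below a threshold, retrieve more chunks and try again. The chunk counts, threshold, and `score_fn` interface here are illustrative assumptions, not values from the paper.

```python
def answer_with_adaptive_context(score_fn, chunk_counts=(3, 5, 8), threshold=0.7):
    """Retry with progressively more retrieved chunks until the top-option
    confidence clears the threshold; otherwise return the best attempt.
    score_fn(n_chunks) -> {option: confidence} is a hypothetical hook
    wrapping retrieval + first-token scoring."""
    best = None
    for n in chunk_counts:
        conf = score_fn(n)
        top = max(conf, key=conf.get)
        if best is None or conf[top] > best[1]:
            best = (top, conf[top], n)
        if conf[top] >= threshold:
            return top, conf[top], n
    return best

# Toy score_fn where confidence improves with more context.
fake = {3: {"A": 0.5, "B": 0.5}, 5: {"A": 0.65, "B": 0.35}, 8: {"A": 0.8, "B": 0.2}}
result = answer_with_adaptive_context(lambda n: fake[n])
```

Keeping the loop confidence-driven means extra retrieval cost is only paid on questions the model is unsure about.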
Results 📊:
→ Achieved 78.4% accuracy on telecom MCQA tasks
→ 26.8% improvement over baseline without RAG
→ Successfully answered 250+ questions with 80%+ accuracy
→ Combined embedding models showed significant performance boost