Teaching LLMs to recognize their knowledge boundaries cuts retrieval operations in RAG by more than 50%.
This paper introduces a method that reduces unnecessary retrieval in RAG pipelines: the LLM generates the first tokens of an answer and computes an "I Know" (IK) score from them to decide whether external knowledge is actually needed.
-----
https://arxiv.org/abs/2412.11536
🤔 Original Problem:
LLMs often perform unnecessary retrievals during RAG, increasing computational costs and sometimes degrading answer quality when retrieved information is poor.
-----
🔧 Solution in this Paper:
→ The system trains the LLM, via an IK classifier, to predict whether it can answer a question without external retrieval
→ It first generates 32 tokens of a draft answer, which serve as evidence for the confidence judgment
→ The IK score is computed by applying a softmax restricted to the Yes/No token logits (see the sketch after this list)
→ Training requires just 20,000 samples and takes one hour on a single A100 GPU
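A minimal sketch of the two-step scoring, assuming a Hugging Face causal LM. The model name and the self-assessment prompt wording are illustrative assumptions, not the paper's exact setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed base model, not from the paper
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)

def ik_score(question: str) -> float:
    """Probability that the model can answer without retrieval."""
    # Step 1: greedily draft the first 32 tokens of an answer.
    inputs = tok(question, return_tensors="pt").to(model.device)
    draft_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    draft = tok.decode(draft_ids[0], skip_special_tokens=True)

    # Step 2: probe the model's self-assessment (hypothetical prompt wording).
    probe = f"{draft}\nCan you answer this question without external documents? Yes or No:"
    probe_inputs = tok(probe, return_tensors="pt").to(model.device)
    last_logits = model(**probe_inputs).logits[0, -1]

    # Step 3: softmax over the Yes/No token logits only, ignoring the rest
    # of the vocabulary; P(Yes) among {Yes, No} is the IK score.
    yes_id = tok.encode("Yes", add_special_tokens=False)[0]
    no_id = tok.encode("No", add_special_tokens=False)[0]
    probs = torch.softmax(last_logits[[yes_id, no_id]], dim=-1)
    return probs[0].item()
```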
-----
💡 Key Insights:
→ Including 32 tokens from generated answers significantly improves classifier performance
→ The IK classifier achieves 80% accuracy in determining when retrieval is needed
→ The system reduces retrieval operations by over 50% while maintaining answer quality (a gating sketch follows this list)
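Here is how the IK score might gate retrieval in practice, building on `ik_score()` from the sketch above. The `retriever` and `generate` callables are hypothetical placeholders, and the 0.5 threshold is an assumed knob to tune on validation data:

```python
IK_THRESHOLD = 0.5  # assumed cutoff; tune on held-out questions

def answer(question: str, retriever, generate) -> str:
    if ik_score(question) >= IK_THRESHOLD:
        # The model believes it knows: answer from parametric knowledge,
        # skipping the retrieval round-trip entirely.
        return generate(question)
    # Otherwise fall back to standard RAG over the top-5 documents.
    docs = retriever(question, k=5)
    context = "\n\n".join(docs)
    return generate(f"Context:\n{context}\n\nQuestion: {question}")
```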
-----
📊 Results:
→ Processing time without RAG: 18ms per query
→ Processing time with RAG (5 documents): 78ms per query
→ IK classifier adds only 3.7ms latency
→ Overall efficiency gain of 25% when considering the generation stage alone
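These numbers support a quick back-of-the-envelope check, assuming the gate skips retrieval for half of all queries (taken from the >50% reduction figure). The exact savings depend on which pipeline stages are counted, so this need not match the paper's 25% generation-side figure:

```python
# Reported per-query latencies in milliseconds.
t_no_rag, t_rag, t_ik = 18.0, 78.0, 3.7
skip_rate = 0.5  # assumed fraction of queries answered without retrieval

# Every query pays the IK classifier cost; only non-skipped ones pay for RAG.
expected = skip_rate * (t_no_rag + t_ik) + (1 - skip_rate) * (t_rag + t_ik)
print(f"gated pipeline: {expected:.1f} ms vs. always-retrieve: {t_rag} ms")
# -> gated pipeline: 51.7 ms vs. always-retrieve: 78.0 ms
```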