StructuredRAG shows that LLMs can produce precise structured outputs such as JSON without any fine-tuning, reaching roughly 82% accuracy.
Average success rate across 24 experiments: 82.55%
https://arxiv.org/abs/2408.11061
Solution in this Paper 🛠️:
• Introduces StructuredRAG: 6 tasks to assess LLMs' proficiency in following response format instructions
• Tests include string, integer, boolean, list of strings, and composite object outputs
• Compares two prompting strategies: f-String and Follow the Format (FF)
• Evaluates Gemini 1.5 Pro and Llama 3 8B-instruct (4-bit quantized)
• Applies OPRO prompt optimization to improve performance
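A minimal sketch of how the two prompting strategies compared in the paper might look. The exact template wording here is illustrative, not the paper's actual prompts:

```python
# Illustrative sketch of the two prompting strategies (f-String vs. Follow
# the Format). Template wording is an assumption for illustration only.

context = "Paris is the capital of France."
question = "What is the capital of France?"

# f-String style: task, inputs, and format instruction interpolated inline.
f_string_prompt = (
    f"Answer the question based on the context.\n"
    f"Context: {context}\n"
    f"Question: {question}\n"
    f'Respond with JSON: {{"answer": "<string>"}}'
)

# Follow the Format (FF) style: the response format is called out as a
# separate, explicit section the model is told to follow exactly.
ff_prompt = (
    "Answer the question based on the context.\n"
    "Follow this response format exactly:\n"
    '{"answer": "<string>"}\n'
    f"Context: {context}\n"
    f"Question: {question}"
)
```

The difference is purely in how the format instruction is framed; the paper finds that which framing works better varies by model and task.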
Key Insights 💡:
• Task complexity significantly influences performance
• High variance in success rates across models, tasks, and prompting strategies
• Llama 3 8B-instruct often performs competitively with Gemini 1.5 Pro
• OPRO prompt optimization can achieve 100% success rate on complex tasks
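OPRO (Optimization by PROmpting) works by having an optimizer LLM propose new instructions, scoring each by its success rate, and feeding the scored history back to the optimizer. A hedged sketch of that loop, where `optimizer_llm` and `run_task` are hypothetical stand-ins, not APIs from the paper:

```python
# Sketch of an OPRO-style optimization loop. `optimizer_llm` proposes a new
# instruction given the scored history; `run_task` returns 1 if the model's
# response passes the format check for one example, else 0. Both callables
# are hypothetical placeholders.

def opro_optimize(optimizer_llm, run_task, val_set, rounds=5):
    history = []  # (instruction, score) pairs shown back to the optimizer
    best = ("Respond in the requested JSON format.", 0.0)  # seed instruction
    for _ in range(rounds):
        candidate = optimizer_llm(history)  # propose a new instruction
        score = sum(run_task(candidate, ex) for ex in val_set) / len(val_set)
        history.append((candidate, score))
        if score > best[1]:
            best = (candidate, score)
    return best
```

Each round the optimizer sees which instructions scored well, so later proposals tend to improve; the paper reports this lifting a hard task to a 100% success rate.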
Results 📊:
• High variance: 11/24 tests achieve 100% success, 2/24 achieve ≤25% success
• Gemini 1.5 Pro outperforms Llama 3 8B-instruct: 93.4% vs 71.7% average success rate
• Complex outputs (lists, composite objects) have lower success rates: 72.1% for ParaphraseQuestions, 67.6% for GenerateAnswersWithConfidences
• OPRO optimization achieves 100% success on GenerateAnswersWithConfidences task with Llama 3 8B-instruct
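Success on these tasks comes down to whether the response parses into the requested schema. A minimal validator sketch for a GenerateAnswersWithConfidences-style output; the exact schema (keys and confidence type) is an assumption for illustration:

```python
import json

def is_valid_answers_with_confidences(raw: str) -> bool:
    """Return True if `raw` parses as a JSON list of
    {"answer": str, "confidence": number} objects.
    The schema here is an assumed illustration, not the paper's spec."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, list):
        return False
    return all(
        isinstance(item, dict)
        and isinstance(item.get("answer"), str)
        and isinstance(item.get("confidence"), (int, float))
        for item in data
    )

print(is_valid_answers_with_confidences('[{"answer": "Paris", "confidence": 9}]'))  # True
print(is_valid_answers_with_confidences("Paris"))  # False: plain text, not JSON
```

A strict parse-or-fail check like this makes the success-rate numbers above easy to compute: it is just the fraction of responses the validator accepts.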
------
Are you into AI and LLMs❓ Join me and 42.2K others on Twitter to stay on the bleeding edge every day.
𝕏/🐦 https://x.com/rohanpaul_ai