Bench-CoE introduces a framework that combines multiple expert LLMs using benchmark evaluations, enabling efficient task routing without extensive training or complex labeling.
-----
https://arxiv.org/abs/2412.04167
Original Problem 🤔:
LLMs show diverse performance across different tasks, but effectively combining their strengths remains challenging. Current methods require extensive labeled data and complex training.
-----
Solution in this Paper 🛠️:
→ Bench-CoE framework enables collaboration between expert models through a router trained on benchmark evaluations
→ Implements two routing approaches: Query-Level for fine-grained task assignment and Subject-Level for better generalization
→ Query-Level router assigns tasks based on individual query performance
→ Subject-Level router uses coarse-grained subject-level labels from benchmark evaluations
→ The framework has three components: a pool of expert models, a router, and a benchmark dataset used to train the router
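The subject-level routing idea can be sketched as follows. This is a toy illustration, not the paper's implementation: all names are hypothetical, and the learned subject classifier is replaced by a simple keyword matcher to keep the example self-contained.

```python
# Toy sketch of Bench-CoE-style subject-level routing.
# Benchmark evaluations tell us which expert scores best per subject,
# so the router only needs to predict a query's subject.
# (Expert names and keyword lists below are made up for illustration.)

BEST_EXPERT_BY_SUBJECT = {
    "math": "expert_math_llm",        # e.g., top scorer on math benchmarks
    "code": "expert_code_llm",        # e.g., top scorer on coding benchmarks
    "general": "expert_general_llm",  # fallback expert
}

SUBJECT_KEYWORDS = {
    "math": ["integral", "equation", "prove"],
    "code": ["python", "function", "bug"],
}

def classify_subject(query: str) -> str:
    """Stand-in for the trained subject classifier (keyword match only)."""
    q = query.lower()
    for subject, keywords in SUBJECT_KEYWORDS.items():
        if any(k in q for k in keywords):
            return subject
    return "general"

def route(query: str) -> str:
    """Route a query to the expert with the best benchmark score
    on the predicted subject."""
    return BEST_EXPERT_BY_SUBJECT[classify_subject(query)]
```

Query-level routing would follow the same shape, except the lookup table would be built from per-query (rather than per-subject) benchmark wins, which gives finer-grained assignment at the cost of weaker generalization.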
-----
Key Insights 🔍:
→ Subject-Level routing shows stronger generalization on out-of-distribution data
→ Query-Level routing performs better on in-distribution data but risks overfitting
→ Framework achieves superior performance without extensive training
-----
Results 📊:
→ Subject-Level Bench-CoE achieves 51.78% accuracy on MMMU dataset, +4.11% improvement
→ Query-Level approach shows 64.28% accuracy on MMLU-Pro, +12.24% improvement
→ The model achieves a 0.64 win rate on test predictions