A modular system that routes questions to the right expert LLM, just like a hospital routes patients to specialists.
The Composition of Experts (CoE) system takes a modular approach: it combines multiple specialized LLMs behind a router that directs each query to the most appropriate expert, achieving strong benchmark performance while activating fewer parameters per query.
-----
https://arxiv.org/abs/2412.01868
🤖 Original Problem:
Current monolithic LLMs are expensive to serve, hard to maintain, and struggle with specialized tasks. Fine-tuning them for specific domains is costly and risks degrading performance in other areas.
-----
🔧 Solution in this Paper:
→ CoE implements a two-step routing approach: a category router first classifies each input into one of a set of distinct categories.
→ A fixed category-to-expert mapping then directs the input to the most suitable expert LLM for that category (see the routing sketch below).
→ The router uses uncertainty quantification to handle inputs outside the training distribution, falling back to a general category when it is unsure.
→ The implementation leverages the SambaNova SN40L RDU's three-tiered memory architecture to switch expert models efficiently (a toy caching illustration follows the sketch).
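A minimal sketch of the two-step routing logic, assuming the category router exposes a probability distribution over categories. The category names, expert model names, and entropy threshold below are illustrative assumptions, not values from the paper:

```python
import math

# Step 1: a category router returning P(category | prompt).
# Stubbed here; in CoE this is a trained classifier.
def category_router(prompt: str) -> dict[str, float]:
    return {"coding": 0.85, "math": 0.10, "general": 0.05}  # hypothetical output

# Step 2: a fixed category-to-expert mapping (expert names are made up).
CATEGORY_TO_EXPERT = {
    "coding": "qwen2-72b-coding-expert",
    "math": "gemma-math-expert",
    "general": "llama-3-generalist",
}

ENTROPY_THRESHOLD = 1.0  # illustrative uncertainty cutoff, in nats

def route(prompt: str) -> str:
    probs = category_router(prompt)
    # Uncertainty quantification: a high-entropy distribution suggests the
    # prompt lies outside the router's training distribution, so fall back
    # to the general category rather than trusting the argmax.
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    category = "general" if entropy > ENTROPY_THRESHOLD else max(probs, key=probs.get)
    return CATEGORY_TO_EXPERT[category]

print(route("Write a quicksort in Rust"))  # -> qwen2-72b-coding-expert
```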
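The tiered memory lets the system park idle experts in larger, slower memory and pull the routed expert into fast memory on demand. Here is a toy LRU illustration of that idea; this is generic caching logic, not the SN40L API, and the tier capacity and swap policy are assumptions:

```python
from collections import OrderedDict

class TieredModelCache:
    """Keep recently used experts in the fast tier; evict LRU experts to a slow tier."""
    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # expert name -> weights handle (stubbed as strings)
        self.slow = {}             # bulk storage for all other experts

    def get(self, expert: str) -> str:
        if expert in self.fast:
            self.fast.move_to_end(expert)  # mark as most recently used
            return self.fast[expert]
        weights = self.slow.pop(expert, f"weights({expert})")  # "load" from slow tier
        if len(self.fast) >= self.fast_capacity:
            evicted, w = self.fast.popitem(last=False)  # evict least recently used
            self.slow[evicted] = w
        self.fast[expert] = weights
        return weights

cache = TieredModelCache(fast_capacity=2)
cache.get("qwen2-72b-coding-expert")  # pulled into the fast tier on first use
```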
-----
💡 Key Insights:
→ Serving a set of modular experts is more cost-effective than serving a monolithic LLM
→ Two-step routing (category first, then expert) is more accurate than routing directly to an expert in one step
→ Prompts cluster naturally by category, which is what makes routing effective
-----
📊 Results:
→ Achieves a score of 59.4 on Arena-Hard using only 31 billion average active parameters
→ Scores 9.06 on MT-Bench with 54 billion average active parameters
→ Successfully combines Qwen, Gemma, and Llama models as experts