"Composition of Experts: A Modular Compound AI System Leveraging Large Language Models"

The podcast accompanying this paper was generated with Google's Illuminate.

A modular system that routes questions to the right expert LLM, just like a hospital routes patients to specialists.

The Composition of Experts (CoE) system introduces a modular approach that combines multiple specialized LLMs, using a trained router to direct each query to the most appropriate expert model, achieving strong benchmark performance while activating fewer parameters per query than comparable monolithic models.

-----

https://arxiv.org/abs/2412.01868

🤖 Original Problem:

Current monolithic LLMs are expensive to serve, hard to maintain, and struggle with specialized tasks. Fine-tuning them for specific domains is costly and risks degrading performance in other areas.

-----

🔧 Solution in this Paper:

→ CoE implements a two-step routing approach: a category router first classifies each input into one of a set of distinct categories.

→ A category-to-expert mapping then directs the input to the expert LLM best suited to that category.

→ The router applies uncertainty quantification to inputs that fall outside its training distribution, routing them to a general-purpose category instead (a minimal sketch of this flow follows this list).

→ The implementation leverages the three-tiered memory architecture of SambaNova's SN40L RDU to hold the pool of expert models and switch between them efficiently.
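
To make the two steps concrete, here is a minimal, self-contained Python sketch of the routing flow. It is not the paper's implementation: the keyword router, category labels, expert names, and confidence threshold are all illustrative assumptions standing in for the trained category router and its uncertainty quantification.

```python
# Minimal sketch of CoE-style two-step routing. Everything here -- the category
# labels, keyword lists, expert names, and 0.5 threshold -- is an illustrative
# assumption; the paper trains a real category router.

from typing import Dict, List, Tuple

# Step 1 inputs: categories the router can predict. "general" is the fallback
# category for prompts the router is uncertain about.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "coding": ["python", "function", "bug", "compile"],
    "math": ["integral", "equation", "prove", "solve"],
    "general": [],
}

# Step 2 inputs: the static category-to-expert mapping. Expert names are
# hypothetical; the paper composes Qwen, Gemma, and Llama checkpoints.
# On the SN40L, all expert weights sit in the large memory tier and the
# active expert is staged into the faster tiers, making switches cheap.
CATEGORY_TO_EXPERT: Dict[str, str] = {
    "coding": "qwen-coder-expert",
    "math": "llama-math-expert",
    "general": "gemma-generalist",
}

CONFIDENCE_THRESHOLD = 0.5  # stand-in for the paper's uncertainty quantification


def category_router(prompt: str) -> Tuple[str, float]:
    """Step 1: classify the prompt into a category, with a confidence score."""
    words = prompt.lower().split()
    scores = {cat: sum(w in kws for w in words)
              for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    total = sum(scores.values())
    return best, (scores[best] / total if total else 0.0)


def route(prompt: str) -> str:
    """Step 2: map the category to an expert, falling back to 'general'."""
    category, confidence = category_router(prompt)
    if confidence < CONFIDENCE_THRESHOLD:
        # Low confidence approximates "outside the training distribution".
        category = "general"
    return CATEGORY_TO_EXPERT[category]


if __name__ == "__main__":
    for p in ["Fix the bug in this python function",
              "Solve this integral equation",
              "Tell me a story about a dragon"]:
        print(f"{p!r} -> {route(p)}")
```

One design point this sketch preserves: because the category-to-expert mapping is a separate, static table, an expert can be added or swapped by editing the mapping rather than retraining the whole system, which is the modularity the paper emphasizes.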

-----

💡 Key Insights:

→ Modular expert systems are more cost-effective than monolithic LLMs

→ Two-step routing provides better accuracy than direct expert routing

→ Natural clustering of prompts by categories enables effective routing

-----

📊 Results:

→ Achieves a score of 59.4 on Arena-Hard using only 31 billion average active parameters

→ Scores 9.06 on MT-Bench with 54 billion average active parameters

→ Successfully combines Qwen, Gemma, and Llama models as experts
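
For context on the "average active parameters" metric: because each query activates only the routed expert (plus the router), the reported figure is effectively a routing-frequency-weighted mean of the expert sizes. A toy illustration with made-up numbers (not the paper's):

```python
# Made-up expert sizes and routing fractions, purely to illustrate the metric.
expert_params = {"small-expert": 9e9, "mid-expert": 32e9, "large-expert": 70e9}
routing_fraction = {"small-expert": 0.3, "mid-expert": 0.4, "large-expert": 0.3}

avg_active = sum(expert_params[e] * routing_fraction[e] for e in expert_params)
print(f"average active parameters: {avg_active / 1e9:.1f}B")  # -> 36.5B
```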
