"Bench-CoE: a Framework for Collaboration of Experts from Benchmark"

A podcast discussion of this paper was generated with Google's Illuminate.

Bench-CoE introduces a framework that combines multiple expert LLMs by training a router on benchmark evaluations, enabling efficient task routing without extensive training or complex labeling.

-----

https://arxiv.org/abs/2412.04167

Original Problem 🤔:

LLMs show diverse performance across different tasks, but effectively combining their strengths remains challenging. Current methods require extensive labeled data and complex training.

-----

Solution in this Paper 🛠️:

→ Bench-CoE framework enables collaboration between expert models through a router trained on benchmark evaluations

→ Implements two routing approaches: Query-Level for fine-grained task assignment and Subject-Level for better generalization (a minimal sketch of both follows this list)

→ Query-Level router is trained on fine-grained, per-query labels indicating which expert performed best on each benchmark query

→ Subject-Level router uses coarse-grained subject-level labels from benchmark evaluations

→ Framework comprises three components: expert models, a router, and a benchmark dataset used to train the router
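
To make the two routing approaches concrete, here is a minimal Python sketch, assuming the benchmark is stored as per-query correctness records. The expert names, the data layout, and the TF-IDF + logistic-regression router are illustrative stand-ins, not the paper's actual implementation.

```python
# Minimal sketch of Bench-CoE-style routing. All names and the data
# layout below are hypothetical; the paper does not prescribe this
# exact router or feature pipeline.
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

EXPERTS = ["expert_a", "expert_b", "expert_c"]  # placeholder expert LLMs

# Benchmark records: question text, subject label, and per-expert
# correctness (illustrative entries only).
bench = [
    {"question": "What is the derivative of x**2?", "subject": "math",
     "correct": {"expert_a": True, "expert_b": False, "expert_c": True}},
    {"question": "Which organ produces insulin?", "subject": "biology",
     "correct": {"expert_a": False, "expert_b": True, "expert_c": False}},
    # ... more benchmark items ...
]

# Query-Level routing: label each benchmark query with an expert that
# answered it correctly, then train a text classifier on those labels.
texts, labels = [], []
for item in bench:
    winners = [e for e in EXPERTS if item["correct"][e]]
    if winners:  # queries no expert solved are skipped in this sketch
        texts.append(item["question"])
        labels.append(winners[0])

query_router = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
query_router.fit(texts, labels)

# Subject-Level routing: aggregate benchmark accuracy per subject and
# keep only a coarse subject -> best-expert lookup table.
hits = defaultdict(lambda: defaultdict(int))
totals = defaultdict(int)
for item in bench:
    totals[item["subject"]] += 1
    for e in EXPERTS:
        hits[item["subject"]][e] += item["correct"][e]

subject_router = {
    subj: max(EXPERTS, key=lambda e: hits[subj][e] / totals[subj])
    for subj in totals
}

def route(query, subject=None):
    """Pick an expert: coarse subject lookup when a subject label is
    available, otherwise the fine-grained query-level classifier."""
    if subject in subject_router:
        return subject_router[subject]
    return query_router.predict([query])[0]

print(route("Integrate x**3 with respect to x", subject="math"))
```

Because the subject-level route collapses the benchmark into a coarse lookup table, it depends less on benchmark-specific query wording, which is consistent with the paper's finding that it generalizes better out of distribution.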

-----

Key Insights 🔍:

→ Subject-Level routing shows stronger generalization on out-of-distribution data

→ Query-Level routing performs better on in-distribution data but risks overfitting

→ Framework achieves superior performance compared to any single expert model, without extensive training

-----

Results 📊:

→ Subject-Level Bench-CoE achieves 51.78% accuracy on the MMMU dataset, a +4.11% improvement

→ Query-Level approach reaches 64.28% accuracy on MMLU-Pro, a +12.24% improvement

→ Model achieves a winning rate of 0.64 on test predictions
