"MALT: Improving Reasoning with Multi-Agent LLM Training"

The podcast on this paper was generated with Google's Illuminate.

Making LLMs collaborate like expert teammates: one generates answers, one checks them, and one refines them.

MALT introduces a novel approach to training multiple LLMs collaboratively, improving reasoning capabilities through specialized roles and joint optimization. The system employs three agents (a generator, a verifier, and a refinement model) working sequentially to solve complex problems.
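To make the flow concrete, here is a minimal sketch of one inference pass, assuming a hypothetical `complete(model, prompt)` helper that returns a model's text output; the role prompts are illustrative, not the paper's exact templates.

```python
def malt_pipeline(question, generator, verifier, refiner, complete):
    """One sequential generator -> verifier -> refiner pass (illustrative)."""
    # 1. Generator proposes an initial step-by-step solution.
    answer = complete(generator, f"Solve step by step:\n{question}")
    # 2. Verifier critiques the proposed solution.
    critique = complete(
        verifier,
        f"Question:\n{question}\n\nProposed solution:\n{answer}\n\n"
        "Critique this solution and point out any errors.",
    )
    # 3. Refinement model produces the final answer using the critique.
    return complete(
        refiner,
        f"Question:\n{question}\n\nSolution:\n{answer}\n\n"
        f"Critique:\n{critique}\n\nWrite a corrected final solution.",
    )
```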

-----

https://arxiv.org/abs/2412.01928

🤔 Original Problem:

Current LLM systems typically operate as single models and lack the collaborative capabilities that could enhance their problem-solving abilities. Despite promising results from multi-agent setups at inference time, there has been limited progress in actually training models to work together effectively.

-----

🔧 Solution in this Paper:

→ MALT implements a sequential multi-agent setup with three specialized LLMs: a generator creates initial solutions, a verifier critiques them, and a refinement model produces an improved final output.

→ The system uses trajectory-expansion-based synthetic data generation, creating diverse reasoning paths through exponential branching (see the sketch after this list).

→ A credit assignment strategy driven by joint outcome-based rewards enables autonomous improvement of each model's specialized capabilities.
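
The last two steps can be sketched together. In a minimal version, assuming a branching factor n at each stage, the same hypothetical `complete(model, prompt)` helper as above, and a `correct(answer)` oracle that checks a final answer against ground truth (all names here are illustrative, not the paper's API), trajectory expansion samples an n × n × n tree of generator, verifier, and refiner outputs, and outcome-based credit assignment scores each intermediate output by the fraction of its descendants that end in a correct final answer:

```python
from statistics import mean

def expand_and_score(question, complete, models, correct, n=3):
    """Trajectory expansion with outcome-based credit assignment (sketch).

    models: (generator, verifier, refiner); complete(model, prompt) -> str.
    Returns, per role, a list of (output, value) pairs, where value is the
    fraction of downstream trajectories ending in a correct final answer.
    """
    generator, verifier, refiner = models
    gen_scored, ver_scored, ref_scored = [], [], []
    for _ in range(n):  # n sampled initial solutions
        answer = complete(generator, f"Solve step by step:\n{question}")
        answer_values = []
        for _ in range(n):  # n critiques per solution
            critique = complete(
                verifier, f"Critique this solution:\n{question}\n{answer}"
            )
            critique_values = []
            for _ in range(n):  # n refinements per critique
                final = complete(
                    refiner, f"Refine:\n{question}\n{answer}\n{critique}"
                )
                reward = 1.0 if correct(final) else 0.0  # joint outcome reward
                ref_scored.append((final, reward))
                critique_values.append(reward)
            value = mean(critique_values)  # propagate mean reward upward
            ver_scored.append((critique, value))
            answer_values.append(value)
        gen_scored.append((answer, mean(answer_values)))
    return gen_scored, ver_scored, ref_scored
```

High- and low-value outputs at each role can then be filtered or paired into per-model training data (for example, supervised fine-tuning on high-value trajectories and preference pairs for DPO), which is broadly how each model improves at its specialized role without hand-labeled intermediate supervision.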

-----

💡 Key Insights:

→ Multi-agent LLM training can significantly improve performance over single-model approaches

→ Specialized roles in LLM teams lead to better problem-solving capabilities

→ Joint training with synthetic data generation enhances collaborative performance

-----

📊 Results:

→ Using Llama 3.1 8B models, MALT achieved a relative improvement of 14.14% on the MATH dataset

→ A 7.12% relative improvement on the GSM8K benchmark

→ A 9.40% relative improvement on CSQA tasks
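
These are relative gains over the baseline, not absolute accuracy-point increases: relative improvement = (accuracy_MALT − accuracy_baseline) / accuracy_baseline. As a purely hypothetical illustration, a baseline at 50.0% accuracy rising to 57.07% would correspond to a 14.14% relative gain.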
