Making LLMs collaborate like expert teammates - one generates answers, one checks them, and one refines them.
MALT (Multi-Agent LLM Training) is a new approach to training multiple LLMs collaboratively, improving reasoning through specialized roles and joint optimization. Three agents - a generator, a verifier, and a refinement model - work sequentially to solve complex problems.
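The sequential flow can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `ask_llm` is a hypothetical stand-in for a per-role model call (e.g. a Llama 3.1 8B endpoint), stubbed here with canned strings.

```python
def ask_llm(role: str, prompt: str) -> str:
    # Hypothetical model call; a real system would query a separate
    # fine-tuned LLM for each role. Stubbed responses for illustration.
    stub = {
        "generator": "Answer: 42 (initial attempt)",
        "verifier": "Critique: double-check the arithmetic in step 2",
        "refiner": "Answer: 42 (revised after critique)",
    }
    return stub[role]

def malt_pipeline(question: str) -> str:
    # Generator drafts a solution, verifier critiques it,
    # refiner produces the final answer conditioned on both.
    draft = ask_llm("generator", question)
    critique = ask_llm("verifier", f"{question}\nDraft: {draft}")
    final = ask_llm("refiner", f"{question}\nDraft: {draft}\nCritique: {critique}")
    return final

print(malt_pipeline("What is 6 * 7?"))
```

The key design point is that each downstream agent conditions on the full upstream context (question, draft, critique), so errors caught by the verifier can actually be repaired.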
-----
https://arxiv.org/abs/2412.01928
🤔 Original Problem:
Current LLM systems typically operate as single models and lack the collaborative dynamics that could enhance their problem-solving. Despite promising results from multi-agent inference setups, there has been limited progress in actually training models to work together effectively.
-----
🔧 Solution in this Paper:
→ MALT implements a sequential multi-agent setup with three specialized LLMs: a generator creates initial solutions, a verifier critiques them, and a refinement model improves the final output.
→ The system uses trajectory-expansion-based synthetic data generation, creating diverse reasoning paths through exponential branching.
→ A credit assignment strategy driven by joint outcome-based rewards enables autonomous improvement of each model's specialized capabilities.
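The last two bullets can be sketched together. Below is a toy version, under my own simplifying assumptions: branching n ways at each of the three stages yields n³ trajectories, and each intermediate output is credited with the mean correctness of the final answers descending from it - a simple proxy for the paper's joint outcome-based reward scheme. All function names here (`expand_trajectories`, `credit_assign`, and the toy stand-ins) are hypothetical.

```python
from collections import defaultdict

def expand_trajectories(question, generate, verify, refine, n=2):
    """Branch n ways at each stage: n**3 leaf trajectories."""
    trajs = []
    for g in range(n):
        draft = generate(question, g)
        for v in range(n):
            critique = verify(draft, v)
            for r in range(n):
                answer = refine(draft, critique, r)
                trajs.append((draft, critique, answer))
    return trajs

def credit_assign(trajs, is_correct):
    """Score each draft and critique by the mean correctness of its
    descendant leaves (a toy stand-in for outcome-based rewards)."""
    scores = defaultdict(list)
    for draft, critique, answer in trajs:
        r = float(is_correct(answer))
        scores[("draft", draft)].append(r)
        scores[("critique", draft, critique)].append(r)
    return {k: sum(v) / len(v) for k, v in scores.items()}

# Deterministic toy stand-ins for model sampling:
gen = lambda q, i: f"draft{i}"
ver = lambda d, i: f"crit{i}"
ref = lambda d, c, i: "42" if d == "draft0" else "wrong"

trajs = expand_trajectories("q", gen, ver, ref, n=2)
print(len(trajs))                       # 2**3 = 8 trajectories
scores = credit_assign(trajs, lambda a: a == "42")
print(scores[("draft", "draft0")])      # every leaf correct -> 1.0
print(scores[("draft", "draft1")])      # every leaf wrong   -> 0.0
```

Intermediate scores like these can then be thresholded to build per-role training data, so each agent improves on its own specialty using only end-of-trajectory correctness signals.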
-----
💡 Key Insights:
→ Multi-agent LLM training can significantly improve performance over single-model approaches
→ Specialized roles in LLM teams lead to better problem-solving capabilities
→ Joint training with synthetic data generation enhances collaborative performance
-----
📊 Results (Llama 3.1 8B models, relative improvement over the single-model baseline):
→ 14.14% on the MATH dataset
→ 7.12% on the GSM8K benchmark
→ 9.40% on CSQA tasks