"Rethinking Mixture-of-Agents: Is Mixing Different LLMs Beneficial?"
→ Mixture-of-Agents systems use multiple different LLMs to improve output.
→ However, balancing the quality and diversity of these LLMs is hard. Including weaker models can reduce overall performance.
This paper proposes Self-MoA, which aggregates repeated samples from a single top-performing LLM, leveraging the model's inherent sampling randomness. It outperforms methods that mix diverse models.
-----
https://www.arxiv.org/abs/2502.00674
📌 Self-MoA cleverly uses stochastic decoding to generate diverse outputs from a single strong LLM (sketched below). This sidesteps the common Mixture-of-Agents problem of balancing model quality against diversity.
📌 Self-MoA-Seq's sliding window works like iterative refinement: each pass refines the current best output, overcoming context-length limitations efficiently.
📌 Self-MoA's aggregator implicitly performs "soft voting": it favors elements shared across the sampled outputs, combining their strengths like an ensemble but without needing separate models.
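A minimal sketch of the in-model-diversity idea: repeated sampling from one strong model at a non-zero temperature yields varied candidate answers. Names such as `generate_fn` and `toy_generate` are illustrative placeholders, not code from the paper.

```python
import random
from typing import Callable, List

def sample_candidates(generate_fn: Callable[[str, float], str],
                      prompt: str,
                      n_samples: int = 6,
                      temperature: float = 0.7) -> List[str]:
    """Draw several stochastic completions from ONE model.

    The diversity comes purely from sampling randomness
    ("in-model diversity"), not from mixing different LLMs.
    """
    return [generate_fn(prompt, temperature) for _ in range(n_samples)]

# Stub generator so the sketch runs without any LLM API.
def toy_generate(prompt: str, temperature: float) -> str:
    return f"answer variant {random.randint(0, 999)} to: {prompt}"

if __name__ == "__main__":
    print(sample_candidates(toy_generate, "Explain Mixture-of-Agents."))
```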
----------
Methods Explored in this Paper 🔧:
→ Self-MoA generates multiple outputs from a single top-performing LLM and then aggregates them (see the sketch after this list).
→ This exploits "in-model diversity", the randomness that comes from repeatedly sampling the same model, whereas Mixed-MoA relies on "cross-model diversity" across different LLMs.
→ Self-MoA-Seq, a sequential version, is also introduced. It addresses context-length limits with a sliding-window approach.
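A hedged sketch of the Self-MoA loop, under the assumption that aggregation is done by prompting the same model with all candidates; the prompt template and function names below are illustrative, not the paper's exact implementation.

```python
from typing import Callable

AGGREGATE_TEMPLATE = (
    "You are given {k} candidate responses to the same query.\n"
    "Query: {query}\n\n"
    "{candidates}\n\n"
    "Synthesize a single, higher-quality response."
)

def self_moa(generate_fn: Callable[[str, float], str],
             query: str,
             n_samples: int = 6,
             temperature: float = 0.7) -> str:
    """Self-MoA sketch: propose with one strong model, then aggregate."""
    # 1) Proposal phase: repeated stochastic sampling from the same model.
    candidates = [generate_fn(query, temperature) for _ in range(n_samples)]

    # 2) Aggregation phase: the same model merges its own candidates.
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(candidates))
    agg_prompt = AGGREGATE_TEMPLATE.format(
        k=n_samples, query=query, candidates=numbered)
    return generate_fn(agg_prompt, 0.0)  # low temperature for the final pass
```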
-----
Key Insights 💡:
→ The quality of the proposer LLMs matters; diversity alone is not enough for Mixture-of-Agents.
→ Using one high-quality LLM, even with lower output diversity, is often superior to mixing models.
→ Self-MoA-Seq uses windowing to overcome context-length constraints while maintaining performance (a rough sketch follows below).
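A rough sketch of the sliding-window idea behind Self-MoA-Seq; the window size, carry-forward of the running best, and the `aggregate_fn` callable are assumptions for illustration rather than the paper's exact procedure.

```python
from typing import Callable, List

def self_moa_seq(aggregate_fn: Callable[[str, List[str]], str],
                 query: str,
                 candidates: List[str],
                 window: int = 3) -> str:
    """Sequential aggregation sketch: slide a window over the candidates.

    `aggregate_fn(query, responses)` stands in for an LLM call that merges
    a short list of responses. The running best is always kept in context,
    so only a few responses need to fit in the prompt at any one time.
    """
    best = candidates[0]
    for start in range(1, len(candidates), window):
        chunk = candidates[start:start + window]
        # Aggregate the current best together with the next window of candidates.
        best = aggregate_fn(query, [best] + chunk)
    return best
```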
-----
Results 📊:
→ Self-MoA improves the length-controlled win rate by 6.6 points over Mixed-MoA on AlpacaEval 2.0.
→ Self-MoA achieves state-of-the-art results on the AlpacaEval 2.0 leaderboard, scoring 78.5 and 75.0 when top models are used for both proposing and aggregating.
→ On the mixed-task setting, Self-MoA reaches an average accuracy of 63.81, beating the best Mixed-MoA configuration (60.04).