"Rethinking Mixture-of-Agents: Is Mixing Different LLMs Beneficial?"
→ Mixture-of-Agents systems use multiple different LLMs to improve output.
→ However, balancing the quality and diversity of these LLMs is hard. Including weaker models can reduce overall performance.
This paper proposes Self-MoA, which aggregates repeated samples from a single top-performing LLM, leveraging the model's inherent sampling randomness. It outperforms methods that mix diverse models.
-----
https://www.arxiv.org/abs/2502.00674
📌 Self-MoA cleverly uses stochastic decoding to generate diverse outputs from a single strong LLM (sketched below). This sidesteps the common Mixture-of-Agents problem of balancing model quality against diversity.
📌 Self-MoA-Seq's sliding window works like iterative refinement: each pass refines the current best output, overcoming context-length limitations efficiently.
📌 Self-MoA's aggregator implicitly performs "soft voting": it favors elements shared across the sampled outputs, combining their strengths like an ensemble but without needing separate models.
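A minimal sketch of the in-model-diversity idea: repeated sampling from one strong model at a non-zero temperature yields varied candidate answers. Names such as `generate_fn` and `toy_generate` are illustrative placeholders, not code from the paper.

```python
import random
from typing import Callable, List

def sample_candidates(generate_fn: Callable[[str, float], str],
                      prompt: str,
                      n_samples: int = 6,
                      temperature: float = 0.7) -> List[str]:
    """Draw several stochastic completions from ONE model.

    The diversity comes purely from sampling randomness
    ("in-model diversity"), not from mixing different LLMs.
    """
    return [generate_fn(prompt, temperature) for _ in range(n_samples)]

# Stub generator so the sketch runs without any LLM API.
def toy_generate(prompt: str, temperature: float) -> str:
    return f"answer variant {random.randint(0, 999)} to: {prompt}"

if __name__ == "__main__":
    print(sample_candidates(toy_generate, "Explain Mixture-of-Agents."))
```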
----------
Methods Explored in this Paper 🔧:
→ Self-MoA generates multiple outputs from a single top-performing LLM and then aggregates them (see the sketch after this list).
→ This exploits "in-model diversity", the randomness that comes from repeatedly sampling the same model, whereas Mixed-MoA relies on "cross-model diversity" across different LLMs.
→ Self-MoA-Seq, a sequential version, is also introduced. It addresses context-length limits with a sliding-window approach.
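A hedged sketch of the Self-MoA loop, under the assumption that aggregation is done by prompting the same model with all candidates; the prompt template and function names below are illustrative, not the paper's exact implementation.

```python
from typing import Callable

AGGREGATE_TEMPLATE = (
    "You are given {k} candidate responses to the same query.\n"
    "Query: {query}\n\n"
    "{candidates}\n\n"
    "Synthesize a single, higher-quality response."
)

def self_moa(generate_fn: Callable[[str, float], str],
             query: str,
             n_samples: int = 6,
             temperature: float = 0.7) -> str:
    """Self-MoA sketch: propose with one strong model, then aggregate."""
    # 1) Proposal phase: repeated stochastic sampling from the same model.
    candidates = [generate_fn(query, temperature) for _ in range(n_samples)]

    # 2) Aggregation phase: the same model merges its own candidates.
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(candidates))
    agg_prompt = AGGREGATE_TEMPLATE.format(
        k=n_samples, query=query, candidates=numbered)
    return generate_fn(agg_prompt, 0.0)  # low temperature for the final pass
```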
-----
Key Insights 💡:
→ The quality of the proposer LLMs matters; diversity alone is not enough for Mixture-of-Agents.
→ Using one high-quality LLM, even with lower output diversity, is often superior to mixing models.
→ Self-MoA-Seq uses windowing to overcome context-length constraints while maintaining performance (a rough sketch follows below).
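A rough sketch of the sliding-window idea behind Self-MoA-Seq; the window size, carry-forward of the running best, and the `aggregate_fn` callable are assumptions for illustration rather than the paper's exact procedure.

```python
from typing import Callable, List

def self_moa_seq(aggregate_fn: Callable[[str, List[str]], str],
                 query: str,
                 candidates: List[str],
                 window: int = 3) -> str:
    """Sequential aggregation sketch: slide a window over the candidates.

    `aggregate_fn(query, responses)` stands in for an LLM call that merges
    a short list of responses. The running best is always kept in context,
    so only a few responses need to fit in the prompt at any one time.
    """
    best = candidates[0]
    for start in range(1, len(candidates), window):
        chunk = candidates[start:start + window]
        # Aggregate the current best together with the next window of candidates.
        best = aggregate_fn(query, [best] + chunk)
    return best
```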
-----
Results 📊:
→ Self-MoA improves the length-controlled win rate by 6.6 points over Mixed-MoA on AlpacaEval 2.0.
→ Self-MoA achieves state-of-the-art results on the AlpacaEval 2.0 leaderboard, scoring 78.5 and 75.0 when top models are used for both proposing and aggregating.
→ On the mixed-task setting, Self-MoA reaches an average accuracy of 63.81, beating the best Mixed-MoA configuration (60.04).