Beautiful @GoogleDeepMind paper.
Evolve smarter, not harder: Mind Evolution boosts LLM planning.
From trial-and-error to evolution:
Mind Evolution uses evolutionary search to improve LLM problem-solving. It evolves solutions by generating, recombining, and refining them based on evaluator feedback.
Mind Evolution shows how survival of the fittest can make AI a better problem-solver.
→ Think of it as breeding answers rather than training a model. You start with many different LLM-generated answers (a population). The best answers are kept and 'mixed' together (recombined) to create new and hopefully better answers for the next round. This repeats generation after generation: only the best solutions 'survive' and get refined, just as in natural evolution, so the solutions keep improving over time.
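A toy version of that loop, as a sketch with no LLM involved: a fixed target string stands in for the evaluator's idea of a perfect answer (hypothetical), and random string crossover and mutation stand in for the LLM-driven recombination Mind Evolution actually uses. Population size, elite count and mutation rate are arbitrary assumptions.

```python
import random

# Hypothetical stand-ins: a fixed target plays the evaluator's ideal answer,
# and random string edits play the role of LLM-driven recombination.
TARGET = "mind evolution"
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def fitness(candidate: str) -> int:
    """Evaluator stand-in: count positions that already match the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))


def random_candidate(rng: random.Random) -> str:
    return "".join(rng.choice(ALPHABET) for _ in TARGET)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover: prefix of one parent, suffix of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(s: str, rng: random.Random, rate: float = 0.05) -> str:
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in s)


def evolve(pop_size: int = 50, elite: int = 10, generations: int = 500,
           seed: int = 0) -> tuple[str, int]:
    """Return the best candidate found and the generation where the search stopped."""
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    for gen in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == len(TARGET):
            return population[0], gen
        survivors = population[:elite]  # survival of the fittest
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = survivors + children
    return max(population, key=fitness), generations


best, generations_used = evolve()
print(best, generations_used)
```

Keeping the top `elite` candidates unchanged means the best score can never drop from one generation to the next; everything else is regenerated from those survivors.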
-----
Paper - https://arxiv.org/abs/2501.09891
Original Problem 🤔:
→ LLMs struggle with complex reasoning, especially when constraints and preferences are expressed in natural language.
→ Existing methods like Best-of-N sampling or sequential revision fall short on these tasks.
→ They lack the ability to efficiently explore and refine solutions in a complex natural language space.
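To make the contrast concrete, here are the two baselines in their simplest form on a hypothetical one-dimensional scoring function (`score`, the 0 to 100 range and the step sizes are illustrative assumptions, not from the paper). Best-of-N only explores: it draws independent candidates and keeps the best. Sequential revision only refines: it keeps editing a single candidate.

```python
import random


def score(x: int) -> int:
    """Hypothetical evaluator: higher is better, with a single peak at x = 37."""
    return -(x - 37) ** 2


def best_of_n(n: int, rng: random.Random) -> int:
    """Divergent only: draw n independent candidates and keep the best one."""
    candidates = [rng.randint(0, 100) for _ in range(n)]
    return max(candidates, key=score)


def sequential_revision(steps: int, rng: random.Random) -> int:
    """Convergent only: revise one candidate, keeping each edit that scores better."""
    x = rng.randint(0, 100)
    for _ in range(steps):
        # Propose a small edit, clamped to the valid range.
        proposal = min(100, max(0, x + rng.choice([-3, -2, -1, 1, 2, 3])))
        if score(proposal) > score(x):
            x = proposal
    return x


rng = random.Random(0)
print(best_of_n(10, rng), sequential_revision(200, rng))
```

On this single-peak toy both do fine. On large natural-language problems you want both behaviors at once, and Mind Evolution aims to combine them in one search.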
-----
Solution in this Paper 💡:
→ Mind Evolution uses a language-based genetic algorithm.
→ It evolves a population of candidate solutions in natural language.
→ An LLM generates, recombines (crossover and mutation), and refines solutions.
→ An evaluator provides feedback, guiding the search towards better solutions.
→ It uses an island model to maintain diversity, with migration and island reset operations.
→ Refinement is done through a "critical conversation" between "critic" and "author" LLM roles.
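The island model can be sketched on a toy bit-string problem. This is a minimal illustration under stated assumptions: bits replace natural-language plans, random crossover and mutation replace the LLM, and the ring migration topology, the intervals and the "reseed the worst island around the global best" reset rule are hypothetical choices, not the paper's exact settings.

```python
import random

LENGTH = 20  # toy problem size (assumption)
Genome = list[int]


def fitness(g: Genome) -> int:
    """Evaluator stand-in ("OneMax"): count the 1 bits."""
    return sum(g)


def mutate(g: Genome, rng: random.Random, rate: float = 0.05) -> Genome:
    return [1 - b if rng.random() < rate else b for b in g]


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    cut = rng.randrange(1, LENGTH)
    return a[:cut] + b[cut:]


def evolve_island(island: list[Genome], rng: random.Random, elite: int = 4) -> list[Genome]:
    """One generation inside a single island: keep the elite, refill with offspring."""
    island = sorted(island, key=fitness, reverse=True)
    parents = island[:elite]
    children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                for _ in range(len(island) - elite)]
    return parents + children


def migrate(islands: list[list[Genome]], k: int = 2) -> None:
    """Ring migration: the top-k of island i-1 replace the worst k of island i."""
    tops = [sorted(isl, key=fitness, reverse=True)[:k] for isl in islands]
    for i, isl in enumerate(islands):
        isl.sort(key=fitness)  # ascending, so the worst come first
        isl[:k] = [list(g) for g in tops[i - 1]]  # tops[-1] feeds island 0


def reset_worst_island(islands: list[list[Genome]], rng: random.Random) -> None:
    """Reseed the island with the lowest mean fitness around the global best."""
    def mean_fitness(isl: list[Genome]) -> float:
        return sum(map(fitness, isl)) / len(isl)

    worst = min(range(len(islands)), key=lambda i: mean_fitness(islands[i]))
    best = max((g for isl in islands for g in isl), key=fitness)
    size = len(islands[worst])
    islands[worst] = [list(best)] + [mutate(best, rng, rate=0.2) for _ in range(size - 1)]


def island_search(n_islands: int = 4, size: int = 12, generations: int = 200,
                  migrate_every: int = 5, reset_every: int = 20, seed: int = 0) -> Genome:
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(LENGTH)] for _ in range(size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        islands = [evolve_island(isl, rng) for isl in islands]
        if gen % migrate_every == 0:
            migrate(islands)
        if gen % reset_every == 0:
            reset_worst_island(islands, rng)
    return max((g for isl in islands for g in isl), key=fitness)


best = island_search()
print(fitness(best), "of", LENGTH)
```

Migration lets good candidates spread between islands, while keeping islands separate and periodically resetting the weakest one stops the whole population from collapsing onto a single line of solutions.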
-----
Key Insights from this Paper 🔑:
→ Combining divergent and convergent thinking improves problem-solving.
→ A natural-language representation lets the search leverage LLMs' strengths in understanding and generation.
→ Programmatic evaluators can effectively guide search, even without formal problem definitions.
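A programmatic evaluator only has to check a proposed plan, not formalize the whole problem. Here is a hypothetical meeting-planning checker (the `Meeting` type, the availability data and the scoring rule are illustrative assumptions): it returns a numeric score plus plain-language feedback, the kind of signal that can steer selection and be handed to a critic/author refinement step.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Meeting:
    person: str
    start: int  # minutes after the start of the day (assumed unit)
    end: int


# Hypothetical problem data: each friend's availability window and minimum meeting length.
AVAILABILITY = {"Alice": (0, 120), "Bob": (90, 240), "Carol": (200, 300)}
MIN_MINUTES = {"Alice": 60, "Bob": 45, "Carol": 30}


def evaluate(plan: list[Meeting]) -> tuple[int, list[str]]:
    """Score a plan and explain every violated constraint in plain language."""
    feedback: list[str] = []
    met: set[str] = set()
    for m in plan:
        if m.person not in AVAILABILITY:
            feedback.append(f"{m.person} is not in the list of people to meet.")
            continue
        lo, hi = AVAILABILITY[m.person]
        length = m.end - m.start
        if m.start < lo or m.end > hi:
            feedback.append(f"{m.person} is only free from {lo} to {hi}, "
                            f"but the plan meets them from {m.start} to {m.end}.")
        elif length < MIN_MINUTES[m.person]:
            feedback.append(f"The meeting with {m.person} lasts {length} minutes; "
                            f"it needs at least {MIN_MINUTES[m.person]}.")
        else:
            met.add(m.person)
    # Sorted by start time, so a pair (a, b) overlaps exactly when b starts before a ends.
    ordered = sorted(plan, key=lambda m: m.start)
    overlaps = 0
    for a, b in combinations(ordered, 2):
        if b.start < a.end:
            overlaps += 1
            feedback.append(f"The meetings with {a.person} and {b.person} overlap.")
    # Score: friends met validly, minus one point per overlapping pair.
    return len(met) - overlaps, feedback


plan = [Meeting("Alice", 0, 30), Meeting("Bob", 100, 200), Meeting("Carol", 150, 250)]
score, notes = evaluate(plan)
print(score)
for note in notes:
    print("-", note)
```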
-----
Results 💯:
→ Solves over 95% of TravelPlanner and over 83% of Meeting Planning problems using Gemini 1.5 Flash.
→ Achieves near-perfect success rates with a two-stage approach using Gemini 1.5 Pro.
→ Significantly outperforms Best-of-N and Sequential Revision on these benchmarks.