Smart prompt evolution beats brute-force optimization for teaching LLMs complex math.
This paper introduces Evolutionary Pre-Prompt Optimization (EPPO), a method that improves mathematical reasoning in LLMs by selecting Chain-of-Thought (CoT) pre-prompts with evolutionary algorithms. EPPO achieves 10% higher exact-match scores on the GSM8k and MathQA benchmarks while providing theoretical guarantees against overfitting.
-----
https://arxiv.org/abs/2412.04291
🤔 Original Problem:
LLMs struggle with complex mathematical reasoning tasks despite their size. Current prompt optimization methods lack theoretical guarantees and often overfit on small training datasets.
-----
🔧 Solution in this Paper:
→ EPPO uses evolutionary algorithms to select optimal Chain-of-Thought examples as pre-prompts for mathematical reasoning tasks
→ The method requires only binary comparisons between pre-prompts, enabling information-theoretic generalization bounds
→ EPPO optimizes a small set of 2-16 examples that remain fixed for the entire downstream task
→ The algorithm employs comparison-based optimization to minimize overfitting risks
→ Integration with self-consistency voting further amplifies performance gains
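The selection loop described above can be sketched in a few lines. This is a minimal (1+1)-style evolutionary illustration under stated assumptions, not the paper's exact algorithm: `pool`, `compare`, and the single-swap mutation are placeholders for the sketch, and the only feedback used is a binary comparison between two pre-prompts, matching the paper's comparison-based setup.

```python
import random

def evolve_pre_prompt(pool, k, compare, generations=50, seed=0):
    """Toy (1+1)-style evolutionary search over k-example pre-prompts.

    pool:    list of candidate CoT examples (assumed larger than k)
    compare: compare(a, b) -> True if pre-prompt a beats b on held-out
             problems; this binary signal is the ONLY feedback used
    """
    rng = random.Random(seed)
    parent = rng.sample(pool, k)  # initial pre-prompt: k random examples
    for _ in range(generations):
        child = parent.copy()
        # Mutate: swap one example for an unused one from the pool.
        i = rng.randrange(k)
        child[i] = rng.choice([e for e in pool if e not in child])
        # Comparison-based selection: keep the child only if it wins.
        if compare(child, parent):
            parent = child
    return parent
```

Because the parent is replaced only when the child wins the comparison, the returned pre-prompt is never worse (under `compare`) than the initial random one.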
-----
💡 Key Insights:
→ 4-shot prompts perform better than 8-shot prompts due to reduced overfitting
→ Evolutionary optimization outperforms random search for prompt selection
→ Pre-prompts transfer well across different models and mathematical tasks
→ Relying only on limited, comparison-based feedback from the data helps prevent overfitting, unlike gradient-based methods
-----
📊 Results:
→ 10% improvement in exact-match scores on GSM8k and MathQA
→ Successful transfer from 7B to 70B parameter models