Game theory meets LLM alignment: Faster training through strategic optimization.
WIND (WIN rate Dominance) accelerates LLM alignment by optimizing win rates through game theory.
📚 https://arxiv.org/abs/2410.20727
🎯 Original Problem:
Best-of-N (BoN) distillation is effective for LLM alignment, but it is computationally expensive: each iteration must draw and score many candidate responses per prompt, so sampling and inference costs accumulate across training rounds. The challenge is to keep BoN-level performance while cutting these costs.
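To make the cost concrete, below is a minimal sketch of the Best-of-N sampling step that iterative BoN distillation repeats; `generate` and `reward_model` are placeholder callables standing in for an LLM sampling call and a learned reward model, not APIs from the paper.

```python
# Minimal sketch of one Best-of-N sampling step (illustrative only).
def best_of_n(prompt, generate, reward_model, n=16):
    """Draw n candidate responses and keep the highest-scoring one.

    The cost that WIND targets is visible here: every prompt needs n
    generation calls plus n reward-model evaluations, and iterative BoN
    distillation repeats this for every training prompt in every round.
    """
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward_model(prompt, c) for c in candidates]
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]
```

Distillation then fine-tunes the policy on these selected responses, so the per-prompt sampling and scoring cost recurs throughout training.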
-----
🔧 Solution in this Paper:
• Introduces the WIND (WIN rate Dominance) framework, which connects iterative BoN with game theory
• Provides memory-efficient exact policy optimization with linear convergence
• Implements flexible loss functions (squared, KL-divergence, noise-contrastive estimation) for win rate optimization; a sketch follows this list
• Establishes theoretical guarantees through game-theoretic analysis
• Reduces computational overhead by optimizing the sampling strategy
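As a rough illustration of the loss-function bullet above, the sketch below writes squared, KL, and NCE-style losses over an estimated win probability. The Bradley-Terry estimator and the exact loss forms are assumptions for illustration only; they are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def win_prob(r_policy: torch.Tensor, r_ref: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style estimate of P(policy response beats reference
    # response) from scalar rewards -- an assumption for illustration.
    return torch.sigmoid(r_policy - r_ref)

def squared_loss(p: torch.Tensor) -> torch.Tensor:
    # Squared deviation from full win-rate dominance (win probability 1).
    return ((1.0 - p) ** 2).mean()

def kl_loss(p: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # KL("always win" || Bernoulli(p)) reduces to the negative log win rate.
    return (-torch.log(p + eps)).mean()

def nce_loss(r_policy: torch.Tensor, r_ref: torch.Tensor) -> torch.Tensor:
    # Simplified logistic / NCE-style form: classify policy responses as
    # positives against reference responses, with the reward gap as logit.
    logits = r_policy - r_ref
    return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

# Example with rewards for a small batch of (policy, reference) pairs.
r_pi = torch.tensor([1.2, 0.3, 2.1])
r_ref = torch.tensor([0.8, 0.9, 1.5])
p = win_prob(r_pi, r_ref)
print(squared_loss(p).item(), kl_loss(p).item(), nce_loss(r_pi, r_ref).item())
```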
-----
💡 Key Insights:
• Iterative BoN is equivalent to solving a log-win-rate game (formalized after this list)
• Game theory unifies seemingly different algorithmic paradigms
• The win rate dominance policy approximates the limiting point of iterative BoN
• Sample efficiency improves through strategic sampling methods
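To make the first insight concrete, here is one schematic way to write the win rate and the associated log-win-rate game; the notation is assumed for illustration and the paper's precise formulation may differ.

```latex
% Schematic notation, not copied from the paper.
% Win rate of policy \pi over \pi' under a preference model P(y \succ y' \mid x):
W(\pi, \pi') = \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x),\; y' \sim \pi'(\cdot \mid x)}
  \left[ P(y \succ y' \mid x) \right]

% Two-player log-win-rate game; a win-rate-dominant policy attains its equilibrium:
\max_{\pi} \; \min_{\pi'} \; \log W(\pi, \pi')
```

In this reading, the limiting point of iterative BoN and the win rate dominance policy coincide, which is why solving the game directly can stand in for repeated BoN rounds.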
-----
📊 Results:
• Achieves 77.18% on GSM8K (vs. SPPO's 75.44%)
• Scores 79.31% on the HellaSwag benchmark
• Attains 65.87% on MMLU
• Reaches an average score of 8.2013 on MT-Bench
• Cuts computation time by roughly 40% relative to baseline methods