Less is more: This paper optimizes o1-like LLMs for efficient and accurate reasoning.
This paper tackles the inefficiency of o1-like LLMs (models that emulate OpenAI's o1), which often overthink simple problems, spending excessive tokens without meaningful accuracy gains.
https://arxiv.org/abs/2412.21187
🤔:
Original Problem:
→ o1-like LLMs, while powerful, often overuse computational resources on simple tasks.
→ This "overthinking" involves generating many solutions, even when the first one is correct.
💡:
Solution in this Paper:
→ The paper introduces metrics to quantify the outcome and process efficiency of o1-like LLM responses (a rough sketch of the idea follows after this list).
→ It uses a self-training approach: the model's own responses are simplified into shorter targets that still retain useful reasoning diversity.
→ Redundant solutions are pruned while preserving core reasoning steps.
→ Several post-training methods are explored, including Supervised Fine-tuning (SFT), Direct Preference Optimization (DPO), Reasoning Preference Optimization (RPO), and Simple Preference Optimization (SimPO).
→ The paper also proposes response simplification strategies: First-Correct Solutions (FCS), FCS with Reflection (FCS+Ref), and Greedily Diverse Solutions (GDS); see the second sketch below.
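To make the outcome-efficiency idea concrete, here is a minimal, hypothetical sketch: what fraction of a response's tokens were actually needed to reach its first correct solution. The class and function names are illustrative assumptions, not the paper's exact metric definition.

```python
# Hypothetical sketch of an "outcome efficiency"-style metric: the share of
# generated tokens needed to reach the first correct solution.
# Names (Solution, outcome_efficiency) are illustrative, not from the paper.
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    n_tokens: int      # tokens spent on this solution attempt
    is_correct: bool   # whether this attempt reaches the right answer


def outcome_efficiency(solutions: List[Solution]) -> float:
    """Tokens up to and including the first correct solution,
    divided by total tokens in the full response.
    Returns 0.0 if no attempt is correct."""
    total = sum(s.n_tokens for s in solutions)
    if total == 0:
        return 0.0
    used = 0
    for s in solutions:
        used += s.n_tokens
        if s.is_correct:
            return used / total
    return 0.0


# Example: the first attempt is already correct, two redundant re-checks follow.
resp = [Solution(120, True), Solution(200, True), Solution(180, True)]
print(f"outcome efficiency = {outcome_efficiency(resp):.2f}")  # 120 / 500 = 0.24
```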
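And a rough sketch of the FCS idea for building training data: truncate the model's own response at its first correct solution and pair it against the full verbose response for preference tuning (DPO/RPO/SimPO-style). All function and variable names here are assumptions for illustration, not the paper's released code.

```python
# Sketch of First-Correct Solution (FCS) data construction, assuming the
# response has already been split into per-attempt text with correctness flags.
from typing import List, Tuple


def first_correct_prefix(solutions: List[str], correct_flags: List[bool]) -> str:
    """Concatenate solution attempts up to and including the first correct one."""
    kept = []
    for text, ok in zip(solutions, correct_flags):
        kept.append(text)
        if ok:
            break
    return "\n\n".join(kept)


def build_preference_pair(solutions: List[str],
                          correct_flags: List[bool]) -> Tuple[str, str]:
    """(chosen, rejected) pair for DPO/RPO/SimPO-style training:
    chosen = truncated FCS response, rejected = full verbose response."""
    chosen = first_correct_prefix(solutions, correct_flags)
    rejected = "\n\n".join(solutions)
    return chosen, rejected


# Usage: chosen keeps only the first attempt; rejected keeps all three.
attempts = ["Attempt 1: ... answer is 5.", "Attempt 2: re-check ...", "Attempt 3: alternative method ..."]
flags = [True, True, True]
chosen, rejected = build_preference_pair(attempts, flags)
```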
🤯:
Key Insights from this Paper:
→ Earlier solutions in o1-like LLM responses are often sufficient for correct answers.
→ Later solutions primarily add redundancy, not accuracy or diverse perspectives.
→ Overthinking is more pronounced with easier problems.
📊:
Results:
→ Reduces token output by 48.6% on MATH500 while maintaining accuracy.
→ Improves accuracy for the easiest level of MATH500 from 97.7% to 100% while using only 63.6% of the tokens.
→ Maintains model performance with fewer tokens on challenging datasets like GPQA and AIME.