O1-Pruner makes LLMs think faster and smarter by trimming redundant reasoning steps.
This paper introduces a fine-tuning method to reduce the inference overhead of long-thought reasoning LLMs while preserving accuracy.
Paper - https://arxiv.org/abs/2501.12570
Original Problem 🤔:
→ Long-thought reasoning LLMs, while effective, have high inference costs due to lengthy outputs.
→ Existing LLMs often produce unnecessarily long reasoning processes, even when shorter, accurate solutions exist.
Solution in this Paper 💡:
→ The paper proposes Length-Harmonizing Fine-Tuning (O1-Pruner).
→ This method aims to minimize reasoning overhead while maintaining accuracy.
→ It first estimates the reference model's baseline accuracy and solution length by pre-sampling multiple solutions per problem.
→ It then applies RL-style fine-tuning that rewards shorter reasoning while holding accuracy to that baseline constraint (see the sketch after this list).
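A minimal sketch of the reward-shaping idea in Python. To be clear about assumptions: `is_correct` is a hypothetical verifier callback, `lam` is a placeholder penalty weight, and `len(solution)` stands in for the token count the paper would use; the terms mirror the shape of the objective, not its verbatim definition.

```python
import statistics

def baseline_stats(ref_solutions, is_correct):
    """Pre-sampling step: estimate the reference model's mean solution
    length and mean accuracy for one problem from K sampled solutions.
    `is_correct` is a hypothetical verifier returning True/False."""
    mean_len = statistics.mean(len(s) for s in ref_solutions)
    mean_acc = statistics.mean(is_correct(s) for s in ref_solutions)
    return mean_len, mean_acc

def length_harmonizing_reward(solution, correct, mean_len, mean_acc, lam=2.0):
    """Reward shorter-than-baseline solutions; penalize accuracy drops.
    `lam` weights the accuracy constraint (placeholder value)."""
    length_gain = mean_len / max(len(solution), 1) - 1.0  # > 0 if shorter than baseline
    acc_delta = (1.0 if correct else 0.0) - mean_acc      # accuracy-constraint term
    return length_gain + lam * acc_delta
```

Fine-tuning then maximizes this reward with a PPO-style update over the pre-sampled rollouts, pushing the model toward the shortest solutions that still match the reference accuracy.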
Key Insights from this Paper 🔑:
→ Long-thought reasoning models exhibit length disharmony: their output lengths vary widely, even across problems of similar difficulty, and shorter solutions often achieve comparable accuracy.
→ Directly optimizing for shorter reasoning paths under an accuracy constraint yields more efficient inference and can even improve overall accuracy (formalized in the sketch below).
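In spirit, this insight amounts to a constrained optimization problem. A hedged LaTeX rendering, with notation assumed here rather than taken verbatim from the paper:

```latex
% Minimize expected solution length subject to not losing accuracy
% against the pre-sampled reference policy \pi_{ref}.
\min_{\theta} \; \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\!\left[ L(y) \right]
\quad \text{s.t.} \quad
\mathrm{Acc}(\pi_\theta) \ge \mathrm{Acc}(\pi_{\mathrm{ref}})
```

The constraint is relaxed into a single reward with an accuracy-penalty weight, as in the Python sketch above.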
Results 📊:
→ On the MATH dataset, O1-Pruner achieves 77.5% accuracy with Marco-o1-7B and 91% with QwQ-32B-Preview.
→ At the same time, it cuts solution length by 40.5% and 34.7%, respectively, relative to the baseline models.
→ Inference time drops to just over 1 minute for Marco-o1-7B and to around 4 minutes for QwQ-32B-Preview.