Teacher-student framework enhances mathematical reasoning in smaller language models,
with hierarchical thought templates and cross-model DPO.
📚 https://arxiv.org/abs/2410.09008
Original Problem 🔍:
Smaller LLMs struggle with complex mathematical reasoning because they cannot reliably detect and correct errors in their own reasoning steps.
-----
Solution in this Paper 🧠:
• SUPERCORRECT: a two-stage framework in which a large teacher model supervises a smaller student model
• Stage 1: Hierarchical thought-based supervised fine-tuning (HSFT)
- Extracts both high-level and detailed thought templates from the teacher
- Guides the student to produce fine-grained reasoning thoughts (see the data-construction sketch after this list)
• Stage 2: Cross-model collaborative direct preference optimization (DPO)
- Enhances the student's self-correction ability using the teacher's correction traces
- Teaches the student to locate and resolve errors by drawing on the teacher's insights
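A minimal sketch of how a Stage 1 (HSFT) training example could be assembled, assuming the teacher's output is stored with hypothetical fields `high_level_template` and `detailed_thoughts`; the paper's exact prompt layout may differ:

```python
# Hypothetical HSFT example builder. Field names and the prompt layout
# are illustrative assumptions, not the paper's exact format.
def build_hsft_example(problem: str, teacher_trace: dict) -> dict:
    high_level = teacher_trace["high_level_template"]  # solution outline
    detailed = teacher_trace["detailed_thoughts"]      # step-by-step thoughts
    prompt = f"Problem: {problem}\nHigh-level plan:"
    completion = f" {high_level}\nDetailed solution:\n" + "\n".join(detailed)
    # Supervised fine-tuning then maximizes the log-likelihood of
    # `completion` given `prompt`, so the student imitates both levels
    # of the teacher's thought template.
    return {"prompt": prompt, "completion": completion}
```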
-----
Key Insights from this Paper 💡:
• Leverages the teacher model to improve both reasoning and reflection in the student model
• Hierarchical thought templates enable more precise reasoning
• Cross-model DPO allows the student to break through thought bottlenecks and acquire new reasoning skills (see the loss sketch after this list)
• Addresses a key limitation of self-reflection methods: models struggle to identify their own errors independently
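A minimal sketch of the Stage 2 preference objective, assuming the standard DPO loss with the teacher's correction traces as the preferred sequences and the student's erroneous drafts as the rejected ones; variable names and the beta value are illustrative:

```python
import torch
import torch.nn.functional as F

def cross_model_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # student's log-prob of the teacher's correction trace
    policy_rejected_logps: torch.Tensor,  # student's log-prob of its own erroneous trace
    ref_chosen_logps: torch.Tensor,       # frozen reference model, same sequences
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    # Standard DPO: push the student to prefer the teacher's correction
    # traces over its own faulty ones, regularized toward the reference.
    logits = beta * (
        (policy_chosen_logps - ref_chosen_logps)
        - (policy_rejected_logps - ref_rejected_logps)
    )
    return -F.logsigmoid(logits).mean()
```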
-----
Results 📊:
• Surpasses DeepSeekMath-7B by 7.8%/5.3% on MATH/GSM8K benchmarks
• Outperforms Qwen2.5-Math-7B by 15.1%/6.3% on MATH/GSM8K benchmarks
• Achieves new state-of-the-art performance among all 7B models
• SUPERCORRECT-Qwen-7B: 70.2% accuracy on MATH, 89.5% on GSM8K