Teacher-student framework enhances mathematical reasoning in smaller language models,
with hierarchical thought templates and cross-model DPO.
📚 https://arxiv.org/abs/2410.09008
Original Problem 🔍:
Smaller LLMs struggle with complex mathematical reasoning because they cannot reliably detect and correct errors in their own reasoning steps.
-----
Solution in this Paper 🧠:
• SUPERCORRECT: a two-stage framework in which a large teacher model supervises a smaller student model
• Stage 1: Hierarchical thought-based supervised fine-tuning (HSFT)
- Extracts both high-level and detailed thought templates from the teacher
- Guides the student to produce fine-grained reasoning thoughts (see the data-construction sketch after this list)
• Stage 2: Cross-model collaborative direct preference optimization (DPO)
- Enhances the student's self-correction ability using the teacher's correction traces
- Teaches the student to locate and resolve errors by drawing on the teacher's insights
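A minimal sketch of how a Stage 1 (HSFT) training example could be assembled, assuming the teacher's output is stored with hypothetical fields `high_level_template` and `detailed_thoughts`; the paper's exact prompt layout may differ:

```python
# Hypothetical HSFT example builder. Field names and the prompt layout
# are illustrative assumptions, not the paper's exact format.
def build_hsft_example(problem: str, teacher_trace: dict) -> dict:
    high_level = teacher_trace["high_level_template"]  # solution outline
    detailed = teacher_trace["detailed_thoughts"]      # step-by-step thoughts
    prompt = f"Problem: {problem}\nHigh-level plan:"
    completion = f" {high_level}\nDetailed solution:\n" + "\n".join(detailed)
    # Supervised fine-tuning then maximizes the log-likelihood of
    # `completion` given `prompt`, so the student imitates both levels
    # of the teacher's thought template.
    return {"prompt": prompt, "completion": completion}
```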
-----
Key Insights from this Paper 💡:
• Leverages the teacher model to improve both reasoning and reflection in the student model
• Hierarchical thought templates enable more precise reasoning
• Cross-model DPO allows the student to break through thought bottlenecks and acquire new reasoning skills (see the loss sketch after this list)
• Addresses a key limitation of self-reflection methods: models struggle to identify their own errors independently
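A minimal sketch of the Stage 2 preference objective, assuming the standard DPO loss with the teacher's correction traces as the preferred sequences and the student's erroneous drafts as the rejected ones; variable names and the beta value are illustrative:

```python
import torch
import torch.nn.functional as F

def cross_model_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # student's log-prob of the teacher's correction trace
    policy_rejected_logps: torch.Tensor,  # student's log-prob of its own erroneous trace
    ref_chosen_logps: torch.Tensor,       # frozen reference model, same sequences
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    # Standard DPO: push the student to prefer the teacher's correction
    # traces over its own faulty ones, regularized toward the reference.
    logits = beta * (
        (policy_chosen_logps - ref_chosen_logps)
        - (policy_rejected_logps - ref_rejected_logps)
    )
    return -F.logsigmoid(logits).mean()
```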
-----
Results 📊:
• Surpasses DeepSeekMath-7B by 7.8%/5.3% on MATH/GSM8K benchmarks
• Outperforms Qwen2.5-Math-7B by 15.1%/6.3% on MATH/GSM8K benchmarks
• Achieves new state-of-the-art performance among all 7B models
• SUPERCORRECT-Qwen-7B: 70.2% accuracy on MATH, 89.5% on GSM8K