"rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking"

A podcast on this paper was generated with Google's Illuminate.

The GitHub repo is now available for the rStar-Math paper from @Microsoft

---------

This technique upgrades small models to outperform OpenAI’s o1-preview on math problems 🤯

By using Monte Carlo Tree Search (MCTS) and self-evolution strategies.

Applied to models like Qwen2.5-Math-7B and Phi3-mini, rStar-Math surpassed OpenAI’s o1-preview on key benchmarks, for example improving Qwen2.5-Math-7B’s accuracy on the MATH dataset from 58.8% to 90.0%.

---

https://arxiv.org/abs/2501.04519

→ MCTS-driven reasoning: The core of rStar-Math is Monte Carlo Tree Search (MCTS), which simulates step-by-step reasoning paths for more accurate intermediate steps, mimicking human “deep thinking.”
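
A minimal sketch of that search loop is below. `propose_steps` (candidate next steps from the policy model) and `score_terminal` (checking whether a finished path reaches the verified answer) are hypothetical placeholders, not the paper's actual code:

```python
import math
import random

# Minimal MCTS sketch for step-by-step math reasoning.
class Node:
    def __init__(self, steps, parent=None):
        self.steps = steps        # reasoning steps taken so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.q_sum = 0.0          # accumulated rollout rewards

    def uct(self, c=1.4):
        if self.visits == 0:
            return float("inf")   # always try unvisited steps first
        exploit = self.q_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def mcts(root_steps, propose_steps, score_terminal, n_rollouts=64, max_depth=8):
    root = Node(root_steps)
    for _ in range(n_rollouts):
        # 1. Selection: descend by UCT score until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # 2. Expansion: the policy model proposes candidate next steps.
        if len(node.steps) < max_depth:
            candidates = propose_steps(node.steps)
            node.children = [Node(node.steps + [s], parent=node) for s in candidates]
            if node.children:
                node = random.choice(node.children)
        # 3. Scoring: reward 1 if the path reaches the correct final answer,
        #    0 otherwise.
        reward = score_terminal(node.steps)
        # 4. Backpropagation: update Q statistics along the path; these
        #    per-step Q-values later supervise the PPM.
        while node is not None:
            node.visits += 1
            node.q_sum += reward
            node = node.parent
    # Return the most promising first step from the root.
    return max(root.children, key=lambda n: n.q_sum / max(n.visits, 1))
```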

→ Code-augmented reasoning: Each math reasoning step outputs both a natural language explanation and its corresponding Python code. Validated Python outputs filter out incorrect steps, boosting accuracy by retaining only executable solutions.
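
A sketch of that execution filter, with illustrative names rather than the paper's implementation (each step is an (explanation, code) pair, and steps share one namespace so later code can build on earlier variables):

```python
# Keep only reasoning steps whose Python code actually executes.
def step_is_valid(code_snippet: str, namespace: dict) -> bool:
    """Run the step's code; reject the step if execution raises."""
    try:
        exec(code_snippet, namespace)  # caution: sandbox untrusted code
        return True
    except Exception:
        return False

def filter_steps(candidate_steps):
    """Retain (explanation, code) steps whose code runs without error."""
    namespace = {}
    valid = []
    for explanation, code in candidate_steps:
        if step_is_valid(code, namespace):
            valid.append((explanation, code))
    return valid

# Example: the second step divides by zero, so it is filtered out.
steps = [
    ("Let x be the remaining count.", "x = 10 - 3"),
    ("Divide by zero by mistake.", "y = x / 0"),
]
print(filter_steps(steps))  # only the first step survives
```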

→ Self-evolution: The training involved four rounds where the policy model and the Process Preference Model (PPM) improve each other iteratively. This led to a significant rise in benchmark performance.

→ Policy and PPM training: The Process Preference Model (PPM) uses step-by-step Q-value comparisons to improve reasoning-step evaluation without manual labeling. It refines trajectory selection and enforces preference-based training for consistent improvements.
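
A sketch of that preference training, assuming a Bradley-Terry-style pairwise loss over high-Q vs. low-Q steps from the same problem (the toy linear scorer and embedding dimensions here are illustrative, not the paper's setup):

```python
import torch
import torch.nn.functional as F

def ppm_pairwise_loss(ppm, preferred, dispreferred):
    """Pairwise logistic loss: the preferred (high-Q) step should score
    higher than the dispreferred (low-Q) step; no absolute labels needed."""
    s_pos = ppm(preferred)
    s_neg = ppm(dispreferred)
    return -F.logsigmoid(s_pos - s_neg).mean()

# Toy usage: a linear scorer over 16-dim step embeddings.
ppm = torch.nn.Linear(16, 1)
pos = torch.randn(8, 16)  # embeddings of steps with high MCTS Q-values
neg = torch.randn(8, 16)  # embeddings of steps with low MCTS Q-values
loss = ppm_pairwise_loss(ppm, pos, neg)
loss.backward()
```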

→ Data synthesis: The researchers created a massive dataset of 747,000 math word problems and refined solution steps via MCTS rollouts, ensuring that only high-quality, verified solution paths were used for training.

-----

📊 Results:

→ Benchmark achievements: On the American Invitational Mathematics Examination (AIME), the system solved 53.3% of problems, placing in the top 20% of high school competitors. Across datasets like MATH, Olympiad Bench, and GSM8K, it outperformed multiple baselines, including larger models.

→ Improves Qwen2.5-Math-7B from 58.8% to 90.0% on the MATH benchmark

→ Boosts Phi3-mini-3.8B from 41.4% to 86.4%

→ Solves 53.3% of AIME (American Invitational Mathematics Examination) problems

→ Ranks among the top 20% of high school math students
