When an expert LLM and a deliberately weaker "amateur" LLM disagree, that disagreement itself becomes a reward signal for better reasoning - and with speculative decoding the search now runs 51.9% faster, letting small LLMs reason as deeply as much larger ones.
Lets smaller LLMs match the reasoning capabilities of models 5x their size
With it, LLMs can explore reasoning paths 51.9% faster per node by combining expert-amateur model disagreement with speculative decoding
📚 https://arxiv.org/abs/2410.01707
Original Problem 🎯:
MCTS reasoning in LLMs faces three key challenges: slow speed compared to Chain of Thought (CoT), dependency on complex reward models requiring multiple LLMs, and limited analysis of MCTS components from an interpretability perspective.
-----
Solution in this Paper 🔧:
• Introduced SC-MCTS* (Speculative Contrastive Monte Carlo Tree Search) with three core components:
- Novel contrastive reward model using expert/amateur model divergence
- Statistical method to combine multiple reward functions
- Speculative decoding integration for 51.9% speed improvement
• Key mechanisms:
- Action-level Jensen-Shannon divergence between expert/amateur models
- Multi-RM method for statistically normalizing and combining rewards that live on different scales (see the sketch after this list)
- Refined UCT strategy with optimized exploration constant
- Enhanced backpropagation favoring steadily improving paths
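The contrastive reward idea is compact enough to sketch. Below is a minimal, illustrative Python sketch (not the authors' code): a candidate reasoning step is scored by the Jensen-Shannon divergence between the expert and amateur models' next-token distributions over that step, and the result is combined with a second reward via z-score normalization in the spirit of the Multi-RM method. The function names (`js_divergence`, `contrastive_reward`, `combine_rewards`) and the running-statistics normalization are assumptions made for illustration.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return np.sum(a * np.log(a / b))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def contrastive_reward(expert_dists, amateur_dists):
    """Action-level contrastive reward (illustrative): mean JS divergence
    between expert and amateur next-token distributions over all tokens of
    one reasoning step (an 'action' in the MCTS tree)."""
    divs = [js_divergence(p, q) for p, q in zip(expert_dists, amateur_dists)]
    return float(np.mean(divs))

def combine_rewards(rewards, stats):
    """Multi-RM style combination (assumed form): z-score-normalize each
    reward against running statistics so signals on different scales can
    be summed into a single node value."""
    total = 0.0
    for name, value in rewards.items():
        mean, std = stats[name]
        total += (value - mean) / max(std, 1e-8)
    return total

# Toy usage: vocabulary of 5 tokens, a 3-token reasoning step.
rng = np.random.default_rng(0)
expert = [rng.dirichlet(np.ones(5)) for _ in range(3)]
amateur = [rng.dirichlet(np.ones(5)) for _ in range(3)]
r_contrast = contrastive_reward(expert, amateur)
r_total = combine_rewards(
    {"contrastive": r_contrast, "likelihood": -2.3},
    stats={"contrastive": (0.1, 0.05), "likelihood": (-3.0, 1.0)},
)
print(r_contrast, r_total)
```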
-----
Key Insights 💡:
• Reward model is the most crucial component affecting MCTS reasoning performance
• Combining multiple rewards requires careful statistical normalization
• Action-level contrastive decoding outperforms token-level approaches
• UCT strategy's effectiveness depends heavily on tuning the exploration constant (illustrated in the sketch below)
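To make the exploration-constant point concrete, here is the standard UCT selection rule (the textbook formula, not the paper's refined variant): each child is scored by its mean reward plus an exploration bonus scaled by a constant c, and changing c shifts the balance between exploiting high-reward reasoning paths and exploring under-visited ones. The node layout and names below are assumptions for illustration.

```python
import math

def uct_score(child_value_sum, child_visits, parent_visits, c=1.0):
    """Standard UCT: exploitation term (mean reward) plus an exploration
    bonus that shrinks as the child is visited more. The constant c
    controls how aggressively under-visited reasoning steps are explored."""
    if child_visits == 0:
        return float("inf")  # always expand unvisited children first
    exploit = child_value_sum / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

def select_child(children, parent_visits, c=1.0):
    """Pick the child (candidate reasoning step) with the highest UCT score."""
    return max(
        children,
        key=lambda ch: uct_score(ch["value_sum"], ch["visits"], parent_visits, c),
    )

# Toy usage: a small c favors the well-tried step, a larger c the rarely visited one.
children = [
    {"name": "step_A", "value_sum": 4.0, "visits": 10},
    {"name": "step_B", "value_sum": 0.1, "visits": 1},
]
print(select_child(children, parent_visits=11, c=0.1)["name"])  # step_A
print(select_child(children, parent_visits=11, c=2.0)["name"])  # step_B
```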
-----
Results 📊:
• Outperformed OpenAI's o1-mini by 17.4% using Llama-3.1-70B on the Blocksworld dataset
• Achieved 51.9% speed improvement per node using speculative decoding
• Surpassed 4-shot Chain of Thought across both easy and hard modes
• In easy mode, matched the performance of Llama-3.1-405B using only Llama-3.1-70B