This paper addresses the difficulty Multimodal LLMs (MLLMs) have with complex reasoning tasks: they typically produce direct predictions without explicit intermediate reasoning steps.
To address this, the paper introduces Collective Monte Carlo Tree Search (CoMCTS), which uses collective knowledge from multiple models to effectively search for and learn reasoning paths.
-----
https://arxiv.org/abs/2412.18319
📌 CoMCTS cleverly uses diverse models as an ensemble. This enhances Monte Carlo Tree Search exploration by preventing premature convergence in a single model's limited reasoning space.
📌 Iterative refinement via CoMCTS significantly boosts search efficiency. Error positioning prunes unpromising paths early. This reduces computational cost for complex MLLM reasoning tasks.
📌 The Mulberry-260k dataset is a valuable contribution. It provides explicit reasoning paths for MLLM training, enabling models to learn interpretable, step-by-step reasoning.
----------
Methods Explored in this Paper 🔧:
→ The paper proposes Collective Monte Carlo Tree Search (CoMCTS) to improve reasoning in MLLMs.
→ CoMCTS iteratively searches for effective reasoning paths by using multiple MLLMs.
→ It involves four key operations: Expansion, Simulation and Error Positioning, Backpropagation, and Selection (a sketch of one full iteration follows this list).
→ Expansion uses multiple MLLMs to generate diverse candidate reasoning paths.
→ Simulation and Error Positioning evaluate these paths, filtering out incorrect ones using collective knowledge from different models.
→ Backpropagation updates the scores of reasoning nodes based on the simulation results.
→ Selection chooses the most promising reasoning node for the next iteration using the Upper Confidence Bound (UCB) criterion.
→ CoMCTS constructs a reasoning tree and uses it to train MLLMs to perform step-by-step reasoning and reflection.
→ The method also extends to reflective reasoning by incorporating negative reasoning nodes into the training data.
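To make the four operations concrete, here is a minimal Python sketch of one CoMCTS-style iteration. It assumes hypothetical model interfaces (`propose_steps`, `score_path`), a simple node structure, and an illustrative pruning threshold; it is a sketch under those assumptions, not the paper's implementation.

```python
import math
from dataclasses import dataclass, field

# Hypothetical sketch of one CoMCTS-style search iteration. Node fields, helper
# methods (propose_steps, score_path), and constants are illustrative
# assumptions, not the paper's released code.

@dataclass
class Node:
    step_text: str                       # one reasoning step
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    value: float = 0.0                   # accumulated score from backpropagation
    visits: int = 0

def path_to(node: Node) -> list:
    """Collect the reasoning steps from the root down to this node."""
    steps = []
    while node is not None:
        steps.append(node.step_text)
        node = node.parent
    return list(reversed(steps))

def ucb(node: Node, c: float = 1.4) -> float:
    """Upper Confidence Bound: trade off mean value against under-exploration."""
    if node.visits == 0:
        return float("inf")
    parent_visits = node.parent.visits if node.parent else 1
    return node.value / node.visits + c * math.sqrt(math.log(parent_visits + 1) / node.visits)

def comcts_iteration(root: Node, policy_models, scorer_models, question, image):
    # Selection: descend the tree by UCB to the most promising leaf.
    node = root
    while node.children:
        node = max(node.children, key=ucb)

    # Expansion: every model in the collective proposes candidate next steps,
    # so the search is not confined to a single model's reasoning space.
    for model in policy_models:
        for step in model.propose_steps(question, image, path_to(node)):
            node.children.append(Node(step_text=step, parent=node))

    # Simulation and error positioning: the collective scores each candidate;
    # candidates below an (illustrative) threshold stand in for steps located
    # as erroneous and are pruned early.
    kept = []
    for child in node.children:
        scores = [m.score_path(question, image, path_to(child)) for m in scorer_models]
        child.value = sum(scores) / len(scores)
        child.visits = 1
        if child.value >= 0.5:
            kept.append(child)
    node.children = kept
    if not node.children:
        return

    # Backpropagation: push the best surviving child's score back to the root.
    reward = max(child.value for child in node.children)
    cur = node
    while cur is not None:
        cur.visits += 1
        cur.value += reward
        cur = cur.parent
```

Repeating such an iteration until a correct terminal answer node is reached yields the reasoning tree that is then distilled into step-by-step training data.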
-----
Key Insights 💡:
→ Collective learning significantly enhances the effectiveness and efficiency of tree search for MLLM reasoning.
→ Joint expansion in CoMCTS explores a broader reasoning space, avoiding the low-quality reasoning paths a single model can get stuck in.
→ Joint simulation and error positioning speed up the search by skipping redundant intermediate steps and locating errors more reliably from multiple model perspectives.
→ Learning from both effective and reflective reasoning paths improves the MLLM's ability to reason and self-correct (see the data-construction sketch after this list).
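Below is a hedged sketch of how a reflective training sample could be assembled from the searched tree, pairing a negative (erroneous) sibling step with the corrected continuation. The dict keys and the reflection wording are illustrative assumptions, not the released Mulberry-260k schema.

```python
# Hypothetical sketch of turning one searched tree path into training samples;
# dict keys and reflection wording are assumptions, not the Mulberry-260k schema.

def build_training_samples(question, effective_path, negative_step=None, error_index=None):
    samples = []

    # Step-by-step reasoning sample: question -> the full effective path.
    samples.append({
        "prompt": question,
        "target": "\n".join(effective_path),
    })

    # Reflective sample: replay the path up to the error position, include the
    # negative (erroneous) sibling step, then an explicit correction before
    # continuing with the effective steps, so the model learns to self-correct.
    if negative_step is not None and error_index is not None:
        prefix = effective_path[:error_index]
        reflection = f"Wait, the step '{negative_step}' is incorrect; revising it."
        samples.append({
            "prompt": question,
            "target": "\n".join(prefix + [negative_step, reflection] + effective_path[error_index:]),
        })

    return samples
```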
-----
Results 📊:
→ CoMCTS achieves a search success rate of 80.2%, significantly outperforming traditional MCTS methods and GPT-4o direct prediction.
→ CoMCTS reduces the average number of search iterations to 12.7, a marked efficiency gain over other tree-search methods such as MCTS (42.1 iterations).
→ Mulberry-LLaVA-8B, trained with CoMCTS data, improves accuracy by 11.0% on average across 8 benchmarks compared to the baseline LLaVA-NeXT-8B.
→ Mulberry-7B shows a 4.2% average improvement over the baseline Qwen2-VL-7B on the same benchmarks.
→ Mulberry models achieve competitive or superior performance compared to other open-source MLLMs and even some closed-source models on various reasoning benchmarks.