Active retrieval meets Monte Carlo Tree Search: Teaching MLLMs to think better by retrieving knowledge at each reasoning step.
The AR-MCTS framework enhances multimodal reasoning in MLLMs by combining active retrieval with Monte Carlo Tree Search, enabling automated step-wise verification without human annotation.
-----
https://arxiv.org/abs/2412.14835
🤔 Original Problem:
MLLMs struggle with complex multi-step multimodal reasoning tasks. Current verification methods require extensive human annotation and lack reliable automated processes for validating reasoning steps.
-----
🔧 Solution in this Paper:
→ AR-MCTS replaces traditional beam-search sampling with step-wise expansion guided by knowledge actively retrieved at each reasoning step.
→ A unified retrieval module extracts insights from a hybrid-modal corpus combining mathematics-specific and general reasoning knowledge.
→ Knowledge concept filtering ensures consistency between retrieved information and problem domains.
→ MCTS algorithm with active retrieval generates automated step-wise annotations for process reward modeling.
→ The process reward model is progressively aligned through curriculum learning, enabling fully automated verification.
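The steps above can be sketched as a minimal MCTS loop whose expansion phase performs active retrieval before proposing the next reasoning step. This is an illustrative toy, not the paper's implementation: `retrieve`, `policy_step`, and `reward_model` are invented stand-ins for the hybrid-modal retriever, the MLLM policy, and the process reward model.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    state: str                       # reasoning prefix so far
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0

def retrieve(state: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy hybrid-corpus retrieval: rank passages by word overlap with the state."""
    def overlap(passage: str) -> int:
        return len(set(passage.split()) & set(state.split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def policy_step(state: str, knowledge: list[str]) -> str:
    """Stand-in for the MLLM proposing the next step, conditioned on knowledge."""
    return state + " | step[" + ";".join(k[:10] for k in knowledge) + "]"

def reward_model(state: str) -> float:
    """Stand-in process reward model scoring the partial trajectory."""
    return min(1.0, state.count("step") / 3)

def ucb(node: Node, c: float = 1.4) -> float:
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def ar_mcts(question: str, corpus: list[str], iters: int = 20, depth: int = 3) -> Node:
    root = Node(state=question)
    for _ in range(iters):
        node = root
        # Selection: descend by UCB while children exist.
        while node.children:
            node = max(node.children, key=ucb)
        # Expansion with *active* retrieval: fetch knowledge for this exact state.
        if node.state.count("step") < depth:
            knowledge = retrieve(node.state, corpus)
            child = Node(state=policy_step(node.state, knowledge), parent=node)
            node.children.append(child)
            node = child
        # Evaluation by the process reward model (no human labels needed).
        r = reward_model(node.state)
        # Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return root
```

The key difference from vanilla MCTS is that retrieval happens inside expansion, so each new step is conditioned on knowledge relevant to the current partial trajectory rather than only the original question.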
-----
💡 Key Insights:
→ Active retrieval at each reasoning step significantly improves sampling diversity and accuracy
→ Process reward models outperform outcome reward models in complex reasoning tasks
→ Smaller MLLMs show greater improvement with AR-MCTS compared to larger models
→ Knowledge concept filtering reduces noise in retrieved information
-----
📊 Results:
→ Improved performance across various MLLMs on MathVista (ALL): Qwen2-VL-7B (+5.3%), InternVL2-8B (+5.8%)
→ Enhanced complex reasoning in We-Math (S3): GPT-4o (56.4% vs 50.3%), Qwen2-VL (40.6% vs 34.6%)
→ Significant gains in GAOKAO-MM: GPT-4o (+6.6%), Qwen2-VL-7B (+7.2%)
-----
Are you into AI and LLMs❓ Join my daily AI newsletter. I will send you 7 emails a week analyzing the highest-signal AI developments. ↓↓
🎉 https://rohanpaul.substack.com/