Active retrieval meets Monte Carlo Tree Search: Teaching MLLMs to think better by retrieving knowledge at each reasoning step.
The AR-MCTS framework enhances multimodal reasoning in MLLMs by combining active retrieval with Monte Carlo Tree Search, enabling automated step-wise verification without human annotation.
-----
https://arxiv.org/abs/2412.14835
🤔 Original Problem:
MLLMs struggle with complex multi-step multimodal reasoning tasks. Current verification methods require extensive human annotation and lack reliable automated processes for validating reasoning steps.
-----
🔧 Solution in this Paper:
→ AR-MCTS replaces traditional beam-search sampling with step-wise expansion guided by knowledge actively retrieved at each reasoning step.
→ A unified retrieval module extracts insights from a hybrid-modal corpus combining mathematics-specific and general reasoning knowledge.
→ Knowledge concept filtering ensures consistency between retrieved information and problem domains.
→ MCTS algorithm with active retrieval generates automated step-wise annotations for process reward modeling.
→ The process reward model is progressively aligned through curriculum learning, enabling fully automated verification.
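The steps above can be sketched as a minimal MCTS loop whose expansion phase performs active retrieval before proposing the next reasoning step. This is an illustrative toy, not the paper's implementation: `retrieve`, `policy_step`, and `reward_model` are invented stand-ins for the hybrid-modal retriever, the MLLM policy, and the process reward model.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    state: str                       # reasoning prefix so far
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0

def retrieve(state: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy hybrid-corpus retrieval: rank passages by word overlap with the state."""
    def overlap(passage: str) -> int:
        return len(set(passage.split()) & set(state.split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def policy_step(state: str, knowledge: list[str]) -> str:
    """Stand-in for the MLLM proposing the next step, conditioned on knowledge."""
    return state + " | step[" + ";".join(k[:10] for k in knowledge) + "]"

def reward_model(state: str) -> float:
    """Stand-in process reward model scoring the partial trajectory."""
    return min(1.0, state.count("step") / 3)

def ucb(node: Node, c: float = 1.4) -> float:
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def ar_mcts(question: str, corpus: list[str], iters: int = 20, depth: int = 3) -> Node:
    root = Node(state=question)
    for _ in range(iters):
        node = root
        # Selection: descend by UCB while children exist.
        while node.children:
            node = max(node.children, key=ucb)
        # Expansion with *active* retrieval: fetch knowledge for this exact state.
        if node.state.count("step") < depth:
            knowledge = retrieve(node.state, corpus)
            child = Node(state=policy_step(node.state, knowledge), parent=node)
            node.children.append(child)
            node = child
        # Evaluation by the process reward model (no human labels needed).
        r = reward_model(node.state)
        # Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return root
```

The key difference from vanilla MCTS is that retrieval happens inside expansion, so each new step is conditioned on knowledge relevant to the current partial trajectory rather than only the original question.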
-----
💡 Key Insights:
→ Active retrieval at each reasoning step significantly improves sampling diversity and accuracy
→ Process reward models outperform outcome reward models in complex reasoning tasks
→ Smaller MLLMs show greater improvement with AR-MCTS compared to larger models
→ Knowledge concept filtering reduces noise in retrieved information
-----
📊 Results:
→ Improved performance across various MLLMs on MathVista (ALL): Qwen2-VL-7B (+5.3%), InternVL2-8B (+5.8%)
→ Enhanced complex reasoning in We-Math (S3): GPT-4o (56.4% vs 50.3%), Qwen2-VL (40.6% vs 34.6%)
→ Significant gains in GAOKAO-MM: GPT-4o (+6.6%), Qwen2-VL-7B (+7.2%)
-----
Are you into AI and LLMs❓ Join my daily AI newsletter. I will send you 7 emails a week analyzing the highest-signal AI developments. ↓↓
🎉 https://rohanpaul.substack.com/