Current retrieval methods primarily focus on surface-level similarities of textual or visual cues in trajectories, neglecting their effectiveness for the specific task at hand.
MLLM As ReTriever (MART) addresses this by using interactive feedback to fine-tune a retriever that selects task-effective multimodal trajectories.
https://arxiv.org/abs/2410.03450
Results 📊:
• MART consistently outperforms baselines by over 10 percentage points in Success Rate across environments
• AI2-THOR: 40% Success Rate (vs 18-26% for baselines)
• LEGENT: 87% Success Rate (vs 69-75% for other methods)
• Fewer Average Steps needed to complete tasks in both environments
Solution in this Paper 🛠️:
• MART (MLLM As ReTriever): Fine-tunes MLLM retriever using interactive feedback
• Trajectory Abstraction: Condenses trajectories while preserving key information
• Preference learning: Organizes interactive feedback into pairs for fine-tuning
• Bradley-Terry model: Used to train the MLLM retriever
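To make the preference-learning and Bradley-Terry bullets concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The function names, the `(trajectory_id, effectiveness)` feedback format, and the rule "higher effectiveness is preferred" are all assumptions made for illustration. In practice the scores would come from the MLLM retriever and the loss would be minimized with a deep learning framework.

```python
import math
from itertools import combinations

def build_preference_pairs(feedback):
    """Turn interactive feedback for ONE task into (preferred, rejected) pairs.

    `feedback` is a list of (trajectory_id, effectiveness) tuples, where
    effectiveness is a hypothetical number measured by running the agent
    with that trajectory as its reference (assumed format).
    """
    pairs = []
    for (id_a, eff_a), (id_b, eff_b) in combinations(feedback, 2):
        if eff_a == eff_b:
            continue  # a tie carries no preference signal
        if eff_a > eff_b:
            pairs.append((id_a, id_b))
        else:
            pairs.append((id_b, id_a))
    return pairs

def bradley_terry_loss(score_preferred, score_rejected):
    """Negative log-likelihood of one pair under the Bradley-Terry model.

    P(preferred beats rejected) = sigmoid(score_preferred - score_rejected),
    so the loss is log(1 + exp(-diff)), computed in a numerically stable way.
    """
    diff = score_preferred - score_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

# Usage: three trajectories tried on the same task (toy numbers).
feedback = [("traj_a", 1.0), ("traj_b", 0.0), ("traj_c", 1.0)]
pairs = build_preference_pairs(feedback)
loss = bradley_terry_loss(score_preferred=2.0, score_rejected=0.5)
```

The loss shrinks as the retriever scores the preferred trajectory further above the rejected one, which is what pushes the fine-tuned retriever toward task-effective trajectories rather than merely similar ones.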
-----
Key Insights from this Paper 💡:
• Interactive learning improves trajectory retrieval for embodied agents
• Trajectory Abstraction reduces context window length and removes distracting information
• MLLM retriever can prioritize effective trajectories for unseen tasks
• Combining MLLM capabilities with task-specific effectiveness assessment enhances performance
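At inference time, the retrieval step these insights describe could look roughly like the sketch below. It assumes a scoring function that returns a higher value for trajectories expected to be more effective for the task. `retrieve_top_k`, `toy_score`, and the trajectory names are hypothetical, and the lookup table merely stands in for a fine-tuned MLLM retriever.

```python
from typing import Callable, List, Tuple

def retrieve_top_k(
    task: str,
    memory: List[str],
    score_fn: Callable[[str, str], float],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Score every stored trajectory for the task and return the k best."""
    scored = [(traj, score_fn(task, traj)) for traj in memory]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy stand-in for the fine-tuned retriever (assumption: fixed scores).
TOY_SCORES = {
    "traj_open_fridge": 0.9,
    "traj_walk_to_sofa": 0.2,
    "traj_pick_up_mug": 0.6,
}

def toy_score(task: str, trajectory: str) -> float:
    # A real scorer would condition on the task and the trajectory content.
    return TOY_SCORES.get(trajectory, 0.0)

top = retrieve_top_k("put the mug in the fridge", list(TOY_SCORES), toy_score, k=2)
```

The retrieved trajectories would then be placed in the agent's context as references for the new task.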