Noise-based expert routing slashes the computational cost of diffusion policies while improving robot-learning performance.
A novel mixture-of-experts architecture for diffusion policies cuts inference cost by 90% while improving performance through noise-level-specialized routing and expert caching.
-----
https://arxiv.org/abs/2412.12953
🤖 Original Problem:
→ Diffusion policies for robotic learning are computationally expensive, requiring hundreds of millions of parameters and many denoising steps.
→ This limits their use in real-time robotics applications with limited computing resources.
-----
🔧 Solution in this Paper:
→ Introduces Mixture-of-Denoising Experts (MoDE), combining transformer architecture with sparse experts and noise-conditioned routing.
→ Uses specialized experts for different noise levels in the denoising process.
→ Implements noise-based expert caching to reduce inference costs by 90%.
→ Features load balancing to prevent expert collapse and ensure efficient parameter usage.
→ Employs noise-conditioned self-attention for enhanced denoising across different noise levels.
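The routing idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the router weights, the sinusoidal noise embedding, and the function names are all hypothetical stand-ins. The key property it demonstrates is that expert selection depends only on the noise level `sigma`, never on token content.

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 4
TOP_K = 2
EMBED_DIM = 8

# Hypothetical router weights mapping a noise-level embedding to expert logits.
W = [[random.gauss(0, 1) for _ in range(NUM_EXPERTS)] for _ in range(EMBED_DIM)]

def noise_embedding(sigma):
    """Sinusoidal embedding of the continuous noise level (illustrative)."""
    half = EMBED_DIM // 2
    freqs = [2.0 ** i for i in range(half)]
    return ([math.sin(math.log(sigma) * f) for f in freqs]
            + [math.cos(math.log(sigma) * f) for f in freqs])

def route(sigma):
    """Choose top-k experts from the noise level alone (not token content).

    Because the choice depends only on sigma, the expert path for a fixed
    denoising schedule can be precomputed and cached before inference.
    """
    emb = noise_embedding(sigma)
    logits = [sum(emb[i] * W[i][e] for i in range(EMBED_DIM))
              for e in range(NUM_EXPERTS)]
    topk = sorted(range(NUM_EXPERTS), key=lambda e: logits[e],
                  reverse=True)[:TOP_K]
    z = sum(math.exp(logits[e]) for e in topk)
    weights = [math.exp(logits[e]) / z for e in topk]
    return topk, weights

experts, weights = route(0.5)
```

Since `route` is a deterministic function of `sigma`, calling it twice with the same noise level always yields the same experts, which is exactly what makes the caching in the paper possible.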
-----
🎯 Key Insights:
→ Noise-conditioned routing enables efficient parameter scaling while maintaining performance.
→ Expert specialization across noise levels improves denoising effectiveness.
→ Caching predicted expert paths significantly reduces computational overhead.
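The caching insight above can be sketched as follows, assuming (as the bullet states) that routing is a function of the discrete denoising step alone. The router here is a hypothetical stand-in; the point is that the expert path is computed once per noise schedule and then replayed as a table lookup at inference time, removing the router from the inner loop.

```python
NUM_STEPS = 5
NUM_EXPERTS = 4

def router_logits(step):
    # Hypothetical stand-in router: any function of the step index alone.
    return [(step * (e + 2)) % 7 for e in range(NUM_EXPERTS)]

# Computed ONCE per denoising schedule, then reused for every inference call.
expert_path = [max(range(NUM_EXPERTS), key=lambda e: router_logits(t)[e])
               for t in range(NUM_STEPS)]

def denoise_step(x, step):
    expert = expert_path[step]  # cached lookup; no router forward pass here
    return x, expert
```

With the path cached, the per-step cost at inference is a list index instead of a router forward pass, which is the source of the reduced overhead the bullet describes.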
-----
📊 Results:
→ Achieved 4.01 score on CALVIN ABC and 0.95 on LIBERO-90 benchmarks
→ Surpassed previous diffusion policies by an average of 57% while using 90% fewer FLOPs at inference
→ Reduced active parameters by 40% compared to standard architectures