"Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning"

A podcast on this paper was generated with Google's Illuminate.

Noise-based expert routing slashes computational costs while improving policy performance.

A novel mixture-of-experts architecture for diffusion policies reduces computational costs by 90% while improving performance through specialized noise-level routing and expert caching.

-----

https://arxiv.org/abs/2412.12953

🤖 Original Problem:

→ Diffusion policies for robotic learning are computationally expensive, requiring hundreds of millions of parameters and many denoising steps.

→ This limits their use in real-time robotics applications with limited computing resources.

-----

🔧 Solution in this Paper:

→ Introduces Mixture-of-Denoising Experts (MoDE), combining transformer architecture with sparse experts and noise-conditioned routing.

→ Uses specialized experts for different noise levels in the denoising process.

→ Implements noise-based expert caching to reduce inference costs by 90%.

→ Features load balancing to prevent expert collapse and ensure efficient parameter usage.

→ Employs noise-conditioned self-attention for enhanced denoising across different noise levels.
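The core idea above, routing tokens to experts based only on the current noise level, can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: all dimensions, weight initializations, and function names (`noise_embedding`, `route`, `moe_layer`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen for illustration only (not from the paper).
D, H, N_EXPERTS, TOP_K = 8, 16, 4, 2

# Per-expert feed-forward weights and a router over the noise embedding.
W_in = rng.normal(size=(N_EXPERTS, D, H)) * 0.1
W_out = rng.normal(size=(N_EXPERTS, H, D)) * 0.1
W_router = rng.normal(size=(D, N_EXPERTS)) * 0.1

def noise_embedding(sigma, dim=D):
    """Simple sinusoidal embedding of the scalar noise level."""
    freqs = np.exp(np.linspace(0.0, 4.0, dim // 2))
    return np.concatenate([np.sin(sigma * freqs), np.cos(sigma * freqs)])

def route(sigma, top_k=TOP_K):
    """Select top-k experts from the noise level alone (token-independent)."""
    logits = noise_embedding(sigma) @ W_router
    chosen = np.argsort(logits)[-top_k:]
    w = np.exp(logits[chosen])
    return chosen, w / w.sum()

def moe_layer(x, sigma):
    """Run only the selected experts and mix their outputs."""
    chosen, weights = route(sigma)
    out = np.zeros_like(x)
    for e, w in zip(chosen, weights):
        out += w * (np.maximum(x @ W_in[e], 0.0) @ W_out[e])  # ReLU expert MLP
    return out

x = rng.normal(size=(3, D))  # 3 tokens
y = moe_layer(x, sigma=0.5)
```

Because the router sees only the noise level and not the tokens, every token at a given denoising step takes the same expert path, which is what makes the expert caching described above possible.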

-----

🎯 Key Insights:

→ Noise-conditioned routing enables efficient parameter scaling while maintaining performance.

→ Expert specialization across noise levels improves denoising effectiveness.

→ Caching predicted expert paths significantly reduces computational overhead.
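The caching insight follows directly from noise-only routing: since the denoising schedule is fixed, the expert path for every step can be precomputed once and reused for all rollouts. A self-contained NumPy sketch under the same illustrative assumptions (toy router, made-up sizes, a hypothetical 10-step schedule):

```python
import numpy as np

rng = np.random.default_rng(0)
N_EXPERTS, TOP_K, EMB = 4, 2, 8  # toy sizes, illustrative only
W_router = rng.normal(size=(EMB, N_EXPERTS))

def noise_embedding(sigma):
    """Sinusoidal embedding of the scalar noise level."""
    freqs = np.exp(np.linspace(0.0, 4.0, EMB // 2))
    return np.concatenate([np.sin(sigma * freqs), np.cos(sigma * freqs)])

def route(sigma):
    """Top-k expert indices chosen from the noise level alone."""
    logits = noise_embedding(sigma) @ W_router
    return tuple(int(i) for i in np.argsort(logits)[-TOP_K:])

# The schedule is known ahead of time, so precompute the expert path per
# step once; at inference no router evaluation is needed, and unused
# experts never have to be executed (or even loaded).
schedule = np.linspace(1.0, 0.05, 10)
route_cache = {round(float(s), 4): route(s) for s in schedule}
```

The win is that the router, and every expert off the cached path, drops out of the inference loop entirely, which is where the reported FLOP reduction comes from.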

-----

📊 Results:

→ Achieved a score of 4.01 on the CALVIN ABC benchmark and 0.95 on LIBERO-90.

→ Surpassed previous diffusion policies by 57% on average while using 90% fewer FLOPs.

→ Reduced active parameters by 40% compared to standard architectures
