"A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks"

The podcast on this paper is generated with Google's Illuminate.

LRAM (Large Recurrent Action Model), proposed in this paper, brings Transformer-level performance to robotics with 100x faster inference by using xLSTM at its core.

LRAM combines the speed of an LSTM with the brains of a Transformer.

https://arxiv.org/abs/2410.22391

🎯 Original Problem:

Transformer-based models in robotics suffer from slow inference, making them impractical for real-time applications that require 100-1000Hz sampling rates: at 100Hz, each control step leaves at most 10ms for the model to act. This creates a critical bottleneck for robotic control systems needing response times under 10ms.

-----

🔧 Solution in this Paper:

The researchers developed LRAM (Large Recurrent Action Model) with xLSTM at its core. LRAM processes multi-modal inputs through separate encoders (a CNN for images, fully connected networks for low-dimensional inputs) and uses a shared action head for all predictions. During inference the model carries a hidden state forward, so each step costs constant time and a full trajectory is processed in linear time, in contrast to the quadratic cost of Transformer attention over a growing context.
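
To make the layout concrete, here is a minimal PyTorch sketch. Everything in it (the LRAMSketch name, the layer sizes, the 64x64 image assumption) is an illustrative guess rather than the authors' configuration, and nn.LSTM stands in for the xLSTM core, which is not reproduced here:

```python
import torch
import torch.nn as nn

class LRAMSketch(nn.Module):
    """Minimal sketch of the LRAM layout described above.

    All names and sizes are illustrative assumptions;
    nn.LSTM stands in for the paper's xLSTM core.
    """

    def __init__(self, action_dim: int, proprio_dim: int, d_model: int = 256):
        super().__init__()
        # CNN encoder for image observations.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, d_model),
        )
        # Fully connected encoder for low-dimensional inputs.
        self.proprio_encoder = nn.Sequential(
            nn.Linear(proprio_dim, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model),
        )
        # Recurrent core (stand-in for xLSTM); its hidden state is carried
        # across steps, so per-step inference cost is constant.
        self.core = nn.LSTM(2 * d_model, d_model, batch_first=True)
        # Shared action head used for all predictions.
        self.action_head = nn.Linear(d_model, action_dim)

    def step(self, image, proprio, state=None):
        """One inference step: encode the observation, update the
        hidden state, and emit an action."""
        img = self.image_encoder(image)                     # (B, d_model)
        low = self.proprio_encoder(proprio)                 # (B, d_model)
        token = torch.cat([img, low], dim=-1).unsqueeze(1)  # (B, 1, 2*d_model)
        out, state = self.core(token, state)
        return self.action_head(out[:, -1]), state


# Control-loop usage: the hidden state replaces the Transformer's context.
model = LRAMSketch(action_dim=7, proprio_dim=16)
state = None
action, state = model.step(torch.randn(1, 3, 64, 64), torch.randn(1, 16), state)
```

The key design point is the `step` signature: the hidden state returned by one call is fed into the next, so the control loop never re-processes past observations.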

-----

💡 Key Insights:

→ Modern recurrent architectures can match or exceed Transformer performance while being faster

→ Removing actions from the input context improves performance in robotics domains

→ xLSTM shows better domain separation in embedding space compared to Transformers

→ Linear-time inference enables real-time robotics applications (a toy comparison against quadratic-time attention follows below)
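
To see why this matters, here is a toy PyTorch comparison (not the paper's benchmark): the recurrent cell consumes one token per step at constant cost, while an attention layer without a KV cache must re-attend over the entire growing history, so its per-step cost grows with t and its total cost is quadratic in sequence length:

```python
import torch
import torch.nn as nn

d = 256
rnn = nn.LSTM(d, d, batch_first=True).eval()
attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True).eval()

state, history = None, []
with torch.no_grad():
    for t in range(1, 129):
        x = torch.randn(1, 1, d)          # one new observation token
        # Recurrent step: O(1) per step, only the hidden state is carried.
        _, state = rnn(x, state)
        # Attention step (no KV cache): re-attends over the full history,
        # so per-step cost grows with t and total cost is O(T^2).
        history.append(x)
        ctx = torch.cat(history, dim=1)   # (1, t, d)
        attn(ctx, ctx, ctx)
```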

-----

📊 Results:

→ Tested across 432 tasks from 6 domains with 894M total transitions

→ xLSTM outperformed Transformers in both validation perplexity and normalized performance scores

→ At 206M parameters, xLSTM performed better than the Mamba architecture

→ Achieved the sub-10ms inference times required for real-time robotics control (a toy latency check follows below)
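
As a rough way to sanity-check such a budget on one's own hardware, here is a toy latency measurement; the bare nn.LSTM is a stand-in for the xLSTM core, so the printed number is illustrative only, not the paper's result:

```python
import time
import torch
import torch.nn as nn

# Toy per-step latency check against the <10 ms real-time budget.
core = nn.LSTM(512, 256, batch_first=True).eval()
head = nn.Linear(256, 7)
state, token = None, torch.randn(1, 1, 512)

with torch.no_grad():
    for _ in range(10):                   # warm-up
        out, state = core(token, state)
    t0 = time.perf_counter()
    for _ in range(100):
        out, state = core(token, state)
        head(out[:, -1])
    per_step_ms = (time.perf_counter() - t0) / 100 * 1e3

print(f"per-step latency: {per_step_ms:.3f} ms (real-time budget: <10 ms)")
```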
