LRAM, proposed in this paper, brings Transformer-level performance to robotics with 100x faster inference by using xLSTM at its core
LRAM combines the speed of an LSTM with the brains of a Transformer
https://arxiv.org/abs/2410.22391
🎯 Original Problem:
Transformer-based models in robotics suffer from slow inference, making them impractical for real-time applications that sample at 100-1000 Hz. This creates a critical bottleneck for robotic control systems that need response times under 10 ms.
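For intuition, the budget arithmetic is simple (a minimal sketch; nothing here comes from the paper beyond the quoted rates):

```python
# Per-step time budget at common robot control rates.
for hz in (100, 500, 1000):
    budget_ms = 1000 / hz
    print(f"{hz:>4} Hz control loop -> {budget_ms:.1f} ms per action")
# 100 Hz leaves 10 ms per step; at 1000 Hz only 1 ms remains
# for inference, I/O, and actuation combined.
```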
-----
🔧 Solution in this Paper:
The researchers developed LRAM (Large Recurrent Action Model) with xLSTM at its core. LRAM processes multi-modal inputs through separate encoders (a CNN for images, fully connected networks for low-dimensional inputs) and uses a single shared action head for all predictions. During inference, the model carries its hidden state forward, giving linear-time processing and making it significantly faster than quadratic-time Transformer models.
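Here is a minimal sketch of that pipeline, assuming a PyTorch-style setup. All names and sizes (LRAMSketch, the layer widths) are illustrative rather than the paper's code, and nn.LSTM stands in for the xLSTM core:

```python
import torch
import torch.nn as nn

class LRAMSketch(nn.Module):
    """Illustrative outline: per-modality encoders -> shared
    recurrent core -> shared action head."""

    def __init__(self, d_model=256, n_actions=8, low_dim=16):
        super().__init__()
        # Separate encoders per modality (paper: CNN for images,
        # fully connected layers for low-dimensional inputs).
        self.img_enc = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, d_model),
        )
        self.low_enc = nn.Linear(low_dim, d_model)
        # Recurrent core: nn.LSTM is a stand-in for xLSTM here.
        self.core = nn.LSTM(d_model, d_model, batch_first=True)
        # One shared action head across all domains and tasks.
        self.action_head = nn.Linear(d_model, n_actions)

    def step(self, img, low, state=None):
        # Fuse both modalities into one token for this timestep.
        tok = (self.img_enc(img) + self.low_enc(low)).unsqueeze(1)
        # Carrying `state` forward means O(1) work per step at
        # inference instead of re-attending to the full context.
        out, state = self.core(tok, state)
        return self.action_head(out[:, -1]), state
```

The key design choice is the `step` interface: the caller holds the recurrent state between control ticks, so each tick costs the same regardless of how long the episode has run.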
-----
💡 Key Insights:
→ Modern recurrent architectures can match or exceed Transformer performance while being faster
→ Removing past actions from the input context improves performance in robotics domains
→ xLSTM shows better domain separation in embedding space compared to Transformers
→ Linear-time inference enables real-time robotics applications
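The last point is worth unpacking: even with a KV cache, a Transformer's per-step cost grows with the number of cached tokens, while a recurrent update touches only a fixed-size state. A rough cost model (illustrative constants, not the paper's numbers):

```python
# Rough per-step cost model. With a KV cache, attention still
# scans all t cached tokens; a recurrent update works on a
# fixed-size hidden state, independent of history length.
d = 256  # model width

def attention_step_flops(t):
    return 2 * t * d        # grows linearly with history length t

def recurrent_step_flops(t):
    return 4 * d * d        # constant, independent of t

for t in (10, 100, 1000):
    print(f"t={t:>4}: attention {attention_step_flops(t):>9,} "
          f"vs recurrent {recurrent_step_flops(t):,}")
```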
-----
📊 Results:
→ Tested across 432 tasks from 6 domains with 894M total transitions
→ xLSTM outperformed Transformers in both validation perplexity and normalized performance scores
→ At 206M parameters, xLSTM performed better than the Mamba architecture
→ Achieved the sub-10 ms inference times required for real-time robotic control
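One way to sanity-check a latency budget like this locally, reusing the toy sketch from above (numbers will vary with hardware, and the paper's 206M-parameter model is far larger than this toy):

```python
import time
import torch

model = LRAMSketch().eval()        # toy model from the sketch above
img = torch.randn(1, 3, 64, 64)    # dummy camera frame
low = torch.randn(1, 16)           # dummy proprioceptive state

with torch.no_grad():
    _, state = model.step(img, low)            # warm-up step
    t0 = time.perf_counter()
    for _ in range(100):                       # 100 control steps
        _, state = model.step(img, low, state)
    per_step_ms = (time.perf_counter() - t0) / 100 * 1000
print(f"~{per_step_ms:.2f} ms per step vs the 10 ms budget at 100 Hz")
```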