Enhanced Transformer architecture with smart sequence splitting for in-context learning of dynamical systems
Split long sequences into bite-sized chunks, and suddenly your Transformer can eat 100x more data
Original Problem 🔍:
In the meta-learning setting, a series of related tasks is presented to an agent, which adapts its behavior to act optimally with respect to that class of tasks.
The in-context identification paradigm, which estimates meta-models describing whole classes of dynamical systems, faced limitations in handling long context sequences, providing uncertainty estimates, and managing non-contiguous contexts.
Solution in this Paper 🛠️:
• Probabilistic framework: Predicts both the mean and the standard deviation of the outputs
• Non-contiguous context handling: Processes arbitrary initial conditions for query sequences
• Recurrent patching: Splits long context sequences into patches processed by RNN
• Architecture changes:
  - Modified decoder output layer for mean and standard deviation
  - Additional layer for handling initial conditions
  - Context split into patches, processed by an RNN before the encoder (sketched below)
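A minimal PyTorch sketch of the two main architectural ideas, recurrent patching and the probabilistic output head. All module names, dimensions, the patch length, and the choice of a GRU are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class PatchedContextEncoder(nn.Module):
    """Compress a long input/output context into patch embeddings with an RNN,
    then encode the (much shorter) patch sequence with a Transformer encoder.
    Sizes, patch length, and the GRU choice are illustrative assumptions."""
    def __init__(self, n_u=1, n_y=1, d_model=128, patch_len=100, n_heads=8, n_layers=4):
        super().__init__()
        self.patch_len = patch_len
        self.rnn = nn.GRU(input_size=n_u + n_y, hidden_size=d_model, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, ctx):                        # ctx: (B, T, n_u + n_y), T up to ~40k
        B, T, C = ctx.shape
        n_patches = T // self.patch_len
        patches = ctx[:, : n_patches * self.patch_len].reshape(
            B * n_patches, self.patch_len, C)
        _, h = self.rnn(patches)                   # h: (1, B * n_patches, d_model)
        patch_emb = h.squeeze(0).reshape(B, n_patches, -1)
        return self.encoder(patch_emb)             # (B, n_patches, d_model)

class ProbabilisticHead(nn.Module):
    """Decoder output layer that returns a mean and a positive standard deviation
    for each predicted output sample, instead of a point estimate."""
    def __init__(self, d_model=128, n_y=1):
        super().__init__()
        self.mean = nn.Linear(d_model, n_y)
        self.log_std = nn.Linear(d_model, n_y)

    def forward(self, dec_out):                    # dec_out: (B, L, d_model)
        return self.mean(dec_out), torch.exp(self.log_std(dec_out))
```

The design point of patching: the encoder attends over a few hundred patch embeddings rather than tens of thousands of raw samples, which is what makes 40,000-sample contexts affordable while still exposing the full context to the meta-model.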
Key Insights from this Paper 💡:
• Transformer-based meta-models can be adapted for system identification tasks
• Probabilistic outputs enable uncertainty quantification in predictions (see the loss sketch after this list)
• Recurrent patching allows processing of significantly longer context sequences
• Fine-tuning improves performance on out-of-distribution inputs
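A hedged sketch of how mean/standard-deviation outputs translate into a training objective: using a Gaussian negative log-likelihood is an assumption about the loss, and `model`, `context`, and the tensor shapes below are hypothetical.

```python
import torch

def gaussian_nll(mu, sigma, y, eps=1e-6):
    """Average negative log-likelihood of y under N(mu, sigma^2).
    Minimizing this pushes the meta-model toward calibrated standard deviations."""
    var = sigma.clamp_min(eps) ** 2
    return 0.5 * (torch.log(2 * torch.pi * var) + (y - mu) ** 2 / var).mean()

# Hypothetical usage during meta-training:
# mu, sigma = model(context, query_inputs)      # each of shape (B, L, n_y)
# loss = gaussian_nll(mu, sigma, query_outputs)
# loss.backward()
```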
Results 📊:
• Context length increased from 400 to 40,000 samples
• RMSE approaches noise floor (0.1) with longer contexts
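For reference, the metric behind that last claim, as I read it: with additive output noise of standard deviation 0.1, no predictor can push RMSE much below 0.1, so approaching that floor indicates the meta-model is extracting essentially all recoverable structure from the context.

```python
import torch

def rmse(y_hat, y):
    """Root-mean-square error between predicted and true query outputs."""
    return torch.sqrt(torch.mean((y_hat - y) ** 2))
```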