Enhanced Transformer architecture with smart sequence splitting for in-context learning of dynamical systems
Split long sequences into bite-sized chunks, and suddenly your Transformer can eat 100x more data
Original Problem 🔍:
In the meta-learning setting, a series of related tasks is presented to an agent, which adapts its behavior to act optimally with respect to that class of tasks.
The in-context identification paradigm, which estimates meta-models describing whole classes of dynamical systems, faced limitations in handling long context sequences, providing uncertainty estimates, and managing non-contiguous contexts.
Solution in this Paper 🛠️:
• Probabilistic framework: Predicts both the mean and the standard deviation of the outputs
• Non-contiguous context handling: Processes arbitrary initial conditions for query sequences
• Recurrent patching: Splits long context sequences into patches processed by RNN
• Architecture changes:
  - Modified decoder output layer for mean and standard deviation
  - Additional layer for handling initial conditions
  - Context split into patches, processed by an RNN before the encoder (sketched below)
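A minimal PyTorch sketch of the two main architectural ideas, recurrent patching and the probabilistic output head. All module names, dimensions, the patch length, and the choice of a GRU are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class PatchedContextEncoder(nn.Module):
    """Compress a long input/output context into patch embeddings with an RNN,
    then encode the (much shorter) patch sequence with a Transformer encoder.
    Sizes, patch length, and the GRU choice are illustrative assumptions."""
    def __init__(self, n_u=1, n_y=1, d_model=128, patch_len=100, n_heads=8, n_layers=4):
        super().__init__()
        self.patch_len = patch_len
        self.rnn = nn.GRU(input_size=n_u + n_y, hidden_size=d_model, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, ctx):                        # ctx: (B, T, n_u + n_y), T up to ~40k
        B, T, C = ctx.shape
        n_patches = T // self.patch_len
        patches = ctx[:, : n_patches * self.patch_len].reshape(
            B * n_patches, self.patch_len, C)
        _, h = self.rnn(patches)                   # h: (1, B * n_patches, d_model)
        patch_emb = h.squeeze(0).reshape(B, n_patches, -1)
        return self.encoder(patch_emb)             # (B, n_patches, d_model)

class ProbabilisticHead(nn.Module):
    """Decoder output layer that returns a mean and a positive standard deviation
    for each predicted output sample, instead of a point estimate."""
    def __init__(self, d_model=128, n_y=1):
        super().__init__()
        self.mean = nn.Linear(d_model, n_y)
        self.log_std = nn.Linear(d_model, n_y)

    def forward(self, dec_out):                    # dec_out: (B, L, d_model)
        return self.mean(dec_out), torch.exp(self.log_std(dec_out))
```

The design point of patching: the encoder attends over a few hundred patch embeddings rather than tens of thousands of raw samples, which is what makes 40,000-sample contexts affordable while still exposing the full context to the meta-model.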
Key Insights from this Paper 💡:
• Transformer-based meta-models can be adapted for system identification tasks
• Probabilistic outputs enable uncertainty quantification in predictions (see the loss sketch after this list)
• Recurrent patching allows processing of significantly longer context sequences
• Fine-tuning improves performance on out-of-distribution inputs
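A hedged sketch of how mean/standard-deviation outputs translate into a training objective: using a Gaussian negative log-likelihood is an assumption about the loss, and `model`, `context`, and the tensor shapes below are hypothetical.

```python
import torch

def gaussian_nll(mu, sigma, y, eps=1e-6):
    """Average negative log-likelihood of y under N(mu, sigma^2).
    Minimizing this pushes the meta-model toward calibrated standard deviations."""
    var = sigma.clamp_min(eps) ** 2
    return 0.5 * (torch.log(2 * torch.pi * var) + (y - mu) ** 2 / var).mean()

# Hypothetical usage during meta-training:
# mu, sigma = model(context, query_inputs)      # each of shape (B, L, n_y)
# loss = gaussian_nll(mu, sigma, query_outputs)
# loss.backward()
```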
Results 📊:
• Context length increased from 400 to 40,000 samples
• RMSE approaches noise floor (0.1) with longer contexts
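For reference, the metric behind that last claim, as I read it: with additive output noise of standard deviation 0.1, no predictor can push RMSE much below 0.1, so approaching that floor indicates the meta-model is extracting essentially all recoverable structure from the context.

```python
import torch

def rmse(y_hat, y):
    """Root-mean-square error between predicted and true query outputs."""
    return torch.sqrt(torch.mean((y_hat - y) ** 2))
```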