When LLMs learn from context, it's survival of the fittest among algorithms.
This paper reveals how In-Context Learning emerges from competing algorithmic behaviors rather than a single mechanism, using a synthetic sequence modeling task involving Markov chains.
https://arxiv.org/abs/2412.01003
🤖 Original Problem:
→ Current understanding of In-Context Learning (ICL) in LLMs lacks a unified framework, with different studies using disparate experimental setups that make it hard to develop general insights.
📝 Solution in this Paper:
→ The researchers introduce a synthetic sequence modeling task where models learn to simulate a finite mixture of Markov chains.
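To make the task concrete, here is a minimal sketch of how such training data could be generated: pick one chain from the mixture, then roll out a Markov sequence. All parameters (state count, mixture size, sequence length) are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_chains, seq_len = 3, 2, 10  # illustrative values, not from the paper

# Each mixture component is a random row-stochastic transition matrix.
transition_matrices = rng.dirichlet(np.ones(n_states), size=(n_chains, n_states))

def sample_sequence(rng, transition_matrices, seq_len):
    """Pick one chain from the mixture, then roll out a Markov sequence."""
    chain = rng.integers(len(transition_matrices))
    T = transition_matrices[chain]
    state = rng.integers(T.shape[0])
    seq = [state]
    for _ in range(seq_len - 1):
        state = rng.choice(T.shape[0], p=T[state])
        seq.append(state)
    return seq

seq = sample_sequence(rng, transition_matrices, seq_len)
```

A model trained on many such sequences must infer, from context alone, which chain generated the current sequence — the setting in which the four algorithmic behaviors compete.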
→ They identify four distinct algorithmic solutions: Unigram Retrieval, Bigram Retrieval, Unigram Inference, and Bigram Inference.
→ These algorithms compete dynamically, with experimental conditions determining which algorithm dominates.
→ The model's behavior can be decomposed into a linear combination of these algorithms, with weights evolving during training.
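The decomposition idea can be sketched as fitting convex weights over the candidate algorithms' predictive distributions so that the mixture matches the model's next-token distribution, with KL divergence as the fit criterion. This is an illustrative toy (synthetic distributions, simple gradient descent on softmax logits), not the paper's actual fitting procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tokens, n_algos = 5, 4  # four candidate algorithms, toy vocabulary

# Each row: one candidate algorithm's predictive distribution over tokens.
algo_dists = rng.dirichlet(np.ones(n_tokens), size=n_algos)

# A synthetic "model" distribution that is truly a mixture of the algorithms.
true_w = np.array([0.6, 0.3, 0.1, 0.0])
model_dist = true_w @ algo_dists

def fit_weights(model_dist, algo_dists, steps=2000, lr=0.5):
    """Gradient descent on softmax logits to minimize KL(model || mixture)."""
    logits = np.zeros(algo_dists.shape[0])
    for _ in range(steps):
        w = np.exp(logits); w /= w.sum()
        mix = w @ algo_dists
        grad_mix = -model_dist / mix          # d KL / d mix
        grad_w = algo_dists @ grad_mix        # chain rule to weights
        grad_logits = w * (grad_w - w @ grad_w)  # softmax Jacobian
        logits -= lr * grad_logits
    w = np.exp(logits); w /= w.sum()
    return w

w = fit_weights(model_dist, algo_dists)
kl = np.sum(model_dist * np.log(model_dist / (w @ algo_dists)))
```

Tracking the fitted weights over training checkpoints would then reveal which algorithm dominates at each stage.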
💡 Key Insights:
→ ICL emerges from competing algorithmic behaviors rather than a single mechanism
→ Universal claims about ICL may be infeasible since behavior depends heavily on experimental setup
→ Model development should focus on promoting desired algorithms over competing alternatives
📊 Results:
→ Successfully reproduced most known ICL phenomena in a unified setting
→ Achieved near-zero KL divergence when decomposing model behavior into a linear combination of the four algorithms
→ Demonstrated clear transitions between algorithms as a function of data diversity and training steps