"Toward Understanding In-context vs. In-weight Learning"

The podcast on this paper is generated with Google's Illuminate.

Ever wondered why your LLM suddenly got worse at few-shot tasks? Here's why!

This paper mathematically explains why LLMs sometimes lose their in-context learning abilities with continued training.

📚 https://arxiv.org/abs/2410.23042

🤔 Original Problem:

LLMs show in-context learning (ICL) capabilities, but this ability can diminish with further training. The research community has lacked a theoretical understanding of why and when ICL emerges or disappears.

-----

🔧 Solution in this Paper:

→ Introduces a bi-level model that uses a gating mechanism to choose between in-weight learning (IWL) and ICL predictors

→ The model learns to select between ICL and IWL based on their expected performance using a gating parameter α

→ Provides a mathematical framework showing how simple distributional properties lead to the emergence and disappearance of ICL

→ Demonstrates that ICL appears with diverse but rare samples, while IWL dominates for frequently occurring patterns
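The gating idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all function names are hypothetical, the in-context predictor is stood in for by nearest-neighbor over the prompt examples, and the in-weight predictor by a memorized lookup table.

```python
import numpy as np

def icl_predict(context_xs, context_ys, query_x):
    # In-context predictor: 1-nearest-neighbor over the prompt's examples
    # (a stand-in for a predictor that reads labels from context).
    dists = np.linalg.norm(context_xs - query_x, axis=1)
    return context_ys[np.argmin(dists)]

def iwl_predict(memory, query_key):
    # In-weight predictor: a label memorized during training;
    # falls back to a default guess for never-seen keys.
    return memory.get(query_key, 0.0)

def gated_predict(alpha, icl_out, iwl_out):
    # Soft gate alpha: near 1 trusts ICL, near 0 trusts IWL.
    # In the paper, alpha is learned from each predictor's expected loss.
    return alpha * icl_out + (1 - alpha) * iwl_out

# Toy case: a rare class absent from the "weights" but present in context.
memory = {"cat": 1.0}                         # frequent class, memorized
ctx_xs = np.array([[0.0, 0.0], [1.0, 1.0]])   # prompt inputs
ctx_ys = np.array([0.0, 1.0])                 # prompt labels
query = np.array([0.9, 1.1])

icl_out = icl_predict(ctx_xs, ctx_ys, query)  # nearest context example wins
iwl_out = iwl_predict(memory, "zebra")        # unseen key -> default 0.0
pred = gated_predict(alpha=0.9, icl_out=icl_out, iwl_out=iwl_out)
```

For a rare, context-predictable query, a gate leaning toward ICL (alpha near 1) yields a better prediction than the memorized fallback, which is the trade-off the gating parameter α formalizes.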

-----

💡 Key Insights:

→ ICL emerges when data has diverse but rare samples that are predictable from context

→ IWL takes over when the model accumulates enough examples of previously rare patterns

→ The choice between ICL and IWL is driven by their relative performance on new data

→ Simple distributional properties can explain complex ICL behaviors
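The "diverse but rare" condition can be made concrete with a quick simulation. The snippet below is illustrative only (not from the paper): it samples classes from a Zipfian distribution and counts how many land in the frequent head (where memorization in weights pays off) versus the long tail of rarely seen classes (where an in-context predictor is the better bet). The thresholds are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 1000

# Zipfian class distribution: probability proportional to 1/rank.
ranks = np.arange(1, num_classes + 1)
probs = 1.0 / ranks
probs /= probs.sum()

# Simulate a training stream of 5000 examples.
draws = rng.choice(num_classes, size=5000, p=probs)
counts = np.bincount(draws, minlength=num_classes)

# Head: classes seen often enough to memorize in weights (threshold is arbitrary).
frequent = int((counts >= 50).sum())
# Tail: classes seen only once or twice, so prediction must rely on context.
rare = int(((counts > 0) & (counts <= 2)).sum())
```

Under such a skewed distribution the tail vastly outnumbers the head, matching the insight that diverse-but-rare data pushes the model toward ICL while frequent patterns favor IWL.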

-----

📊 Results:

→ Theoretical predictions align with observed transformer behavior on simplified data distributions

→ Demonstrates that fine-tuning on various natural language prompts elicits similar ICL and IWL behavior

→ Provides mathematical bounds on errors for both ICL and IWL predictors

→ Validates theory through experiments on synthetic and Omniglot data
