
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

This podcast was generated with Google's Illuminate.

LLMs demonstrate the ability to process multiple in-context learning tasks in a single inference pass.

The paper reveals LLMs' capacity for task superposition.

📚 https://arxiv.org/abs/2410.05603

Original Problem 🔍:

LLMs demonstrate remarkable in-context learning capabilities, but their ability to perform multiple distinct tasks simultaneously during inference remains unexplored.
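
To make the setup concrete, here is a minimal sketch of a superposed in-context prompt (a toy construction for illustration, not the paper's benchmark): demonstrations are sampled from two hypothetical tasks, uppercasing and English-to-French translation, with a shared query, so a superposition-capable model should spread next-token probability across both tasks' answers.

```python
import random

# Two hypothetical tasks for illustration; the paper defines its own task suite.
uppercase_pairs = [("apple", "APPLE"), ("river", "RIVER"), ("stone", "STONE")]
to_french_pairs = [("apple", "pomme"), ("river", "rivière"), ("stone", "pierre")]

def build_superposed_prompt(task_a, task_b, query, p_a=0.5, n_examples=8, seed=0):
    """Interleave demonstrations from two tasks with mixing weight p_a, then append a shared query."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_examples):
        pairs = task_a if rng.random() < p_a else task_b
        x, y = rng.choice(pairs)
        lines.append(f"{x} -> {y}")
    lines.append(f"{query} ->")  # the query is ambiguous: either task could apply
    return "\n".join(lines)

print(build_superposed_prompt(uppercase_pairs, to_french_pairs, query="house"))
# A model exhibiting task superposition would place non-trivial probability on both
# "HOUSE" (uppercasing) and "maison" (translation), roughly tracking the 50/50 mixture.
```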

-----

Solution in this Paper 🛠️:

• Offers a theoretical construction showing that Transformers can execute multiple in-context tasks in parallel

• Explores how task vectors compose internally during superposition (see the probing sketch after this list)

• Investigates how model scale affects task superposition capabilities
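
A minimal probing sketch for the task-vector analysis, with several assumptions not fixed by this summary (GPT-2 as a small stand-in model, layer 6 as the readout layer, the last-token hidden state as the task-vector proxy, and the toy "x -> y" prompt format): it extracts task vectors from two single-task prompts and from a mixed prompt, then measures how the mixed vector correlates with each individual one.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = "gpt2"   # stand-in; the paper studies GPT-3.5, Llama-3, and Qwen families
LAYER = 6        # hypothetical readout layer for the task vector

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

@torch.no_grad()
def task_vector(prompt: str) -> torch.Tensor:
    """Hidden state of the final prompt token at LAYER, used as a simple task-vector proxy."""
    ids = tok(prompt, return_tensors="pt").input_ids
    hidden = model(ids, output_hidden_states=True).hidden_states  # embeddings + one entry per layer
    return hidden[LAYER][0, -1]

upper_prompt = "apple -> APPLE\nriver -> RIVER\nstone -> STONE\nhouse ->"
french_prompt = "apple -> pomme\nriver -> rivière\nstone -> pierre\nhouse ->"
mixed_prompt = "apple -> APPLE\nriver -> rivière\nstone -> STONE\nhouse ->"

v_upper, v_french, v_mixed = map(task_vector, (upper_prompt, french_prompt, mixed_prompt))

cos = torch.nn.functional.cosine_similarity
print("mixed vs uppercase:", cos(v_mixed, v_upper, dim=0).item())
print("mixed vs french:   ", cos(v_mixed, v_french, dim=0).item())
# The paper's finding, in this framing: the mixed-task vector correlates with both
# individual task vectors, reflecting the in-context example mixture.
```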

-----

Key Insights from this Paper 💡:

• LLMs can perform task superposition even when trained on one task at a time

• Larger models solve more tasks in parallel and calibrate better to the in-context task distribution

• Task vectors from mixed-task prompts correlate with the individual task vectors and reflect the in-context example distribution

• Convex combinations of individual task vectors reproduce the task-superposition effect (illustrated in the patching sketch after this list)

• "Generation collapse" limits practical applications, highlighting the need for new decoding strategies

-----

Results 📊:

• Demonstrated task superposition across GPT-3.5, Llama-3, and Qwen model families

• Larger models (e.g., Qwen-1.5 14B) show higher task completion rates and lower KL divergence from the in-context task distribution (see the KL sketch after this list)

• Task vector interpolation produces task superposition, but assigns higher probability to irrelevant outputs

• A 7-layer Transformer with embedding dimension O(d + log(mn)) can perform K tasks in superposition (theoretical construction)
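
A small worked example of the calibration metric mentioned above, with entirely made-up numbers: the KL divergence between the in-context task mixture and the model's probability mass on each task's answer, renormalized over the candidate answers.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same task set."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical example with 3 tasks mixed 50/30/20 in the prompt.
in_context_mixture = [0.5, 0.3, 0.2]

# Hypothetical probability mass the model puts on each task's correct answer, renormalized.
raw_mass = [0.42, 0.21, 0.07]
total = sum(raw_mass)
model_over_tasks = [m / total for m in raw_mass]

print(f"KL(mixture || model) = {kl_divergence(in_context_mixture, model_over_tasks):.4f}")
# Lower values mean the output distribution is better calibrated to the in-context mixture;
# the summary reports that larger models achieve lower KL.
```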
