LLMs demonstrate the ability to process multiple in-context learning tasks in a single inference pass.
Reveals LLMs' capacity for task superposition
📚 https://arxiv.org/abs/2410.05603
Original Problem 🔍:
LLMs demonstrate remarkable in-context learning capabilities, but their ability to perform multiple distinct tasks simultaneously during inference remains unexplored.
-----
Solution in this Paper 🛠️:
• Offers a theoretical construction showing Transformers can implement multiple tasks in parallel
• Explores how task vectors compose internally during superposition (a task-vector extraction sketch follows this list)
• Investigates how model scale affects task superposition capabilities
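For intuition, here is a minimal sketch of how a task vector can be read out of a model's residual stream, in the spirit of the task-vector analysis the paper builds on. The model name, layer index, and prompt format are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: read a "task vector" from in-context demonstrations.
# Assumptions: a HuggingFace causal LM; MODEL_NAME and LAYER are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Meta-Llama-3-8B"  # assumption: any causal LM would do here
LAYER = 15                                  # assumption: a mid-depth decoder layer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

def task_vector(demonstrations: str) -> torch.Tensor:
    """Hidden state at the last prompt token, taken as a summary of the demonstrated task."""
    inputs = tokenizer(demonstrations, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states[LAYER + 1] is the output of decoder layer LAYER,
    # with shape (batch, seq_len, d_model); keep the last position.
    return out.hidden_states[LAYER + 1][0, -1, :]

# Example: a translation task specified purely by in-context pairs.
v_translate = task_vector("dog -> chien\ncat -> chat\nhouse ->")
```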
-----
Key Insights from this Paper 💡:
• LLMs can perform task superposition even when trained on one task at a time
• Larger models solve more tasks in parallel and better calibrate their outputs to the in-context task distribution
• Task vectors of mixed-task prompts correlate with the individual task vectors and with the in-context example distribution
• Convex combinations of task vectors reproduce the task superposition effect (see the sketch after this list)
• "Generation collapse" limits practical applications, highlighting need for new decoding strategies
-----
Results 📊:
• Demonstrated task superposition across GPT-3.5, Llama-3, and Qwen model families
• Larger models (e.g., Qwen-1.5 14B) show higher task completion rates and lower KL divergence from the in-context task distribution (a toy calibration check follows this list)
• Task vector interpolation produces task superposition but with higher irrelevant output probabilities
• A 7-layer Transformer with O(d + log(mn)) embedding dimension can perform K tasks in superposition
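To make the KL-divergence comparison concrete, here is a toy calibration check under assumed numbers: the fraction of sampled answers that solve each task is compared against the mixture of tasks in the prompt. The proportions are hypothetical, not the paper's measurements.

```python
# Toy sketch of the calibration metric: KL divergence between the empirical
# distribution of which task each sampled output solves and the proportion of
# each task's examples in the prompt. Numbers below are hypothetical.
import math

def kl_divergence(p: list[float], q: list[float], eps: float = 1e-9) -> float:
    """KL(p || q) with small smoothing to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# In-context mixture: e.g. 60% translation examples, 40% uppercasing examples.
prompt_mix = [0.6, 0.4]
# Fraction of sampled outputs classified as solving each task (hypothetical).
output_mix = [0.55, 0.45]

print(kl_divergence(output_mix, prompt_mix))  # lower = better calibrated
```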