Resource competition and modularity explain fast skill acquisition in neural networks.
Geometry, resources, and dominos: three simplified models of how neural networks learn.
This paper proposes simplified models to understand how neural networks learn skills, focusing on the sequential nature of skill acquisition.
https://arxiv.org/abs/2501.12391
🤔 Original Problem:
→ LLMs exhibit complex skill-learning dynamics, including sequential skill acquisition (the "Domino effect"), that lack an intuitive, mechanistic explanation.
💡 Solution in this Paper:
→ The paper proposes three simplified models in decreasing order of complexity: Geometry, Resource, and Domino.
→ The Geometry model represents tasks as vectors in parameter space.
→ The Resource model interprets model parameters as resources that tasks compete for based on gradient magnitudes.
→ The Domino model simplifies this further, assuming a strict sequential learning order based on task frequency.
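The Domino model's strict sequential order can be sketched in a few lines. This is a toy illustration under an assumed simplification (each task, once it starts, takes a fixed unit time `T0` because it receives the network's full learning resources), not the paper's exact formulation:

```python
# Toy Domino model (a sketch, not the paper's equations): tasks are
# ordered by frequency and learned strictly one after another. Once a
# task starts, it takes a fixed time T0 at full resources (assumed).

T0 = 1.0          # time to learn one task with full resources (assumed unit)
n_tasks = 5

# The k-th task (0-indexed) finishes at (k + 1) * T0.
finish_times = [(k + 1) * T0 for k in range(n_tasks)]

print(finish_times)  # → [1.0, 2.0, 3.0, 4.0, 5.0]: sequential, O(n) total time
```

The last task finishes at `n_tasks * T0`, which is the O(number of tasks) scaling that the modularity result below improves on.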
🤯 Key Insights from this Paper:
→ The Geometry model explains Chinchilla scaling laws and optimizer behavior.
→ The Resource model explains the dynamics of learning compositional tasks.
→ The Domino model highlights the benefits of modularity for faster scaling.
📈 Results:
→ For two independent sparse-parity tasks with a 10:1 frequency ratio, the rarer task starts learning rapidly only after the more frequent task is learned, and finishes in roughly twice the frequent task's time rather than the naively expected ten times.
→ The Resource model captures this behavior through gradient magnitude competition.
→ Modular networks, which learn skills in parallel, achieve an O(√n) speed-up in learning time over the O(n) scaling of non-modular networks, where n is the number of tasks.
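The Domino effect in the sparse-parity result can be reproduced with a toy resource-competition simulation. This is a sketch under assumed dynamics (not the paper's exact equations): two tasks share a fixed pool of "gradient resource", each task's share is proportional to frequency × remaining error, and its error shrinks at a rate equal to its share. The frequent task wins the resource first; once it is learned, the rare task inherits the resource and finishes quickly:

```python
# Toy resource-competition model (assumed dynamics, not the paper's exact
# formulation). Task 0 is 10x more frequent than task 1 (assumed ratio).

f = [10.0, 1.0]        # task frequencies
u = [1.0, 1.0]         # remaining error per task (1 = unlearned, ~0 = mastered)
dt = 0.01
t = 0.0
finish = [None, None]  # time at which each task's error first drops below 0.1

while None in finish and t < 100.0:
    g = [fi * ui for fi, ui in zip(f, u)]   # proxy for each task's gradient magnitude
    total = sum(g)
    share = [gi / total for gi in g]        # fraction of the resource each task wins
    u = [ui - dt * si * ui for ui, si in zip(u, share)]  # error decays at rate = share
    t += dt
    for i in (0, 1):
        if finish[i] is None and u[i] < 0.1:
            finish[i] = t

print(finish)  # rare task finishes roughly 2x after the frequent one, not 10x later
```

The key mechanism matches the Results above: the rare task's finish time tracks the frequent task's finish time (plus one more learning phase at full resources), not the 10× gap its frequency alone would suggest.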