Complexity exposure drives intelligence in LLMs, with optimal performance at the "edge of chaos."
Simple systems with complex behaviors can foster intelligence in language models.
-----
📚 https://www.arxiv.org/pdf/2410.02536
Solution in this Paper 🧠:
• Train GPT-2 models on Elementary Cellular Automata (ECA) data of varying complexity (see the first sketch after this list)
• Evaluate models on downstream tasks: easy/hard reasoning and chess move prediction
• Analyze attention patterns to understand the models' information-processing strategies (see the second sketch below)
• Measure rule complexity using Lempel-Ziv, compression, Lyapunov, and Krylov metrics; the first sketch below includes a simple Lempel-Ziv score
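
For intuition, here is a minimal NumPy sketch (not the paper's code) of two of the ingredients above: rolling out an ECA rule to produce binary training sequences, and scoring a rollout with a simple Lempel-Ziv phrase count. The rule numbers, grid width, and step count are illustrative assumptions.

```python
import numpy as np

def eca_step(state: np.ndarray, rule: int) -> np.ndarray:
    """Apply one update of an Elementary Cellular Automaton rule (0-255)."""
    left, right = np.roll(state, 1), np.roll(state, -1)
    neighborhood = 4 * left + 2 * state + right      # each cell's 3-bit pattern, 0..7
    rule_table = (rule >> np.arange(8)) & 1          # output bit for each pattern
    return rule_table[neighborhood]

def simulate(rule: int, width: int = 64, steps: int = 100, seed: int = 0) -> np.ndarray:
    """Roll out one (steps, width) binary orbit, usable as a training sequence."""
    rng = np.random.default_rng(seed)
    state = rng.integers(0, 2, size=width)
    rows = [state]
    for _ in range(steps - 1):
        state = eca_step(state, rule)
        rows.append(state)
    return np.stack(rows)

def lempel_ziv_complexity(bits: str) -> int:
    """Count distinct phrases in a simple LZ76-style parsing of a binary string."""
    phrases, i = set(), 0
    while i < len(bits):
        j = i + 1
        while bits[i:j] in phrases and j <= len(bits):
            j += 1
        phrases.add(bits[i:j])
        i = j
    return len(phrases)

# Rule 110 (Class IV, "edge of chaos") vs. Rule 0 (Class I, trivial)
for rule in (110, 0):
    orbit = simulate(rule)
    score = lempel_ziv_complexity("".join(map(str, orbit.flatten())))
    print(f"rule {rule}: LZ phrase count = {score}")
```

In the paper's framing, higher-complexity rules such as Rule 110 should produce orbits that are much harder to compress than those of Class I/II rules.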
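Similarly, a rough sketch of the attention inspection, using Hugging Face transformers. The stock "gpt2" checkpoint stands in for an ECA-trained model, and the "mean attention distance" statistic is an assumed proxy for how much historical context a model uses, not the paper's exact analysis.

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

text = "0 1 1 0 1 0 0 1"                 # stand-in for an encoded ECA state sequence
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, heads, seq, seq) tensor per layer.
seq_len = inputs["input_ids"].shape[1]
pos = torch.arange(seq_len)
dist = (pos.view(-1, 1) - pos.view(1, -1)).clamp(min=0).float()

for layer, attn in enumerate(out.attentions):
    # Expected number of tokens back that each query attends to,
    # averaged over heads and positions (larger = more history used).
    mean_dist = (attn[0] * dist).sum(-1).mean().item()
    print(f"layer {layer}: mean attention distance = {mean_dist:.2f}")
```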
-----
Key Insights from this Paper 💡:
• Intelligence may emerge from exposure to complexity, even in simple rule-based systems
• Optimal "edge of chaos" complexity fosters intelligent behavior
• Models trained on complex rules develop more sophisticated processing strategies
• Overparameterized models can learn complex solutions for simple problems
-----
Results 📊:
• Positive correlation between rule complexity and downstream task performance
• Models trained on Class III/IV rules outperform those trained on Class I/II rules
• Attention analysis: Complex rule models integrate more historical information
• CKA similarity: Models trained on rules of similar complexity cluster together (a CKA sketch follows this list)
• Models trained on short-term prediction outperform those trained on long-term prediction
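
For reference, a compact sketch of linear CKA (Kornblith et al., 2019), the kind of representation-similarity measure behind the clustering result above; whether the paper uses this exact (linear) variant is an assumption, and the toy activations are synthetic.

```python
import numpy as np

def center(gram: np.ndarray) -> np.ndarray:
    """Double-center a Gram matrix."""
    n = gram.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ gram @ h

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two (n_samples, n_features) activation matrices."""
    kx, ky = center(x @ x.T), center(y @ y.T)
    return float((kx * ky).sum() / (np.linalg.norm(kx) * np.linalg.norm(ky)))

# Toy usage: hidden states of the same 128 inputs from three hypothetical models.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(128, 768))
acts_b = acts_a + 0.1 * rng.normal(size=(128, 768))   # near-duplicate representations
acts_c = rng.normal(size=(128, 768))                  # unrelated representations
print(f"similar pair:   CKA = {linear_cka(acts_a, acts_b):.3f}")
print(f"unrelated pair: CKA = {linear_cka(acts_a, acts_c):.3f}")
```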