Just like in life, repetition WINS. 🎖️
For models, the benefits of repetition can outweigh those of data diversity.
Small, repeated datasets unlock superior LLM performance on mathematical tasks: transformers learn better with strategic repetition of training examples.
📚 https://arxiv.org/abs/2410.07041
Original Problem 🔍:
LLMs are typically trained on large datasets with minimal repetition, on the assumption that more diverse data leads to better generalization. This approach may not be optimal for learning efficiency or final performance.
-----
Solution in this Paper 🧠:
• Introduces "two-set training" for transformers
• Randomly selects a small subset of training examples for frequent repetition
• Mixes repeated and non-repeated examples within each mini-batch (see the sketch after this list)
• Experiments with GCD, modular multiplication, and matrix eigenvalue tasks
• Uses sequence-to-sequence transformers with 4 layers, 512 embedding dimension
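The sampling mechanism is simple to sketch. Below is a minimal, illustrative Python version, not the paper's code: the subset fraction (repeated_frac), the repeated share of each mini-batch (repeated_mix), and the toy GCD-style data are assumed values chosen for illustration.

```python
import random

def make_two_set_sampler(train_examples, repeated_frac=0.001, repeated_mix=0.25, seed=0):
    """Split the training data into a small 'repeated' subset and the remainder,
    then draw mini-batches that mix examples from both sets.
    repeated_frac and repeated_mix are illustrative, assumed hyperparameters."""
    rng = random.Random(seed)
    examples = list(train_examples)
    rng.shuffle(examples)
    cut = max(1, int(len(examples) * repeated_frac))
    repeated_set, fresh_set = examples[:cut], examples[cut:]

    def sample_batch(batch_size):
        n_repeated = int(batch_size * repeated_mix)
        # Repeated examples are drawn with replacement, so each one recurs many times over training.
        batch = rng.choices(repeated_set, k=n_repeated)
        # Fresh examples are drawn from the large remainder, so each is seen rarely.
        batch += rng.sample(fresh_set, k=batch_size - n_repeated)
        rng.shuffle(batch)
        return batch

    return sample_batch

# Example usage: mixed mini-batches over a toy pool of (a, b) pairs, as in a GCD-style task.
pairs = [(random.randint(1, 10**6), random.randint(1, 10**6)) for _ in range(100_000)]
sample_batch = make_two_set_sampler(pairs, repeated_frac=0.001, repeated_mix=0.25)
batch = sample_batch(batch_size=64)
```

Sampling the small subset with replacement is what creates heavy repetition, while the rest of each mini-batch still comes from the large, rarely repeated pool, so some data diversity is preserved.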
-----
Key Insights from this Paper 💡:
• Repetition of training examples can improve model performance
• Smaller datasets with more repetitions often outperform larger, single-use datasets
• Two-set training accelerates learning and enhances performance
• Mixing repeated and non-repeated examples in mini-batches is crucial
• The benefits of repetition can outweigh those of data diversity
-----
Results 📊:
• Greatest common divisor (GCD): two-set training correctly predicts 69 GCDs vs. 37 for single-set training
• Modular multiplication: two-set models reach 92% accuracy, while single-set models fail to learn the task
• Matrix eigenvalues: 4-layer models learn tasks typically requiring 8-12 layers
• Consistent improvements across various tasks and model sizes
• Curating or shifting the repeated set shows no significant improvement over random selection