Simplified language training lets tiny AI models outperform counterparts trained on larger, noisier datasets.
This paper introduces a simpler language environment for training tiny language models, so they learn more efficiently with less data and compute.
-----
https://arxiv.org/abs/2501.00522v1
Original Problem 🤔:
Tiny language models are usually trained on large, noisy corpora; current approaches don't exploit simplified language environments that could deliver the same linguistic competence with far less data and compute.
-----
Solution in this Paper 🔧:
→ Creates LEANER datasets by simplifying complex text while preserving core linguistic patterns
→ Implements a "no noise, low complexity" principle to transform training data into cleaner, simpler versions (see the filtering sketch after this list)
→ Develops a 71M-token LEANER-Pretrain dataset and a 7M-token LEANER-Instruct dataset
→ Introduces LEANER-GLUE for testing linguistic abilities and LEANER-Eval for instruction-following
→ Uses curriculum learning to gradually increase complexity during training
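To make the "no noise, low complexity" idea concrete, here is a minimal, hypothetical sketch of such a data filter. The paper's actual transformation pipeline is not reproduced here; the function names (`is_noisy`, `is_too_complex`, `leaner_filter`), thresholds, and heuristics are illustrative assumptions.

```python
# Hypothetical sketch of a "no noise, low complexity" corpus filter.
# Thresholds and heuristics below are illustrative assumptions, not the paper's exact setup.
import re


def is_noisy(sentence: str, max_symbol_ratio: float = 0.2) -> bool:
    """Flag sentences dominated by non-linguistic characters (markup, URLs, tables)."""
    if not sentence.strip():
        return True
    non_alpha = sum(1 for ch in sentence if not (ch.isalpha() or ch.isspace()))
    return non_alpha / len(sentence) > max_symbol_ratio


def is_too_complex(sentence: str, max_words: int = 25, vocab: set | None = None) -> bool:
    """Flag sentences that are very long or rely heavily on out-of-vocabulary words."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    if len(words) > max_words:
        return True
    if vocab is not None and words:
        oov_ratio = sum(1 for w in words if w not in vocab) / len(words)
        return oov_ratio > 0.15
    return False


def leaner_filter(corpus: list[str], vocab: set | None = None) -> list[str]:
    """Keep only clean, low-complexity sentences, preserving core linguistic patterns."""
    return [
        s.strip()
        for s in corpus
        if not is_noisy(s) and not is_too_complex(s, vocab=vocab)
    ]


if __name__ == "__main__":
    raw = [
        "The cat sat on the mat.",
        "<div class='ad'>Buy now!!! $$$ http://spam.example</div>",
        "Notwithstanding the aforementioned epistemological ramifications, the "
        "phenomenological substrate remains irreducibly polysemous.",
    ]
    simple_vocab = {"the", "cat", "sat", "on", "mat"}
    print(leaner_filter(raw, vocab=simple_vocab))  # -> ['The cat sat on the mat.']
```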
-----
Key Insights 💡:
→ Models trained on LEANER datasets outperform those trained on larger original datasets
→ XLNet architecture performs best in pre-training, while LLAMA excels in fine-tuning
→ Curriculum learning ordered by LM perplexity saves ~20% of training steps and data (see the sketch after this list)
→ 71M pre-training tokens are still insufficient for robust instruction-following capabilities
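Below is a minimal sketch of how a perplexity-ordered curriculum could work: a small reference LM scores each example, and training proceeds from low-perplexity (easy) to high-perplexity (hard) examples. The unigram scorer and the function names (`unigram_perplexity`, `curriculum_order`) are illustrative stand-ins, not the paper's implementation.

```python
# Hypothetical sketch of perplexity-ordered curriculum learning.
# An add-one-smoothed unigram LM stands in for whatever reference LM scores difficulty.
import math
from collections import Counter


def unigram_perplexity(sentence: str, counts: Counter, total: int) -> float:
    """Perplexity of a sentence under a unigram LM with add-one smoothing."""
    words = sentence.lower().split()
    if not words:
        return float("inf")
    log_prob = sum(
        math.log((counts[w] + 1) / (total + len(counts) + 1)) for w in words
    )
    return math.exp(-log_prob / len(words))


def curriculum_order(examples: list[str], reference_corpus: list[str]) -> list[str]:
    """Sort training examples from easiest (lowest perplexity) to hardest."""
    counts = Counter(w for s in reference_corpus for w in s.lower().split())
    total = sum(counts.values())
    return sorted(examples, key=lambda s: unigram_perplexity(s, counts, total))


if __name__ == "__main__":
    reference = ["the cat sat on the mat", "the dog ran in the park"]
    train = [
        "quantum chromodynamics perturbation expansions diverge",
        "the cat ran in the park",
        "the dog sat on the mat",
    ]
    for ex in curriculum_order(train, reference):
        print(ex)  # easy sentences print first; the jargon-heavy one last
```

In practice the ordered stream would simply replace the shuffled one when building training batches, which is what lets the model reach the same performance in fewer steps.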
-----
Results 📊:
→ LEANER pre-training improves model performance despite a 41% smaller dataset
→ Architecture ranking (pre-training): XLNet > BERT > LLAMA > MAMBA
→ Architecture ranking (fine-tuning): LLAMA > XLNet > MAMBA > BERT
→ Curriculum learning reduces training steps by 20% while maintaining performance