LLMs learn better when fed a mix of easy and hard examples, just like humans.
By blending confident and uncertain examples, SFTMix enhances LLM instruction tuning without requiring premium datasets.
It exploits variations in the model's confidence across the dataset, pairing training examples by confidence level and mixing them during training.
📚 https://arxiv.org/abs/2410.05248
Original Problem 🔍:
Instruction-tuning LLMs typically relies on expensive, well-curated datasets. Existing approaches use this data inefficiently and offer little insight into dataset properties, limiting scalability and performance gains.
-----
Solution in this Paper 🧠:
• SFTMix: A novel instruction-tuning recipe
• Leverages training dynamics to identify confident vs unconfident examples
• Applies Mixup-based regularization during instruction tuning
• Linearly interpolates representations and one-hot encodings between confident/unconfident pairs
• Adds a Mixup regularization term to the standard next-token prediction (NTP) loss (see the sketch below)
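
A minimal PyTorch sketch of that regularizer, for intuition only. The names (`hidden_conf`, `lm_head`) and the Beta parameter `alpha=0.3` are assumptions, not taken from the paper's code, and the paired examples are assumed to share the same sequence length.

```python
import torch
import torch.nn.functional as F

def mixup_regularizer(hidden_conf, hidden_unconf, labels_conf, labels_unconf,
                      lm_head, vocab_size, alpha=0.3):
    """Mixup term added on top of the standard NTP loss (illustrative sketch).

    hidden_*: (batch, seq, d_model) hidden states for a confident /
              unconfident example pair from the same causal LM.
    labels_*: (batch, seq) target token ids (no -100 padding assumed).
    lm_head:  the model's output projection, d_model -> vocab_size.
    """
    # Interpolation weight drawn from Beta(alpha, alpha), as in standard Mixup
    lam = torch.distributions.Beta(alpha, alpha).sample().item()

    # Linearly interpolate hidden representations of the paired examples
    mixed_hidden = lam * hidden_conf + (1 - lam) * hidden_unconf

    # Interpolate the one-hot encodings of their targets with the same weight
    targets = (lam * F.one_hot(labels_conf, vocab_size).float()
               + (1 - lam) * F.one_hot(labels_unconf, vocab_size).float())

    # Causal-LM shift: the hidden state at position t predicts token t+1
    log_probs = F.log_softmax(lm_head(mixed_hidden[:, :-1]), dim=-1)
    return -(targets[:, 1:] * log_probs).sum(dim=-1).mean()

# total_loss = ntp_loss + mixup_regularizer(...)  # added to, not replacing, NTP
```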
-----
Key Insights from this Paper 💡:
• LLMs exhibit uneven confidence across semantic space
• Training dynamics are key to pairing examples for Mixup (see the confidence-probe sketch after this list)
• Mixup is most effective as a regularizer added alongside the NTP loss
• Generalizes from weaker to stronger LLMs
• Improves performance without needing higher-quality datasets
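
One simple way such confidence could be probed: the mean probability a model assigns to an example's own next tokens, tracked across training checkpoints. The function below is an assumed formulation using a Hugging Face-style `model(...).logits` interface, not the paper's exact metric.

```python
import torch

@torch.no_grad()
def example_confidence(model, input_ids):
    """Mean probability the model assigns to each sequence's own next tokens,
    a simple per-example confidence proxy (assumed, not the paper's exact
    training-dynamics metric)."""
    logits = model(input_ids).logits               # (batch, seq, vocab), HF-style causal LM
    probs = torch.softmax(logits[:, :-1], dim=-1)  # position t predicts token t+1
    token_probs = probs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_probs.mean(dim=-1)                # one score per example
```

Averaging these scores over checkpoints saved during training, then splitting the dataset at the median into confident and unconfident halves, is one plausible pairing rule (an assumption here, not a detail confirmed by the paper).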
-----
Results 📊:
• Outperforms NTP across LLM families (Llama, Mistral) and dataset sizes
• MT-Bench: 0.2-0.3 point increase in single/multi-turn tasks
• AlpacaEval-2: Up to 1.7% increase in length-controlled win rate
• Healthcare tasks: 1.5% absolute increase in accuracy across 4 benchmarks
• Significant gains in extraction, writing, coding, and STEM categories