Data preference-based curriculum learning improves LLM efficiency and accuracy.
This paper introduces a novel training paradigm for LLMs, where training data is dynamically selected based on the model's evolving preferences, leading to significant performance gains. LLMs are typically pretrained on a uniform data distribution, ignoring the fact that a model's data preference changes as its capabilities evolve during training.
Paper - https://arxiv.org/abs/2501.13126
Original Problem 🤔:
→ Current LLMs are pretrained on static data distributions, which is suboptimal because the model's learning capacity changes during training.
Solution in this Paper 🛠️:
→ The Perplexity Difference-based Preference Curriculum learning (PDPC) framework arranges the training data according to the model's evolving preference.
→ PD is calculated offline using reference models, keeping the computational overhead low (a minimal sketch follows this list).
→ An S-shaped preference function controls how the concentration of low-PD versus high-PD data shifts over training, ensuring smooth curriculum progression.
→ The training data is arranged offline, so pretraining proceeds continuously without pauses for data re-selection.
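The sketch below shows how the offline PD score could be computed with Hugging Face-style causal LMs. The weak/strong reference-model pairing and the normalization of the perplexity gap by the weak model's perplexity are assumptions made for illustration, not necessarily the paper's exact formula.

```python
import math
import torch

def perplexity(model, tokenizer, text, device="cpu"):
    # Token-level perplexity of `text` under a causal LM (lower = better fit).
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return math.exp(loss.item())

def perplexity_difference(ppl_weak, ppl_strong):
    # Normalized gap between a weak reference model (e.g., an early checkpoint
    # or smaller model) and a strong one (a later checkpoint or larger model).
    # Low PD: both models already fit the sample; high PD: only the stronger
    # model does. Normalizing by ppl_weak is an assumption for this sketch.
    return (ppl_weak - ppl_strong) / ppl_weak
```

Because the reference models are fixed, every sample can be scored once before pretraining starts, which is what keeps the curriculum's extra compute small.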
Key Insights 💡:
→ The model's perplexity difference (PD) between early and late checkpoints reflects sample difficulty and how its data preference shifts during training.
→ High-PD data is more beneficial in later training stages, while low-PD data suits earlier stages, creating a natural curriculum (see the scheduling sketch after this list).
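To make the curriculum concrete, here is a small sketch of an S-shaped (logistic) schedule and of an offline arrangement that front-loads low-PD data and back-loads high-PD data. The logistic form, the median split into low/high-PD pools, and the hyperparameters `k` and `midpoint` are illustrative assumptions, not the paper's exact design.

```python
import math
from collections import deque

def high_pd_fraction(progress, k=10.0, midpoint=0.5):
    # S-shaped schedule: fraction of high-PD samples at training progress in
    # [0, 1]. Starts near 0 (mostly low-PD data) and rises toward 1.
    # k (steepness) and midpoint are illustrative, not values from the paper.
    return 1.0 / (1.0 + math.exp(-k * (progress - midpoint)))

def arrange_offline(samples, pd_scores, num_chunks=100):
    # Fix the data order before training: split the corpus at the median PD,
    # then fill successive chunks with a low-/high-PD mix given by the S-curve,
    # so early chunks are mostly low-PD and later chunks mostly high-PD.
    ranked = sorted(zip(pd_scores, samples), key=lambda pair: pair[0])
    cut = len(ranked) // 2
    low_pool = deque(s for _, s in ranked[:cut])
    high_pool = deque(s for _, s in ranked[cut:])
    chunk_size = max(1, len(samples) // num_chunks)
    ordered = []
    for c in range(num_chunks):
        frac = high_pd_fraction((c + 0.5) / num_chunks)
        n_high = min(round(frac * chunk_size), len(high_pool))
        n_low = min(chunk_size - n_high, len(low_pool))
        ordered += [low_pool.popleft() for _ in range(n_low)]
        ordered += [high_pool.popleft() for _ in range(n_high)]
    ordered += list(low_pool) + list(high_pool)  # leftovers from rounding
    return ordered
```

The resulting list can be streamed by an ordinary dataloader, which is why the arrangement does not interrupt continuous pretraining.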
Results 🎯:
→ A 3B model trained with PDPC on 1 trillion tokens achieves an average accuracy gain of 4.1% across benchmarks, and of 8.1% on MMLU and CMMLU.
→ The 1.3B model also shows consistent improvements across all benchmarks.