"Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data"

The podcast accompanying this post was generated with Google's Illuminate.

Data preference-based curriculum learning improves LLM efficiency and accuracy.

This paper introduces a training paradigm for LLMs in which training data is dynamically selected according to the model's evolving preferences, yielding notable performance gains. LLMs are typically pretrained on a uniform data distribution, ignoring the fact that a model's data preference shifts as its capabilities grow during training.

Paper - https://arxiv.org/abs/2501.13126

Original Problem πŸ€”:

β†’ Current LLMs are pretrained on static data distributions, which is suboptimal as the model's learning capacity changes during training.

Solution in this Paper πŸ› οΈ:

β†’ The Perplexity Difference based Preference Curriculum learning (PDPC) framework dynamically arranges the training data based on the model's preference.

β†’ PD is calculated offline using reference models, reducing computational overhead.

β†’ An S-shaped preference function guides the concentration of low-PD data during training, ensuring smooth curriculum progression.

β†’ The training data is arranged offline ensuring continuous training.

Key Insights πŸ’‘:

β†’ Model's perplexity difference (PD) between early and late checkpoints reflects sample difficulty and preference shift during training.

β†’ High-PD data is beneficial in later training stages, while low-PD data suits earlier stages, creating a natural curriculum.

Results πŸ’―:

β†’ 3B model trained with PDPC using 1 trillion tokens achieves an average accuracy increase of 4.1% across benchmarks, and 8.1% on MMLU and CMMLU.

β†’ 1.3B model also shows consistent improvements on all benchmarks.
