Peek into an LLM's future: predict task prowess before full training.
This paper proposes a method to predict an LLM's task performance using only 1% of the target model's training compute. It develops task scaling laws and model ladders to accurately forecast individual task accuracy, even for overtrained models.
-----
https://arxiv.org/abs/2412.04403
🤔 Original Problem:
Predicting an LLM's downstream task performance before full training is challenging, especially for individual tasks and for overtrained models (trained well beyond the Chinchilla-optimal token count).
-----
🔬 Solution in this Paper:
→ The paper introduces a two-step prediction approach using a "ladder" of small models.
→ Step 1 predicts task-specific loss from model size and number of training tokens.
→ Step 2 maps the predicted task loss to task accuracy.
→ Ladder models span 190M to 1.3B parameters, trained on 1x to 10x the Chinchilla-optimal token count.
→ The method fits a parameterized function for each step to data from the ladder models (a sketch of such a fit follows this list).
→ Predictions are made for 7B and 13B target models on multiple-choice tasks in ranked classification format.
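For concreteness, here's a minimal sketch of how such a two-step fit could look. The functional forms are assumptions (a Chinchilla-style power law for step 1, a sigmoidal link for step 2) and all ladder measurements below are made up; the paper's exact parameterizations may differ.

```python
# Two-step fit sketch. Functional forms and numbers are illustrative
# assumptions, not the paper's exact parameterization.
import numpy as np
from scipy.optimize import curve_fit

def step1_task_loss(ND, E, A, B, alpha, beta):
    # Step 1: task loss from model size N and training tokens D,
    # assuming a Chinchilla-style form: E + A/N^alpha + B/D^beta.
    N, D = ND
    return E + A / N**alpha + B / D**beta

def step2_accuracy(L, a, b, k, L0):
    # Step 2: assumed sigmoidal map from task loss to task accuracy.
    return a / (1 + np.exp(-k * (L - L0))) + b

# Hypothetical ladder measurements: four sizes (190M-1.3B), two token budgets.
N = np.array([190e6, 370e6, 760e6, 1.3e9] * 2)
D = np.array([4e9, 7e9, 15e9, 26e9, 40e9, 70e9, 150e9, 260e9])
task_loss = np.array([1.90, 1.70, 1.50, 1.35, 1.60, 1.45, 1.30, 1.20])
accuracy = np.array([0.30, 0.36, 0.44, 0.50, 0.40, 0.47, 0.55, 0.60])

p1, _ = curve_fit(step1_task_loss, (N, D), task_loss,
                  p0=[1.0, 1e2, 1e3, 0.25, 0.30], maxfev=50000)
p2, _ = curve_fit(step2_accuracy, task_loss, accuracy,
                  p0=[-0.5, 0.70, 5.0, 1.5], maxfev=50000)

# Chain the two fits to forecast a hypothetical 7B target on 280B tokens.
loss_7b = step1_task_loss((7e9, 280e9), *p1)
print(f"predicted loss {loss_7b:.3f} -> "
      f"accuracy {step2_accuracy(loss_7b, *p2):.3f}")
```

Chaining the fits like this is the whole point of the design: the small ladder models are cheap enough to train that both curves can be estimated for a tiny fraction of the target model's compute.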
-----
💡 Key Insights from this Paper:
→ Task-specific loss is a better intermediate feature than general language-modeling loss (see the sketch after this list).
→ The two-step approach outperforms single-step prediction for most tasks.
→ Prediction quality tracks how noisy a task's accuracy is across ladder models: noisier tasks are harder to predict.
→ Increasing model size in the ladder improves predictions more than extending its training.
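To ground the first insight: task-specific loss here is the loss on the correct answer continuation, and ranked classification picks the choice the model scores best. Below is a sketch of one common way to implement both, assuming a Hugging Face-style causal LM interface; the paper's exact normalization may differ.

```python
# Sketch of task loss and ranked-classification accuracy, assuming a
# Hugging Face-style causal LM. Per-token normalization is an assumption.
import torch
import torch.nn.functional as F

def choice_loss(model, tokenizer, question, choice):
    # Mean cross-entropy of the answer tokens, conditioned on the question.
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    choice_ids = tokenizer(choice, return_tensors="pt",
                           add_special_tokens=False).input_ids
    input_ids = torch.cat([prompt_ids, choice_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Each answer token is predicted from the position just before it.
    start = prompt_ids.shape[1]
    return F.cross_entropy(logits[0, start - 1 : -1], choice_ids[0]).item()

def ranked_classification(model, tokenizer, question, choices):
    # The prediction is the choice with the lowest task loss.
    losses = [choice_loss(model, tokenizer, question, c) for c in choices]
    return losses.index(min(losses))
```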
-----
📊 Results:
→ Predicts accuracy within 2 points for 4 of 8 tasks.
→ Average absolute error of 3.8 points on the 7B model and 4.2 on the 13B.
→ Uses only 1% of the compute required to train the target models.