The exponential march towards more efficient language models.
This paper introduces "capability density" as a new metric for evaluating LLM quality across different scales, revealing that maximum model density doubles approximately every 3.3 months. This finding, termed the "Densing Law," offers a quantitative lens on LLM development trends and efficiency improvements.
-----
https://arxiv.org/abs/2412.04315
🤔 Original Problem:
→ Traditional LLM evaluation focuses solely on performance scaling with size, neglecting efficiency and practical deployment constraints.
→ The field lacks quantitative metrics to evaluate LLMs of different scales while considering both effectiveness and computational efficiency.
-----
⚡ Solution in this Paper:
→ The paper introduces "capability density" - defined as the ratio of effective parameter size to actual parameter size.
→ Effective parameter size is the number of parameters a reference model would need, per fitted scaling functions, to match the target model's downstream performance.
→ Estimation proceeds in two steps: first a scaling function is fit to predict language-modeling loss from scale, then a sigmoid function maps that loss to downstream performance (sketched in code below).
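A minimal sketch of that two-step pipeline, assuming a hypothetical Chinchilla-style loss fit and a generic sigmoid mapping; the function names, functional forms, and constants are illustrative placeholders rather than the paper's fitted values:

```python
import numpy as np
from scipy.optimize import brentq

def reference_loss(n_params, n_tokens, a=400.0, alpha=0.34, b=410.0, beta=0.28, c=1.7):
    """Step 1 (hypothetical fit): predict language-modeling loss from model and data scale."""
    return a * n_params ** -alpha + b * n_tokens ** -beta + c

def loss_to_score(loss, upper=0.9, gamma=5.0, midpoint=2.5):
    """Step 2 (hypothetical sigmoid fit): map loss to a downstream benchmark score."""
    return upper / (1.0 + np.exp(gamma * (loss - midpoint)))

def effective_params(score, n_tokens):
    """Invert the two fitted functions: the parameter count a reference model
    would need to reach `score` at the same training-token budget."""
    return brentq(lambda n: loss_to_score(reference_loss(n, n_tokens)) - score, 1e6, 1e13)

def capability_density(actual_params, score, n_tokens):
    """Capability density = effective parameter size / actual parameter size."""
    return effective_params(score, n_tokens) / actual_params

# Example: a 3B-parameter model trained on ~1T tokens scoring 0.62 on the benchmark suite
print(capability_density(3e9, 0.62, 1e12))  # > 1 would mean denser than the reference family
```

A density above 1 means the model outperforms what the reference scaling curve predicts for its size; below 1 means it underperforms.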
-----
🔍 Key Insights:
→ LLM density exhibits exponential growth, doubling approximately every 3.3 months
→ Inference costs for equivalent performance decrease exponentially over time (see the relation below)
→ Model compression methods often yield lower density than their original models
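The cost trend follows from the density definition: at a fixed effective parameter size $\hat{N}$ (i.e., fixed performance), the actual parameter count required is $N = \hat{N}/\rho$, and per-token inference compute scales roughly with parameter count. Assuming maximum density grows exponentially at rate $A$, a rough sketch of the relation is

$$\text{Cost}(t) \;\propto\; N(t) \;=\; \frac{\hat{N}}{\rho_{\max}(t)} \;\propto\; e^{-A t}$$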
-----
📊 Results:
→ The exponential fit to maximum density over time yields a growth rate A ≈ 0.007 with R² ≈ 0.93 (see the check below)
→ Inference cost for equivalent performance has dropped by 266.7x since January 2023
→ The density growth rate accelerated by roughly 50% after ChatGPT's release
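As a back-of-the-envelope consistency check (assuming the fit has the form $\ln \rho_{\max}(t) = A\,t + B$ with $A$ expressed per day), the implied doubling period is

$$t_{2\times} = \frac{\ln 2}{A} \approx \frac{0.693}{0.007} \approx 99 \text{ days} \approx 3.3 \text{ months},$$

which matches the doubling period quoted above.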