
"The Scaling Law for LoRA Base on Mutual Information Upper Bound"

The podcast below on this paper was generated with Google's Illuminate.

Mutual Information Upper Bound (MIUB) measures how much new knowledge LoRA actually learns versus how much it simply copies from the base model.

This paper introduces MIUB to evaluate LoRA fine-tuning effectiveness by measuring the dependency between the frozen LLM's pre-trained knowledge and the new knowledge learned through LoRA.

-----

https://arxiv.org/abs/2501.03152

🤔 Original Problem:

→ Traditional metrics like cross-entropy and perplexity fail to capture the relationship between pre-trained LLM knowledge and new knowledge learned through LoRA fine-tuning

→ No systematic way exists to characterize scaling laws for LoRA fine-tuning, so practitioners incur high computational costs searching over model size, LoRA rank, and data scale

-----

🔬 Solution in this Paper:

→ Introduces MIUB to measure dependency between frozen LLM and LoRA-learned knowledge

→ Adds LoRA structures to Attention and FFN layers while freezing base model parameters

→ Calculates the Jensen-Shannon divergence between the probability distributions of the frozen and LoRA components (a minimal sketch follows this list)

→ Proposes scaling laws for model size, LoRA rank, and dataset complexity
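
To make this concrete, below is a minimal sketch (not the authors' code) of the two ingredients described above: a frozen linear layer with a trainable low-rank LoRA update, and the Jensen-Shannon divergence between the output distributions of the frozen and LoRA-augmented components. The class and function names, rank/alpha values, and toy tensor shapes are illustrative assumptions; how the paper aggregates these divergence terms into the final MIUB score is not reproduced here.

```python
# Illustrative sketch only -- names, shapes, and hyperparameters are assumptions,
# not the paper's implementation.
import torch
import torch.nn.functional as F

class LoRALinear(torch.nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: torch.nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the base model weights
        self.A = torch.nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # frozen path + scaled low-rank LoRA path
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

def js_divergence(p_logits, q_logits):
    """Jensen-Shannon divergence between two categorical distributions given as logits."""
    p = F.softmax(p_logits, dim=-1)
    q = F.softmax(q_logits, dim=-1)
    m = 0.5 * (p + q)
    kl_pm = (p * (p.clamp_min(1e-12).log() - m.clamp_min(1e-12).log())).sum(-1)
    kl_qm = (q * (q.clamp_min(1e-12).log() - m.clamp_min(1e-12).log())).sum(-1)
    return 0.5 * (kl_pm + kl_qm)

# Toy usage: divergence between the frozen component's output distribution and the
# LoRA-augmented output distribution over a batch of hidden states. With the standard
# zero-init of B the divergence starts at 0 and grows as the LoRA adapter is trained.
base = torch.nn.Linear(64, 32)
lora = LoRALinear(base, rank=8)
x = torch.randn(4, 64)
print(float(js_divergence(base(x), lora(x)).mean()))
```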

-----

💡 Key Insights:

→ Lower MIUB indicates better generalization and less dependency on the base model

→ MIUB decreases as model size increases

→ MIUB decreases as LoRA rank increases

→ MIUB decreases with larger/more complex datasets (a rough power-law fitting sketch follows this list)
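
As a rough illustration of how trends like these are typically summarized, the sketch below fits a power law MIUB ≈ a · x^(-b) to synthetic (rank, MIUB) points. The power-law form and every number here are assumptions for illustration only, not the paper's reported fit.

```python
# Illustrative only: synthetic data and an assumed power-law form, not the paper's fit.
import numpy as np

def fit_power_law(x, y):
    """Fit y ~ a * x**(-b) by least squares in log-log space; returns (a, b)."""
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(intercept)), float(-slope)

ranks = np.array([4, 8, 16, 32, 64])                 # e.g., LoRA ranks
miub  = np.array([0.31, 0.27, 0.24, 0.22, 0.20])     # made-up MIUB values, decreasing with rank

a, b = fit_power_law(ranks, miub)
print(f"MIUB ~ {a:.3f} * rank^(-{b:.3f})")
```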

-----

📊 Results:

→ Tested on LLaMA3-8B and Phi3-3B across 7 benchmark datasets

→ MIUB showed a 17% decrease as model size increased

→ MIUB is much more stable than cross-entropy (CE), which fluctuated by 571x under similar model-size changes

→ MIUB consistently aligned with actual model performance (accuracy, ACC)

------
