
"The Scaling Law for LoRA Base on Mutual Information Upper Bound"

The podcast below on this paper was generated with Google's Illuminate.

Mutual Information Upper Bound (MIUB) measures how much new knowledge LoRA actually learns versus how much it simply copies from the base model.

This paper introduces MIUB to evaluate LoRA fine-tuning effectiveness by measuring the dependency between the frozen LLM's pre-trained knowledge and the new knowledge learned through LoRA.

-----

https://arxiv.org/abs/2501.03152

🤔 Original Problem:

→ Traditional metrics like cross-entropy and perplexity fail to capture the relationship between pre-trained LLM knowledge and new knowledge learned through LoRA fine-tuning

→ No systematic way exists to characterize scaling laws for LoRA fine-tuning, so practitioners incur high computational costs searching over model size, LoRA rank, and data scale

-----

🔬 Solution in this Paper:

→ Introduces MIUB to measure dependency between frozen LLM and LoRA-learned knowledge

→ Adds LoRA structures to Attention and FFN layers while freezing base model parameters

→ Calculates the Jensen-Shannon divergence between the probability distributions of the frozen and LoRA components (a minimal sketch follows this list)

→ Proposes scaling laws for model size, LoRA rank, and dataset complexity
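
To make this concrete, below is a minimal sketch (not the authors' code) of the two ingredients described above: a frozen linear layer with a trainable low-rank LoRA update, and the Jensen-Shannon divergence between the output distributions of the frozen and LoRA-augmented components. The class and function names, rank/alpha values, and toy tensor shapes are illustrative assumptions; how the paper aggregates these divergence terms into the final MIUB score is not reproduced here.

```python
# Illustrative sketch only -- names, shapes, and hyperparameters are assumptions,
# not the paper's implementation.
import torch
import torch.nn.functional as F

class LoRALinear(torch.nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: torch.nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the base model weights
        self.A = torch.nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # frozen path + scaled low-rank LoRA path
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

def js_divergence(p_logits, q_logits):
    """Jensen-Shannon divergence between two categorical distributions given as logits."""
    p = F.softmax(p_logits, dim=-1)
    q = F.softmax(q_logits, dim=-1)
    m = 0.5 * (p + q)
    kl_pm = (p * (p.clamp_min(1e-12).log() - m.clamp_min(1e-12).log())).sum(-1)
    kl_qm = (q * (q.clamp_min(1e-12).log() - m.clamp_min(1e-12).log())).sum(-1)
    return 0.5 * (kl_pm + kl_qm)

# Toy usage: divergence between the frozen component's output distribution and the
# LoRA-augmented output distribution over a batch of hidden states. With the standard
# zero-init of B the divergence starts at 0 and grows as the LoRA adapter is trained.
base = torch.nn.Linear(64, 32)
lora = LoRALinear(base, rank=8)
x = torch.randn(4, 64)
print(float(js_divergence(base(x), lora(x)).mean()))
```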

-----

💡 Key Insights:

→ Lower MIUB indicates better generalization and less dependency on the base model

→ MIUB decreases as model size increases

→ MIUB decreases as LoRA rank increases

→ MIUB decreases with larger/more complex datasets (a rough power-law fitting sketch follows this list)
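
As a rough illustration of how trends like these are typically summarized, the sketch below fits a power law MIUB ≈ a · x^(-b) to synthetic (rank, MIUB) points. The power-law form and every number here are assumptions for illustration only, not the paper's reported fit.

```python
# Illustrative only: synthetic data and an assumed power-law form, not the paper's fit.
import numpy as np

def fit_power_law(x, y):
    """Fit y ~ a * x**(-b) by least squares in log-log space; returns (a, b)."""
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(intercept)), float(-slope)

ranks = np.array([4, 8, 16, 32, 64])                 # e.g., LoRA ranks
miub  = np.array([0.31, 0.27, 0.24, 0.22, 0.20])     # made-up MIUB values, decreasing with rank

a, b = fit_power_law(ranks, miub)
print(f"MIUB ~ {a:.3f} * rank^(-{b:.3f})")
```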

-----

📊 Results:

→ Tested on LLaMA3-8B and Phi3-3B across 7 benchmark datasets

→ MIUB showed a 17% decrease as model size increased

→ MIUB is much more stable than cross-entropy (CE), which fluctuated by 571x under similar model-size changes

→ MIUB consistently aligned with actual model performance (accuracy, ACC)

------
