"MoSLD: An Extremely Parameter-Efficient Mixture-of-Shared LoRAs for Multi-Task Learning"

A podcast on this paper was generated with Google's Illuminate.

Parameter sharing in LoRA experts enables efficient multi-task learning without performance loss.

MoSLD introduces a parameter-sharing mechanism in LoRA for multi-task learning, reducing parameters while maintaining performance across different tasks.

-----

https://arxiv.org/abs/2412.08946

🤔 Original Problem:

LoRA excels at single-task fine-tuning but struggles with multi-task scenarios due to data conflicts and interference. Mixture-of-Experts (MoE) offers a solution but introduces parameter bloat and knowledge forgetting issues.

-----

🔧 Solution in this Paper:

→ MoSLD shares the upper projection matrix (A) among different experts while keeping the lower projection matrix (B) task-specific.

→ The shared matrix captures general knowledge across tasks, while individual matrices maintain task-specific features.

→ A dropout strategy on matrix A balances parameter updates and reduces overfitting.

→ The router mechanism selects top-K experts for each input, enabling dynamic task handling (sketched in code after this list).
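
A minimal PyTorch sketch of this layout, not the authors' implementation: the module name, dimensions, and hyperparameters (`rank`, `num_experts`, `top_k`, `p_drop`) are assumed. It shows one A matrix shared by all experts, a separate B per expert, dropout applied on the shared-A path, and a top-K router that mixes the selected experts on top of a frozen base weight.

```python
# Illustrative sketch only, assuming standard LoRA conventions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoSLDLinear(nn.Module):
    def __init__(self, d_in, d_out, rank=8, num_experts=4, top_k=2, p_drop=0.1):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)    # frozen pretrained weight
        self.base.weight.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(rank, d_in) * 0.01)   # shared across experts
        self.lora_B = nn.ParameterList(                               # one B per expert
            [nn.Parameter(torch.zeros(d_out, rank)) for _ in range(num_experts)]
        )
        self.router = nn.Linear(d_in, num_experts, bias=False)
        self.drop = nn.Dropout(p_drop)                    # dropout on the shared projection
        self.top_k = top_k

    def forward(self, x):                                 # x: (batch, d_in)
        gate = self.router(x)                             # (batch, num_experts)
        topk_val, topk_idx = gate.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_val, dim=-1)             # renormalize over selected experts
        shared = self.drop(x @ self.lora_A.t())           # (batch, rank), shared-A path
        out = self.base(x)
        for slot in range(self.top_k):
            idx = topk_idx[:, slot]                       # selected expert id per example
            w = weights[:, slot].unsqueeze(-1)
            # gather each example's expert-specific B and apply it to the shared projection
            B = torch.stack([self.lora_B[i] for i in idx.tolist()])  # (batch, d_out, rank)
            out = out + w * torch.bmm(B, shared.unsqueeze(-1)).squeeze(-1)
        return out
```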

-----

🎯 Key Insights:

→ Parameter sharing in LoRA can effectively balance general and task-specific knowledge (see the parameter-count sketch after this list)

→ Dropout on shared parameters prevents overfitting and improves information exchange

→ Layer-wise expert allocation improves model efficiency
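
A back-of-envelope count of why sharing A shrinks the adapter; the dimensions below are assumed for illustration, not figures from the paper. With E experts of rank r on a d_in x d_out layer, vanilla MoE-LoRA trains E separate (A, B) pairs, while MoSLD-style sharing keeps a single A.

```python
# Assumed example dimensions; only the adapter (LoRA) parameters are counted.
d_in, d_out, r, E = 4096, 4096, 8, 4
moe_lora = E * (d_in * r + r * d_out)       # separate A and B per expert
mosld    = d_in * r + E * (r * d_out)       # one shared A, per-expert B
print(moe_lora, mosld, mosld / moe_lora)    # sharing A gives 0.625x the adapter size here
```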

-----

📊 Results:

→ Reduces trainable parameters to 20.6% of full-parameter fine-tuning

→ Outperforms baseline models by 1.56% in mixture settings

→ Shows consistent improvements across model sizes (7B, 13B, 33B)
