
"SWSC: Shared Weight for Similar Channel in LLM"

Podcast on this paper generated with Google's Illuminate.

Clustering similar weight channels lets LLMs share parameters, making them slim without getting dumb.

SWSC (Shared Weight for Similar Channel) introduces a compression method that clusters similar weight channels in LLMs and replaces them with representative vectors plus an error-compensation term, achieving roughly 90% compression while maintaining model performance.

-----

https://arxiv.org/abs/2501.08631

🤔 Original Problem:

LLMs face deployment challenges due to massive parameter counts, creating storage and computing burdens that limit their practical applications.

-----

🔧 Solution in this Paper:

→ SWSC uses K-Means clustering to group similar weight channels, selecting representative vectors for each cluster

→ The method stores only cluster labels and representative vectors, dramatically reducing parameter count

→ To prevent performance degradation, it employs singular value decomposition on weight error values

→ The error compensation retains the larger singular values and their corresponding vectors to maintain accuracy during inference (see the sketch after this list)
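
A minimal sketch of this idea in Python, not the authors' code: treating each column of a weight matrix as a channel, clustering the channels with K-Means, keeping one representative vector per cluster, and compensating the residual error with a truncated SVD. The column-vs-row channel choice, the rank `r`, and the function names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def swsc_compress(W: np.ndarray, n_clusters: int = 256, r: int = 32):
    """Compress W (d_out x d_in) by sharing similar channels (illustrative sketch)."""
    # Treat each column of W as a channel and cluster the channels.
    channels = W.T                                    # (d_in, d_out)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(channels)
    labels = km.labels_                               # one cluster label per channel
    reps = km.cluster_centers_                        # (n_clusters, d_out) representative vectors

    # Rebuild the shared-weight matrix and measure the approximation error.
    W_shared = reps[labels].T                         # (d_out, d_in)
    E = W - W_shared

    # Error compensation: keep only the largest singular values/vectors of the error.
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    low_rank = (U[:, :r], S[:r], Vt[:r])

    return labels, reps, low_rank

def swsc_reconstruct(labels, reps, low_rank):
    """Approximate the original weight matrix at inference time."""
    U_r, S_r, Vt_r = low_rank
    return reps[labels].T + U_r @ np.diag(S_r) @ Vt_r
```

Only the labels, the representative vectors, and the truncated SVD factors need to be stored, which is where the parameter savings come from.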

-----

💡 Key Insights:

→ Similar channels in LLM weights can share parameters without significant performance loss

→ Error compensation through SVD effectively preserves model accuracy

→ The method works particularly well with Query and Key projectors in attention layers

-----

📊 Results:

→ Achieves 90% compression on 4096×4096 matrices using 256 clusters (see the arithmetic sketch after this list)

→ Maintains stable perplexity (6.547) at 3-bit compression compared to RTN (20.550)

→ Outperforms RTN quantization at 2-bit compression (7.297 vs 4958.396)
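
A rough back-of-the-envelope check of the 90% figure (my arithmetic, not a table from the paper), assuming one label per channel and ignoring the low-rank error-compensation terms, which add some overhead on top of this:

```python
# Storage count for a 4096 x 4096 weight matrix compressed with 256 clusters.
d_out, d_in, k = 4096, 4096, 256
original = d_out * d_in                      # 16,777,216 stored values
shared   = k * d_out + d_in                  # 1,048,576 representative values + 4,096 labels
print(f"kept: {shared / original:.1%}")      # ~6.3% of the original values
```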
