Clustering similar weight channels lets LLMs share parameters, making them slim without getting dumb.
SWSC (Shared Weight for Similar Channel in LLM) is a compression method that clusters similar weight channels in LLMs and replaces them with representative vectors plus error compensation, achieving 90% compression while maintaining model performance.
-----
https://arxiv.org/abs/2501.08631
🤔 Original Problem:
LLMs face deployment challenges due to massive parameter counts, which create storage and compute burdens that limit their practical applications.
-----
🔧 Solution in this Paper:
→ SWSC uses K-Means clustering to group similar weight channels, selecting representative vectors for each cluster
→ The method stores only cluster labels and representative vectors, dramatically reducing parameter count
→ To prevent performance degradation, it applies singular value decomposition to the weight approximation error
→ The error compensation keeps only the larger singular values and their vectors, preserving accuracy at inference time (see the sketch after this list)
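A minimal sketch of how the two steps could fit together, using scikit-learn's KMeans and NumPy's SVD. The paper does not spell out implementation details here, so the cluster count, compensation rank, and demo matrix sizes below are illustrative assumptions, not the authors' code:

```python
# Sketch of the SWSC idea (not the authors' implementation): cluster the
# columns (channels) of a weight matrix with K-Means, store only cluster
# labels plus one representative vector per cluster, then compensate the
# approximation error with a truncated SVD of the residual.
import numpy as np
from sklearn.cluster import KMeans

def swsc_compress(W, n_clusters=64, comp_rank=16):
    """Compress weight matrix W (d_out x d_in) channel-wise.

    Returns cluster labels, representative vectors, and a low-rank
    error-compensation pair (U_r, V_r). n_clusters and comp_rank are
    illustrative; the paper reports results with 256 clusters on
    4096x4096 matrices.
    """
    channels = W.T                                    # one row per input channel
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(channels)
    labels = km.labels_                               # d_in small integer labels
    reps = km.cluster_centers_                        # n_clusters x d_out

    # Reconstruct from shared channels, then take the residual error
    W_shared = reps[labels].T                         # d_out x d_in
    E = W - W_shared

    # Keep only the largest singular values/vectors of the error
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    U_r = U[:, :comp_rank] * S[:comp_rank]            # d_out x r
    V_r = Vt[:comp_rank]                              # r x d_in
    return labels, reps, U_r, V_r

def swsc_decompress(labels, reps, U_r, V_r):
    return reps[labels].T + U_r @ V_r

# Demo on a random matrix (real LLM weights, unlike random ones, actually
# contain similar channels, which is what makes the sharing work)
W = np.random.randn(1024, 1024).astype(np.float32)
labels, reps, U_r, V_r = swsc_compress(W)
W_hat = swsc_decompress(labels, reps, U_r, V_r)
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```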
-----
💡 Key Insights:
→ Similar channels in LLM weights can share parameters without significant performance loss
→ Error compensation through SVD effectively preserves model accuracy
→ The method works particularly well with Query and Key projectors in attention layers
-----
📊 Results:
→ Achieves 90% compression on 4096×4096 matrices using 256 clusters (see the arithmetic sketch after this list)
→ Maintains stable perplexity (6.547) at 3-bit compression compared to RTN (20.550)
→ Outperforms RTN quantization at 2-bit compression (7.297 vs 4958.396)
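Back-of-the-envelope accounting for the 90% figure. This is my own arithmetic under assumed fp16 weights and 1-byte cluster labels, ignoring the error-compensation term; the paper's exact storage budget may differ:

```python
# Rough storage accounting for a 4096x4096 fp16 matrix compressed with
# 256 shared channels (assumptions: 2-byte weights, 1-byte labels,
# error-compensation term not counted). Not figures from the paper.
d_out, d_in, k = 4096, 4096, 256

original   = d_out * d_in * 2          # ~32 MiB of fp16 weights
reps       = k * d_out * 2             # 256 representative channels
labels     = d_in * 1                  # one cluster id per channel
compressed = reps + labels

print(f"compression: {1 - compressed / original:.1%}")
# ~93.7% before adding the low-rank error term, which plausibly brings
# the total down to the ~90% reported in the paper.
```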