ZipNN, proposed in this paper, shrinks AI models by 33%-50% without losing any information.
🛠️ The key innovation of ZipNN's compression approach 👇
It separates out the exponent bits of floating-point parameters and compresses them on their own, since they follow a highly skewed distribution. It uses Huffman encoding instead of Lempel-Ziv algorithms, improving both compression ratio and speed.
https://arxiv.org/abs/2411.05239
🎯 Original Problem:
AI models are straining infrastructure with their massive sizes: Mistral alone accounts for about 40 petabytes of monthly download traffic from Hugging Face, and traditional compression methods struggle with model parameters.
-----
🔧 Solution in this Paper:
→ ZipNN separates and compresses the exponent bits of floating-point parameters due to their highly skewed distribution
→ Uses Huffman encoding instead of Lempel-Ziv algorithms, improving both compression ratio and speed
→ Identifies model categories (Regular vs Clean) and applies specialized compression strategies
→ For clean models that underwent rounding, applies byte grouping to maximize compression (see the sketch below)
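Here is a minimal NumPy sketch of the byte-grouping idea. It is not the paper's code: `to_bf16`, `byte_group`, and the synthetic Gaussian weights are assumptions for illustration. Each BF16 parameter is split into a sign+exponent byte, whose skewed distribution makes it a good target for Huffman coding, and a fraction byte that is close to random noise.

```python
import numpy as np

def to_bf16(weights_f32: np.ndarray) -> np.ndarray:
    """Emulate BF16 by keeping only the upper 16 bits of each float32."""
    bits = np.ascontiguousarray(weights_f32, dtype=np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)

def byte_group(bf16: np.ndarray) -> tuple[bytes, bytes]:
    """Split each 16-bit parameter into its high byte (sign + 7 exponent bits)
    and its low byte (last exponent bit + 7 fraction bits)."""
    hi = (bf16 >> 8).astype(np.uint8).tobytes()    # skewed -> entropy-code (Huffman)
    lo = (bf16 & 0xFF).astype(np.uint8).tobytes()  # near-random -> store raw
    return hi, lo

def entropy_bits_per_byte(stream: bytes) -> float:
    """Order-0 entropy: roughly the bits/byte a Huffman coder can approach."""
    counts = np.bincount(np.frombuffer(stream, dtype=np.uint8), minlength=256)
    p = counts[counts > 0] / len(stream)
    return float(-(p * np.log2(p)).sum())

# Synthetic stand-in for checkpoint weights: small, zero-centered values.
weights = np.random.normal(0.0, 0.02, size=1_000_000).astype(np.float32)
hi, lo = byte_group(to_bf16(weights))
print(f"exponent-byte stream: {entropy_bits_per_byte(hi):.2f} bits/byte")  # low
print(f"fraction-byte stream: {entropy_bits_per_byte(lo):.2f} bits/byte")  # ~8
```

Only the skewed exponent stream needs an entropy coder; the near-incompressible fraction bytes can be stored as-is, which is also where the speed advantage over Lempel-Ziv match searching comes from.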
-----
💡 Key Insights:
→ Model parameters typically stay within [-1,+1] range during training, making exponent bits highly compressible
→ Out of 256 possible exponent values, only about 40 actually appear
→ Top 12 exponent values account for 99.9% of all parameters (a quick way to check this follows this list)
→ Fraction bits show high entropy except in "clean" models
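To sanity-check that skew on a real checkpoint, a few lines of NumPy suffice. This is a sketch, not the authors' tooling; `exponent_stats` and the synthetic weights are made up for illustration, so swap in a flattened tensor from an actual model.

```python
import numpy as np

def exponent_stats(weights_f32: np.ndarray) -> tuple[int, float]:
    """Count distinct BF16 exponent values and the share covered by the top 12."""
    bf16 = (np.ascontiguousarray(weights_f32, dtype=np.float32)
              .view(np.uint32) >> 16).astype(np.uint16)
    exponents = ((bf16 >> 7) & 0xFF).astype(np.uint8)  # the 8 exponent bits
    counts = np.bincount(exponents, minlength=256)
    used = int((counts > 0).sum())                      # distinct exponent values seen
    top12_share = float(np.sort(counts)[::-1][:12].sum() / counts.sum())
    return used, top12_share

# Synthetic stand-in; replace with real model weights to reproduce the observation.
used, share = exponent_stats(np.random.normal(0.0, 0.02, size=1_000_000))
print(f"{used} of 256 exponent values appear; top 12 cover {share:.2%}")
```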
-----
📊 Results:
→ For BF16 models: 33% space savings through exponent compression
→ For "clean" models: Up to 55% space savings
→ 17% better compression ratio than Zstd
→ 62% faster compression/decompression speeds
→ Could save over an exabyte of download traffic per month across model hubs