Bits beat pixels: A new way to teach AI about images
Infinity turns every image into bits, making high-resolution AI image generation faster and higher quality
Infinity introduces a bitwise visual autoregressive model that transforms high-resolution image generation by using infinite vocabulary tokenization and self-correction mechanisms.
-----
https://arxiv.org/abs/2412.04431
🤖 Original Problem:
Current autoregressive models struggle with high-resolution image generation due to limited vocabulary size, poor reconstruction quality, and train-test discrepancy issues.
-----
🔧 Solution in this Paper:
→ Introduces bitwise modeling framework with three key components: bitwise multi-scale residual quantizer, infinite-vocabulary classifier, and bitwise self-correction
→ Scales tokenizer vocabulary to 2^64 while reducing memory by 99.95% through dimension-independent bitwise quantization
→ Implements parallel binary classifiers in place of a single conventional softmax classifier, making the extremely large vocabulary tractable
→ Uses random bit flipping and re-quantization during training (bitwise self-correction) so the model learns to recover from its own prediction errors
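The core ideas above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual implementation: `bitwise_quantize`, `self_correct`, and `flip_prob` are hypothetical names/parameters chosen for clarity.

```python
import numpy as np

def bitwise_quantize(z):
    """Sign-based binary quantization: each continuous feature dimension
    maps independently to a bit (+1 / -1), so a d-dim code indexes a 2^d
    vocabulary with no explicit codebook lookup."""
    return np.where(z >= 0, 1.0, -1.0)

def self_correct(bits, flip_prob=0.1, rng=None):
    """Randomly flip a fraction of bits during training, mimicking
    prediction errors the model must learn to correct at inference."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(bits.shape) < flip_prob
    return np.where(mask, -bits, bits)

z = np.array([0.3, -1.2, 0.7, -0.1])   # toy continuous features
bits = bitwise_quantize(z)             # binary code for this scale
noisy = self_correct(bits, 0.25)       # corrupted code seen in training
residual = z - bits                    # residual passed to the next scale
```

In the multi-scale residual setup, the residual (rather than the raw feature) is what gets quantized again at the next, finer scale.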
-----
💡 Key Insights:
→ Bitwise tokenization enables nearly infinite vocabulary while maintaining low memory usage
→ Parallel binary classification is more efficient than conventional methods for large vocabularies
→ Self-correction mechanism significantly reduces train-test discrepancy
→ Progressive training strategy improves generation quality across resolutions
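Why parallel binary classification saves memory: a conventional softmax head over 2^d codes needs a weight matrix proportional to 2^d, while d independent binary classifiers need only d output units. A back-of-the-envelope sketch (the sizes `h` and `d` here are illustrative, not from the paper):

```python
def conventional_head_params(hidden, vocab_bits):
    # Softmax classifier over 2^d codes: hidden * 2^d weights.
    return hidden * (2 ** vocab_bits)

def binary_head_params(hidden, vocab_bits):
    # d parallel binary classifiers: hidden * d weights.
    return hidden * vocab_bits

h, d = 2048, 16                          # hypothetical model sizes
conv = conventional_head_params(h, d)    # 2048 * 65,536 weights
binh = binary_head_params(h, d)          # 2048 * 16 weights
savings = 1 - binh / conv                # > 99.9% fewer parameters
```

Even at a modest 16-bit vocabulary the reduction already exceeds 99.9%, which is the scaling behavior behind the paper's reported 99.95% memory savings at much larger vocabularies.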
-----
📊 Results:
→ Generates 1024×1024 images 2.6× faster than SD3-Medium (0.8s vs 2.1s)
→ Improves GenEval score from 0.62 to 0.73 compared to SD3-Medium
→ Achieves 66% win rate in human evaluation
→ Reduces memory usage by 99.95% compared to conventional classifiers
------
Are you into AI and LLMs❓ Join me on X/Twitter with 52K+ others, to remain on the bleeding-edge of AI every day.
𝕏/🐦 https://x.com/rohanpaul_ai