Normalizing Flows can match diffusion models using Transformers and smart noise handling.
TARFlow introduces a powerful Transformer-based architecture for Normalizing Flows that achieves state-of-the-art image generation quality comparable to diffusion models.
-----
https://arxiv.org/abs/2412.06329
🤔 Original Problem:
→ Normalizing Flows (NFs) showed early promise but have fallen behind other generative models, especially diffusion models, in recent years, raising questions about whether they are fundamentally limited.
-----
🔧 Solution in this Paper:
→ TARFlow reimagines Normalizing Flows using a stack of autoregressive Transformer blocks that process image patches.
→ The architecture alternates the autoregression direction between blocks, so across the stack every patch can condition on every other patch.
→ It introduces Gaussian noise during training instead of traditional uniform noise.
→ A novel post-training denoising procedure cleans up generated samples.
→ The model supports guidance, in the spirit of diffusion models, for both class-conditional and unconditional generation.
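The core architectural idea, a stack of autoregressive affine blocks whose direction flips between layers, can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the causal "context" below is just a sum of earlier tokens standing in for a causal Transformer, and the weight names (`W_mu`, `W_logs`) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar_affine_block(x, direction, W_mu, W_logs):
    """One autoregressive affine flow block (hypothetical stand-in for a
    causal Transformer block): token t is shifted/scaled by parameters
    predicted only from tokens before t in the chosen direction."""
    if direction == "reverse":
        x = x[::-1]
    T, D = x.shape
    z = np.empty_like(x)
    log_det = 0.0
    for t in range(T):
        ctx = x[:t].sum(axis=0)            # toy causal context (a real model uses attention)
        mu, log_s = ctx @ W_mu, ctx @ W_logs
        z[t] = (x[t] - mu) * np.exp(-log_s)
        log_det += -log_s.sum()            # exact log-determinant of the Jacobian
    if direction == "reverse":
        z = z[::-1]
    return z, log_det

T, D = 8, 4                                # 8 patch tokens, 4 dims each
x = rng.normal(size=(T, D))
W_mu = rng.normal(size=(D, D)) * 0.1
W_logs = rng.normal(size=(D, D)) * 0.1

z, total_ld = x, 0.0
for i in range(4):                         # stack of blocks, alternating direction
    direction = "forward" if i % 2 == 0 else "reverse"
    z, ld = ar_affine_block(z, direction, W_mu, W_logs)
    total_ld += ld
print(z.shape, total_ld)
```

Because each block is causal in one direction, a single block can never let early tokens see late ones; alternating the direction between blocks is what gives the full stack a complete receptive field.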
-----
💡 Key Insights:
→ A simple Transformer-based architecture can unlock the full potential of NFs
→ Gaussian noise augmentation is critical for high-quality generation
→ Score-based denoising significantly improves sample quality
→ Guidance techniques from diffusion models work well with NFs
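The score-based denoising insight can be illustrated with Tweedie's formula on a 1-D toy problem where the score of the noised density is known in closed form. In TARFlow the score comes from the learned flow itself; here the clean-data distribution and the noise level `sigma` are made-up values for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.3                                # assumed training noise level (not the paper's value)

# Toy clean data from N(mu0, s0^2); after adding N(0, sigma^2) noise the
# noised density is N(mu0, s0^2 + sigma^2), so its score is analytic.
mu0, s0 = 2.0, 0.5
x_clean = rng.normal(mu0, s0, size=10_000)
x_noisy = x_clean + sigma * rng.normal(size=x_clean.shape)

def score_noisy(x):
    """∇_x log p_sigma(x) for the noised density (closed form in this toy)."""
    return (mu0 - x) / (s0**2 + sigma**2)

# Tweedie / empirical-Bayes denoising: x_hat = x + sigma^2 * score(x)
x_denoised = x_noisy + sigma**2 * score_noisy(x_noisy)

mse_noisy = np.mean((x_noisy - x_clean) ** 2)
mse_denoised = np.mean((x_denoised - x_clean) ** 2)
print(mse_noisy, mse_denoised)             # denoising should reduce the MSE
```

The same one-step correction applies after sampling: since training adds Gaussian noise, generated samples are draws from the noised density, and a single Tweedie step moves them toward the clean data manifold.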
-----
📊 Results:
→ First sub-3 BPD (2.99) on ImageNet 64x64 likelihood estimation
→ FID score of 2.90 on conditional ImageNet 64x64, competitive with GANs
→ Scales effectively to 256x256 resolution on AFHQ dataset