
"Normalizing Flows are Capable Generative Models"

The podcast on this paper is generated with Google's Illuminate.

Normalizing Flows can match diffusion models when combined with Transformers and careful noise handling.

TARFlow (Transformer AutoRegressive Flow) introduces a powerful Transformer-based architecture for Normalizing Flows that achieves state-of-the-art image generation quality among NFs, comparable to diffusion models.

-----

https://arxiv.org/abs/2412.06329

🤔 Original Problem:

→ Normalizing Flows (NFs) showed early promise but fell behind diffusion models in recent years, raising the question of whether they are fundamentally limited.

-----

🔧 Solution in this Paper:

→ TARFlow reimagines Normalizing Flows as a stack of autoregressive Transformer blocks that operate on image patches.

→ The architecture alternates the autoregression direction between blocks for better modeling (a minimal sketch follows this list).

→ Training images are augmented with Gaussian noise instead of the uniform dequantization noise traditionally used for NFs.

→ A post-training, score-based denoising procedure cleans up generated samples.

→ Sampling supports guidance in both class-conditional and unconditional settings, analogous to guidance in diffusion models.
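
To make the architecture concrete, here is a minimal, hypothetical sketch of one TARFlow-style block in PyTorch: a causal Transformer reads the patches before position i and predicts a shift and log-scale that invertibly transform patch i, with a `reverse` flag to alternate the autoregression direction between stacked blocks. All names, layer sizes, and the exact parameterization are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class AutoregressiveFlowBlock(nn.Module):
    def __init__(self, patch_dim: int, width: int = 256, depth: int = 4,
                 reverse: bool = False):
        super().__init__()
        self.reverse = reverse  # toggled between stacked blocks
        self.proj_in = nn.Linear(patch_dim, width)
        layer = nn.TransformerEncoderLayer(d_model=width, nhead=4,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.proj_out = nn.Linear(width, 2 * patch_dim)  # shift and log-scale

    def forward(self, x):  # x: (batch, num_patches, patch_dim)
        if self.reverse:
            x = x.flip(1)  # alternate autoregression direction
        # shift right so patch i is transformed using only patches < i
        ctx = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(
            ctx.size(1)).to(x.device)
        h = self.transformer(self.proj_in(ctx), mask=mask)
        shift, log_scale = self.proj_out(h).chunk(2, dim=-1)
        z = (x - shift) * torch.exp(-log_scale)  # invertible map x -> z
        log_det = -log_scale.sum(dim=(1, 2))     # change-of-variables term
        if self.reverse:
            z = z.flip(1)  # restore original patch order
        return z, log_det
```

Stacking several such blocks with `reverse` toggled lets every patch eventually condition on every other patch, and training simply maximizes the exact log-likelihood: the Gaussian prior's log-density of z plus the accumulated `log_det` terms.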

-----

💡 Key Insights:

→ A simple Transformer-based architecture can unlock NFs' full potential

→ Gaussian noise augmentation is critical for high-quality generation

→ Score-based denoising significantly improves sample quality (a sketch follows this list)

→ Guidance techniques from diffusion models carry over well to NFs (also sketched below)
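
The two noise-related ideas can be sketched in a few lines. Below, Gaussian augmentation replaces uniform dequantization at training time, and samples are denoised with Tweedie's formula, which recovers the expected clean image from the model's own score (the gradient of its log-density). `flow_log_prob` is a placeholder for the flow's exact log-likelihood function, and the noise level is an arbitrary illustrative value.

```python
import torch

SIGMA = 0.05  # illustrative noise level, not the paper's setting

def add_training_noise(x: torch.Tensor) -> torch.Tensor:
    # Gaussian augmentation in place of classic uniform dequantization noise
    return x + SIGMA * torch.randn_like(x)

def denoise(y: torch.Tensor, flow_log_prob) -> torch.Tensor:
    # Tweedie's formula: E[x | y] = y + sigma^2 * grad_y log p(y),
    # where the score comes from the flow's exact log-density
    y = y.detach().requires_grad_(True)
    score = torch.autograd.grad(flow_log_prob(y).sum(), y)[0]
    return (y + SIGMA**2 * score).detach()
```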
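Guidance can be applied the same way classifier-free guidance works in diffusion models: run the autoregressive prediction with and without conditioning and extrapolate between the two. The function below is a hedged sketch of that combination rule; `params_cond` and `params_uncond` stand in for the Transformer's predicted patch parameters under each setting.

```python
import torch

def guide(params_cond: torch.Tensor, params_uncond: torch.Tensor,
          w: float = 2.0) -> torch.Tensor:
    # w = 0 gives the unconditional prediction, w = 1 the conditional one,
    # and w > 1 extrapolates toward stronger conditioning
    return params_uncond + w * (params_cond - params_uncond)
```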

-----

📊 Results:

→ First sub-3 bits-per-dimension result (2.99 BPD) on ImageNet 64x64 likelihood estimation (a conversion helper follows this list)

→ FID of 2.90 on class-conditional ImageNet 64x64, competitive with GANs

→ Scales effectively to 256x256 resolution on the AFHQ dataset
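
For context on the 2.99 number: bits per dimension (BPD) is the model's negative log-likelihood converted from nats to bits and normalized by the number of pixels times channels. A minimal helper, assuming `nll_nats` is the per-image negative log-likelihood from the flow's exact density:

```python
import math

def bits_per_dim(nll_nats: float, num_dims: int) -> float:
    # convert nats to bits (divide by ln 2) and normalize per dimension
    return nll_nats / (num_dims * math.log(2))

# ImageNet 64x64 RGB has 64 * 64 * 3 = 12288 dimensions, so 2.99 BPD
# corresponds to roughly 2.99 * 12288 * ln(2) ≈ 25,467 nats per image.
```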
