"Improved Training Technique for Latent Consistency Models"
The podcast below on this paper was generated with Google's Illuminate.
https://arxiv.org/abs/2502.01441
Problem: Consistency Training, effective in pixel space, suffers severe performance drops in latent space, which is crucial for scaling generative models to complex tasks like high-resolution image/video generation.
This paper proposes enhancements to latent consistency training to bridge the performance gap with latent diffusion models.
-----
📌 Cauchy loss replaces Pseudo-Huber, effectively managing latent space outliers. This directly stabilizes consistency training, yielding usable one/two-step image generation.
📌 The adaptive scaling-c schedule and Non-scaling LayerNorm are vital design choices. The former dynamically controls loss robustness while the latter improves feature normalization amid latent outliers.
📌 Optimal Transport integration minimizes training variance. This methodological addition enhances consistency training stability and overall sample quality in latent space.
----------
Methods Explored in this Paper 🔧:
→ The paper identifies that latent data contains impulsive outliers, degrading performance of standard Consistency Training methods.
→ To address this, it replaces the Pseudo-Huber loss with the Cauchy loss, which is far less sensitive to extreme outliers (see the loss sketch after this list).
→ Diffusion loss is introduced at early timesteps to regularize the consistency objective during initial training phases.
→ Optimal Transport (OT) coupling is employed to reduce training variance and improve stability (see the coupling sketch after this list).
→ An adaptive scaling-c scheduler dynamically adjusts the robustness of the loss function (included in the loss sketch after this list).
→ Non-scaling LayerNorm is integrated into the model architecture to better capture feature statistics while minimizing outlier influence.
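A minimal PyTorch-style sketch of the two robust losses and a scaling-c schedule. The exact parameterizations and decay schedule used in the paper may differ; the helper names (`pseudo_huber_loss`, `cauchy_loss`, `scaling_c`) and the exponential decay are illustrative assumptions.

```python
# Sketch: Pseudo-Huber vs. Cauchy loss with a hypothetical adaptive scaling-c schedule.
import math
import torch


def pseudo_huber_loss(pred: torch.Tensor, target: torch.Tensor, c: float) -> torch.Tensor:
    """Pseudo-Huber loss: quadratic near zero, linear in the tails."""
    d = pred - target
    return (torch.sqrt(d ** 2 + c ** 2) - c).mean()


def cauchy_loss(pred: torch.Tensor, target: torch.Tensor, c: float) -> torch.Tensor:
    """Cauchy (Lorentzian) loss: logarithmic tails, far less sensitive
    to the impulsive outliers observed in latent data."""
    d = pred - target
    return (0.5 * c ** 2 * torch.log1p((d / c) ** 2)).mean()


def scaling_c(step: int, total_steps: int, c_start: float = 1.0, c_end: float = 1e-3) -> float:
    """Hypothetical adaptive scaling-c schedule: exponentially decay c so the
    loss tolerates large residuals more and more as training proceeds."""
    ratio = step / max(total_steps - 1, 1)
    return c_start * math.exp(ratio * math.log(c_end / c_start))


# Usage: compare the two losses on a latent batch with one injected outlier.
pred = torch.randn(4, 8, 32, 32)
target = pred + 0.01 * torch.randn_like(pred)
target[0, 0, 0, 0] += 50.0  # impulsive outlier, as reported for latent space
c = scaling_c(step=1000, total_steps=100_000)
print(pseudo_huber_loss(pred, target, c).item(), cauchy_loss(pred, target, c).item())
```

Because the Cauchy loss grows only logarithmically with the residual, a single impulsive latent value contributes a bounded gradient instead of dominating the update, which is the stabilizing effect described above.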
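A sketch of mini-batch OT coupling, assuming the common recipe of matching noise to latents via a linear assignment on squared Euclidean cost; the paper's exact solver and cost may differ, and `ot_pair` is an illustrative helper, not the authors' code.

```python
# Sketch: mini-batch Optimal Transport coupling of noise to latent data.
import torch
from scipy.optimize import linear_sum_assignment


def ot_pair(latents: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Reorder `noise` so that noise[i] is the OT match of latents[i].

    latents, noise: (B, C, H, W) tensors; returns noise permuted along dim 0.
    """
    b = latents.shape[0]
    x = latents.reshape(b, -1)
    z = noise.reshape(b, -1)
    # Pairwise squared distances form the transport cost matrix.
    cost = torch.cdist(x, z, p=2).pow(2).cpu().numpy()
    row_idx, col_idx = linear_sum_assignment(cost)  # exact mini-batch OT plan
    return noise[torch.as_tensor(col_idx, device=noise.device)]


# Usage: couple a batch of latents with noise before building training pairs,
# so each latent is paired with its nearest noise sample and target variance shrinks.
latents = torch.randn(16, 8, 32, 32)
noise = ot_pair(latents, torch.randn_like(latents))
```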
-----
Key Insights 💡:
→ Impulsive outliers in latent data are a primary cause of poor performance in latent Consistency Training.
→ Cauchy loss effectively mitigates the impact of these outliers compared to Pseudo-Huber loss.
→ Combining a diffusion loss at early timesteps with the consistency loss stabilizes the initial phase of training (see the sketch after this list).
→ Optimal Transport (OT) improves training stability by reducing variance.
→ Non-scaling LayerNorm is beneficial for robust feature normalization in latent space (see the sketch after this list).
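A sketch of one way to gate a diffusion (denoising) regularizer to early timesteps alongside the consistency objective; the cutoff `t_cut`, the weight `lam`, and whether the gate also anneals over training iterations are assumptions, not the paper's exact schedule.

```python
# Sketch: consistency loss plus a diffusion regularizer gated to early timesteps.
import torch


def combined_loss(consistency_loss: torch.Tensor,
                  diffusion_loss: torch.Tensor,
                  t: torch.Tensor,
                  t_cut: float = 0.3,
                  lam: float = 1.0) -> torch.Tensor:
    """Add the per-sample diffusion loss only where t is small (early timesteps)."""
    early = (t < t_cut).float()  # 1 where the regularizer is active
    return consistency_loss + lam * (early * diffusion_loss).mean()


# Usage with per-sample diffusion losses and uniformly sampled timesteps.
t = torch.rand(8)
diff = torch.randn(8).abs()   # stand-in per-sample denoising losses
cons = torch.tensor(0.5)      # stand-in scalar consistency loss
print(combined_loss(cons, diff, t).item())
```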
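A sketch reading "Non-scaling LayerNorm" as LayerNorm with the learnable scale removed, so outlier-driven statistics are not re-amplified after normalization; this interpretation and the `NonScalingLayerNorm` class are illustrative and may not match the paper's exact formulation.

```python
# Sketch: LayerNorm without a learnable scale (gamma), keeping only the shift.
import torch
import torch.nn as nn


class NonScalingLayerNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Learnable shift only; the learnable scale is deliberately omitted.
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize over the last (feature) dimension only.
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        return (x - mean) / torch.sqrt(var + self.eps) + self.bias


# Usage: drop-in replacement for nn.LayerNorm(dim) inside the backbone's blocks.
x = torch.randn(2, 256, 512)
print(NonScalingLayerNorm(512)(x).shape)
```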
-----
Results 📊:
→ Achieves FID of 7.27 on CelebA-HQ with 1-NFE sampling, significantly lower than iLCT's 37.15.
→ Reaches FID of 8.87 on LSUN Church and 8.72 on FFHQ datasets with 1-NFE sampling, again outperforming iLCT substantially.
→ Demonstrates improved Recall metric, reaching 0.50 on CelebA-HQ, indicating better sample diversity.