This paper introduces Guidance-Free Training (GFT). GFT lets visual generative models match Classifier-Free Guidance (CFG) performance while halving sampling cost, since guided (dual-model) sampling is no longer needed.
-----
📌 GFT replaces Classifier-Free Guidance by embedding conditionality directly into training. This eliminates the need for separate guided sampling, reducing inference cost by 50% while maintaining performance. The key is linear interpolation with a pseudo-temperature parameter.
📌 GFT reframes guidance as a learnable interpolation between unconditional and conditional models. This removes the redundant unconditional forward pass at sampling time while preserving the flexibility of Classifier-Free Guidance. Gradient stopping on the unconditional branch keeps training efficient.
📌 The pseudo-temperature parameter beta acts as an implicit control knob for diversity-fidelity trade-offs. This removes external guidance dependencies, enabling a single model to achieve Classifier-Free Guidance quality with a simpler, cheaper inference pipeline.
-----
https://arxiv.org/abs/2501.15420
Original Problem 😫:
→ Classifier-Free Guidance (CFG) is effective for high-quality image generation.
→ CFG requires both conditional and unconditional models during sampling.
→ This doubles the computational cost at inference, since every denoising step runs two forward passes (see the sketch after this list).
→ CFG also complicates post-training techniques like distillation and RLHF.
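A minimal sketch of the CFG inference cost being described, assuming a PyTorch-style noise-prediction model; `model`, `null_cond`, and the guidance scale `w` are illustrative placeholders, not the paper's API:

```python
import torch

@torch.no_grad()
def cfg_epsilon(model, x_t, t, cond, null_cond, w=4.0):
    """One CFG denoising step needs two forward passes before combining them:
    eps_guided = eps_uncond + w * (eps_cond - eps_uncond)."""
    eps_cond = model(x_t, t, cond)         # forward pass 1: conditional prediction
    eps_uncond = model(x_t, t, null_cond)  # forward pass 2: unconditional prediction
    return eps_uncond + w * (eps_cond - eps_uncond)
```

In practice the two passes are often batched together, but the compute per step is still doubled relative to a single conditional pass.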
-----
Solution in this Paper 💡:
→ This paper proposes Guidance-Free Training (GFT).
→ GFT trains a single model for temperature-controlled sampling.
→ GFT parameterizes the conditional model implicitly.
→ It uses a linear interpolation between a sampling model and an unconditional model.
→ The training objective remains the same as CFG's maximum likelihood objective.
→ GFT introduces a pseudo-temperature parameter, beta, as model input.
→ The loss function is: E [ || beta * epsilon_theta^s(x_t|c, beta) + (1-beta) * epsilon_theta^u(x_t) - epsilon ||_2^2 ].
→ During training, beta is randomly sampled from 0 to 1.
→ GFT stops gradients through the unconditional branch for efficiency and stability (a training-step sketch follows this list).
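A hedged sketch of one GFT training step for a noise-prediction diffusion model, following the loss above. `model`, `null_cond`, and `add_noise` are illustrative placeholders, and feeding beta = 1 to the unconditional branch is an assumption of this sketch, not a detail confirmed by the paper:

```python
import torch
import torch.nn.functional as F

def gft_loss(model, x_0, cond, null_cond, add_noise, num_timesteps=1000):
    """Implicit conditional prediction:
    eps_c = beta * eps_s(x_t | c, beta) + (1 - beta) * sg[eps_u(x_t)],
    trained with the ordinary noise-regression (MSE) objective."""
    b = x_0.shape[0]
    t = torch.randint(0, num_timesteps, (b,), device=x_0.device)  # diffusion timesteps
    beta = torch.rand(b, device=x_0.device)                       # pseudo-temperature in [0, 1]
    eps = torch.randn_like(x_0)
    x_t = add_noise(x_0, eps, t)                                  # forward diffusion q(x_t | x_0)

    eps_s = model(x_t, t, cond, beta)                             # sampling model, beta as extra input
    with torch.no_grad():                                         # stop-gradient on the unconditional branch
        eps_u = model(x_t, t, null_cond, torch.ones_like(beta))   # assumed: null condition, beta = 1

    beta_ = beta.view(b, 1, 1, 1)                                 # broadcast over image dims
    eps_c = beta_ * eps_s + (1.0 - beta_) * eps_u                 # implicit conditional model
    return F.mse_loss(eps_c, eps)
```

Solving the interpolation for the sampling model gives eps_s = eps_u + (1/beta) * (eps_c - eps_u), so beta effectively plays the role of an inverse guidance scale: beta = 1 recovers the plain conditional model, while smaller beta behaves like stronger CFG guidance.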
-----
Key Insights from this Paper 🧠:
→ A single model can achieve CFG-level performance by using a novel training parameterization.
→ Implicit conditional model construction via linear interpolation works effectively.
→ Introducing a pseudo-temperature parameter allows for diversity-fidelity trade-off in a guidance-free manner.
→ Stopping gradients for the unconditional branch during training improves efficiency without compromising performance.
→ GFT simplifies the visual generation pipeline by removing dual-model inference at sampling time (see the sampling sketch below).
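A sketch of guidance-free sampling under the same assumptions as above: one forward pass per step, with the pseudo-temperature beta acting as the fidelity-diversity knob (smaller beta, stronger implicit guidance; beta = 1, unguided). `model` and `denoise_step` are illustrative placeholders:

```python
import torch

@torch.no_grad()
def gft_sample(model, denoise_step, shape, cond, num_steps=50, beta=0.5, device="cuda"):
    x_t = torch.randn(shape, device=device)                # start from pure noise
    beta_t = torch.full((shape[0],), beta, device=device)  # fixed pseudo-temperature
    for step in reversed(range(num_steps)):
        t = torch.full((shape[0],), step, device=device, dtype=torch.long)
        eps_s = model(x_t, t, cond, beta_t)                # single forward pass per step
        x_t = denoise_step(x_t, eps_s, t)                  # e.g., a DDIM/DDPM update rule
    return x_t
```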
-----
Results 📊:
→ GFT achieves a guidance-free FID of 1.99 on DiT-XL, comparable to CFG's 2.11.
→ GFT fine-tuning reaches near-lossless FID within 5% of the pre-training epochs.
→ GFT reduces sampling cost by 50% compared to CFG.
→ GFT training adds only 10-20% computation overhead compared to CFG.