
"FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality"

The podcast on this paper was generated with Google's Illuminate.

FasterCache accelerates video generation by intelligently reusing features while preserving quality.

Smart feature caching delivers training-free acceleration, making video diffusion models roughly twice as fast.

📚 https://arxiv.org/abs/2410.19355

🎯 Original Problem:

Video diffusion models are computationally expensive and slow, taking 2-5 minutes to generate a 6-second 480p video. Current cache-based acceleration methods degrade video quality because they directly reuse features across adjacent timesteps.

-----

🔍 Key Insights:

• Direct feature reuse between adjacent timesteps loses subtle variations crucial for quality

• High redundancy exists between conditional and unconditional features within the same timestep (probed in the sketch after this list)

• Classifier-free guidance (CFG) doubles inference time due to extra unconditional computations

• Feature differences between conditional/unconditional outputs evolve from low to high frequencies during sampling
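
To make the redundancy insight concrete, here is a minimal sketch (not from the paper's code) that measures the cosine similarity between conditional and unconditional attention features at one timestep. The function name and the toy tensors are illustrative stand-ins for features hooked out of the diffusion transformer.

```python
import torch
import torch.nn.functional as F

def branch_similarity(feat_cond: torch.Tensor, feat_uncond: torch.Tensor) -> float:
    """Mean cosine similarity between conditional and unconditional features."""
    sim = F.cosine_similarity(feat_cond.flatten(1), feat_uncond.flatten(1), dim=1)
    return sim.mean().item()

# Toy example: an "unconditional" feature that is a small perturbation of the
# conditional one, mimicking the redundancy observed within a single timestep.
cond = torch.randn(2, 4096)
uncond = cond + 0.05 * torch.randn_like(cond)
print(f"cond/uncond feature similarity: {branch_similarity(cond, uncond):.3f}")  # close to 1.0
```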

-----

⚡ Solution in this Paper:

• Dynamic Feature Reuse Strategy (sketched in code below):

- Computes attention outputs only at alternate timesteps, reusing cached outputs in between

- Uses weighted feature differences to preserve temporal continuity

- Adapts feature reuse based on sampling progress
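
A hedged sketch of the dynamic feature reuse idea, assuming PyTorch and a cache over the two most recent attention outputs. The class name, the even/odd compute schedule, and the linear weight `w(t)` are illustrative assumptions, not the paper's exact formulation.

```python
from collections import deque
import torch

class DynamicFeatureCache:
    def __init__(self, num_steps: int):
        self.num_steps = num_steps
        self.cache = deque(maxlen=2)  # two most recent computed attention outputs

    def weight(self, step: int) -> float:
        # Illustrative schedule: lean more on the cached difference as sampling progresses.
        return step / max(self.num_steps - 1, 1)

    def attention_output(self, step: int, compute_attn) -> torch.Tensor:
        if step % 2 == 0 or len(self.cache) < 2:
            out = compute_attn()          # full attention computation, then cache it
            self.cache.append(out)
            return out
        newer, older = self.cache[-1], self.cache[-2]
        # Reuse the latest output plus a weighted difference to keep temporal detail.
        return newer + self.weight(step) * (newer - older)

# Toy usage with a stand-in for the real attention computation.
cacher = DynamicFeatureCache(num_steps=8)
for t in range(8):
    feat = cacher.attention_output(t, lambda: torch.randn(1, 16, 64))
    # `feat` would feed the rest of the transformer block at step t.
```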

• CFG-Cache (sketched in code below):

- Stores residuals between conditional/unconditional outputs

- Separately handles high- and low-frequency components of the cached residual

- Dynamically adjusts frequency emphasis based on sampling phase
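
A hedged sketch of the CFG-Cache idea: cache the conditional/unconditional residual in the frequency domain, then skip the unconditional pass at later steps and reconstruct it from the conditional output plus a frequency-reweighted residual. The function names, the low-pass radius, and the `w_low`/`w_high` weights are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.fft as fft

def lowpass_mask(h: int, w: int, radius: float = 0.25) -> torch.Tensor:
    """Boolean mask selecting low frequencies of a centered 2D spectrum."""
    ys = torch.linspace(-1, 1, h).view(-1, 1)
    xs = torch.linspace(-1, 1, w).view(1, -1)
    return (ys**2 + xs**2).sqrt() <= radius

def cache_residual(cond: torch.Tensor, uncond: torch.Tensor) -> torch.Tensor:
    """Store the cond/uncond residual as a centered 2D spectrum."""
    return fft.fftshift(fft.fft2(uncond - cond), dim=(-2, -1))

def reconstruct_uncond(cond: torch.Tensor, residual_freq: torch.Tensor,
                       w_low: float, w_high: float) -> torch.Tensor:
    """Rebuild the unconditional output with separately weighted frequency bands."""
    mask = lowpass_mask(*residual_freq.shape[-2:]).to(residual_freq.device)
    dtype = residual_freq.real.dtype
    weights = mask.to(dtype) * w_low + (~mask).to(dtype) * w_high
    boosted = residual_freq * weights
    residual = fft.ifft2(fft.ifftshift(boosted, dim=(-2, -1))).real
    return cond + residual

# Toy usage: cache the residual at a step where both branches were computed,
# then reconstruct the skipped unconditional output at a later step.
cond_t, uncond_t = torch.randn(1, 4, 32, 32), torch.randn(1, 4, 32, 32)
res = cache_residual(cond_t, uncond_t)
uncond_approx = reconstruct_uncond(torch.randn(1, 4, 32, 32), res, w_low=1.2, w_high=1.0)
```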

-----

📊 Results:

• Achieves 1.67x speedup on Vchitect-2.0 while maintaining quality (VBench: 80.80% → 80.84%)

• Successfully applied to multiple models: Open-Sora 1.2, Open-Sora-Plan, Latte, CogVideoX

• Scales effectively with multiple GPUs and different video resolutions
