FasterCache accelerates video generation by intelligently reusing features while preserving quality.
Training-free acceleration through smart feature caching makes video diffusion models roughly twice as fast.
📚 https://arxiv.org/abs/2410.19355
🎯 Original Problem:
Video diffusion models are computationally expensive, often taking 2-5 minutes to generate a 6-second 480p video. Existing cache-based acceleration methods degrade video quality by directly reusing features between timesteps.
-----
🔍 Key Insights:
• Direct feature reuse between adjacent timesteps loses subtle variations crucial for quality
• High redundancy exists between conditional and unconditional features within the same timestep (see the similarity sketch after this list)
• Classifier-free guidance (CFG) doubles inference time due to extra unconditional computations
• Feature differences between conditional/unconditional outputs evolve from low to high frequencies during sampling
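To make the redundancy insight concrete, here is a minimal, hedged sketch of how one could measure it: the mean cosine similarity between the conditional and unconditional features a block produces at the same timestep. The tensor shapes and the helper name are illustrative assumptions, not part of the paper.

```python
import torch
import torch.nn.functional as F

def cond_uncond_similarity(feat_cond: torch.Tensor, feat_uncond: torch.Tensor) -> float:
    """Mean cosine similarity between conditional and unconditional features
    produced at the same denoising timestep (flattened per sample)."""
    sim = F.cosine_similarity(
        feat_cond.flatten(start_dim=1),
        feat_uncond.flatten(start_dim=1),
        dim=1,
    )
    return sim.mean().item()

# Illustrative tensors standing in for a transformer block's outputs at one timestep.
feat_cond = torch.randn(2, 16, 64)                            # (batch, tokens, channels)
feat_uncond = feat_cond + 0.01 * torch.randn_like(feat_cond)  # nearly redundant copy
print(f"cond/uncond cosine similarity: {cond_uncond_similarity(feat_cond, feat_uncond):.4f}")
```

Values close to 1.0 for real model features are what motivate skipping the separate unconditional computation.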
-----
⚡ Solution in this Paper:
• Dynamic Feature Reuse Strategy (see the sketch after this list):
- Computes attention outputs at alternate timesteps
- Uses weighted feature differences to preserve temporal continuity
- Adapts feature reuse based on sampling progress
• CFG-Cache:
- Stores residuals between conditional/unconditional outputs
- Separately handles high/low frequency components
- Dynamically adjusts frequency emphasis based on sampling phase
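A minimal, hedged sketch of both components, assuming a PyTorch-style denoising loop: `FeatureReuseCache` approximates attention outputs at skipped timesteps from a weighted difference of the two most recent computed outputs, and the CFG-Cache helpers store the conditional/unconditional residual split by frequency so each band can be re-weighted at reuse steps. All class and function names, the alternate-step rule, and the weighting schedules are illustrative assumptions, not the paper's exact formulation.

```python
import torch


class FeatureReuseCache:
    """Sketch of dynamic feature reuse across timesteps (assumed schedule)."""

    def __init__(self):
        self.prev = None        # attention output at the most recent computed step
        self.prev_prev = None   # attention output at the computed step before that

    def should_compute(self, step: int) -> bool:
        # Run the real attention block at alternate timesteps; reuse otherwise.
        return step % 2 == 0 or self.prev_prev is None

    def update(self, attn_out: torch.Tensor) -> None:
        self.prev_prev, self.prev = self.prev, attn_out

    def reuse(self, step: int, total_steps: int) -> torch.Tensor:
        # Extrapolate with a weighted difference term so the subtle drift
        # between adjacent timesteps is preserved rather than flattened out.
        w = 1.0 + step / total_steps  # assumed weighting schedule
        return self.prev + w * (self.prev - self.prev_prev)


def split_frequency(residual: torch.Tensor, cutoff: int = 4):
    """Split a (..., H, W) cond/uncond residual into low- and high-frequency
    parts with a centered FFT mask, so each band can be re-weighted later."""
    spec = torch.fft.fftshift(torch.fft.fft2(residual), dim=(-2, -1))
    h, w = residual.shape[-2:]
    cy, cx = h // 2, w // 2
    mask = torch.zeros(h, w, dtype=residual.dtype)
    mask[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 1.0
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1))).real
    return low, residual - low


def approx_uncond(cond_out, low, high, step, total_steps, boost=0.5):
    # CFG-Cache reuse: rebuild the unconditional output from the conditional
    # one plus the cached residual, boosting its low-frequency band early in
    # sampling and its high-frequency band later (assumed boost schedule).
    a = 1.0 + boost * (1.0 - step / total_steps)
    b = 1.0 + boost * (step / total_steps)
    return cond_out + a * low + b * high
```

In a denoising loop, `should_compute` would gate the real attention call, `update` would record its output, and `reuse` would supply the approximation at skipped steps; likewise, at CFG-skipped steps `approx_uncond` would stand in for the extra unconditional forward pass.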
-----
📊 Results:
• Achieves 1.67x speedup on Vchitect-2.0 while maintaining quality (VBench: 80.80% → 80.84%)
• Successfully applied to multiple models: Open-Sora 1.2, Open-Sora-Plan, Latte, CogVideoX
• Scales effectively with multiple GPUs and different video resolutions