"OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models"
This is the paper behind OmniHuman, a new Diffusion-Transformer-based model for human animation that achieves remarkably realistic and flexible human video generation.
https://arxiv.org/abs/2502.01061
📌 OmniHuman's core innovation is omni-conditions training: it mixes multiple conditioning signals (text, audio, pose) during training, so data that single-condition training would typically discard gets reused. This unlocks data-scaling benefits for human animation models, similar to what large datasets achieved for general video generation.
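A toy sketch of what this reuse buys, with made-up clip annotations: a single-condition (pose-driven) pipeline must discard clips lacking pose labels, while the omni-conditions mix keeps any clip carrying at least one usable signal.

```python
# Toy illustration of the data-reuse idea (clip annotations are made up):
# a pose-only pipeline discards every clip without pose labels, while
# omni-conditions training keeps any clip with at least one usable signal.
clips = [
    {"text", "audio", "pose"},
    {"text", "audio"},   # no pose: dropped by single-condition training
    {"audio"},           # audio-only talking clip
    {"text"},            # caption-only clip
]

pose_only = [c for c in clips if "pose" in c]
omni = [c for c in clips if c & {"text", "audio", "pose"}]
print(len(pose_only), "clips survive pose-only filtering")   # -> 1
print(len(omni), "clips survive omni-conditions filtering")  # -> 4
```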
📌 The progressive multi-stage training and dynamic training ratios are key. OmniHuman assigns weaker conditions higher training ratios, which prevents stronger conditions like pose from dominating and ensures effective learning across all modalities.
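A minimal sketch of such a schedule, assuming illustrative stage contents and ratio values (the paper's actual settings will differ): each stage introduces a stronger condition, but at a lower training ratio than the weaker ones already present.

```python
import random

# Illustrative schedule: each stage adds a stronger condition but gives it
# a lower training ratio, so weak conditions keep shaping the model. The
# stage contents and ratio values are assumptions, not the paper's numbers.
STAGES = [
    {"text": 1.0},
    {"text": 1.0, "audio": 0.5},
    {"text": 1.0, "audio": 0.5, "pose": 0.25},
]

def sample_active(stage_ratios, available):
    """Keep each condition the clip actually has with its stage ratio;
    stronger conditions are dropped more often, preventing them from
    dominating the gradient signal."""
    return {c for c, r in stage_ratios.items()
            if c in available and random.random() < r}

# A clip with audio but no pose labels still trains in the final stage,
# driven by its weaker conditions.
print(sample_active(STAGES[2], available={"text", "audio"}))
```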
📌 Reference image conditioning via DiT backbone reuse is efficient: unlike prior methods that duplicate the backbone into a separate reference network, OmniHuman encodes the reference image with the same network. This parameter efficiency and the integration of diverse conditions are vital for scalable human animation.
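A minimal PyTorch sketch of the shared-backbone idea (the tiny block and the shapes are stand-ins, not OmniHuman's actual architecture): reference-image tokens are concatenated with the video tokens and processed by the same transformer block, so self-attention lets video tokens read appearance from the reference without a duplicated reference network.

```python
import torch
import torch.nn as nn

class TinyDiTBlock(nn.Module):
    """Stand-in for a DiT transformer block (illustrative only)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm(x))

def forward_with_reference(block, video_tokens, ref_tokens):
    """Run reference and video tokens through the SAME block, so
    self-attention transfers appearance from the reference without
    a separate, parameter-duplicating ReferenceNet-style branch."""
    n_ref = ref_tokens.shape[1]
    joint = torch.cat([ref_tokens, video_tokens], dim=1)
    out = block(joint)
    return out[:, n_ref:]  # keep only the video tokens

# Shapes are illustrative: batch 2, 16 ref tokens, 64 video tokens, dim 128.
blk = TinyDiTBlock(128)
ref = torch.randn(2, 16, 128)
vid = torch.randn(2, 64, 128)
print(forward_with_reference(blk, vid, ref).shape)  # torch.Size([2, 64, 128])
```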