DiffSplat efficiently generates 3D Gaussian Splats by repurposing pre-trained 2D image diffusion models. It sidesteps the data bottleneck of prior 3D generative methods by tapping web-scale 2D priors while keeping outputs 3D-consistent.
-----
Paper - https://arxiv.org/abs/2501.16764
Original Problem 🙁:
→ Existing 3D generation methods struggle with limited 3D training data.
→ Methods trained natively on 3D data cannot tap into large 2D pre-trained models and their web-scale priors.
→ Two-stage methods generating multi-view images then reconstructing 3D suffer from inconsistencies.
-----
Solution in this Paper 💡:
→ DiffSplat framework directly generates 3D Gaussian Splats.
→ It fine-tunes image diffusion models for 3D generation.
→ A lightweight reconstruction model produces multi-view Gaussian Splat grids almost instantly, making training-data curation scalable.
→ Image Variational Autoencoders are fine-tuned to encode splat grids into 'splat latents'.
→ Training combines a standard diffusion loss with a 3D rendering loss (see the sketch after this list).
→ The rendering loss ensures 3D consistency across different views.
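To make the two losses concrete, here is a minimal PyTorch sketch of one training step. All module names, shapes, and the 14-channel splat parameterization are illustrative assumptions, not the paper's actual code: the real method fine-tunes a pre-trained image VAE and diffusion UNet, and renders with a differentiable Gaussian splatting rasterizer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: the real method fine-tunes a pre-trained image VAE and a
# pre-trained image diffusion UNet. Channel counts and layers are assumptions.
C_SPLAT, C_LAT = 14, 4                                     # splat params, latent channels
vae_enc = nn.Conv2d(C_SPLAT, C_LAT, 8, stride=8)           # splat grid -> 'splat latents'
vae_dec = nn.ConvTranspose2d(C_LAT, C_SPLAT, 8, stride=8)  # splat latents -> splat grid
unet = nn.Conv2d(C_LAT, C_LAT, 3, padding=1)               # denoiser stub

def render_views(grid):
    # Hypothetical differentiable Gaussian splatting render; reading the RGB
    # channels keeps the example runnable without a real rasterizer.
    return grid[:, -3:]

def diffsplat_step(splat_grid, gt_views, alpha_bar, lam=1.0):
    """One training step: diffusion loss on splat latents + rendering loss."""
    z0 = vae_enc(splat_grid)
    eps = torch.randn_like(z0)
    zt = alpha_bar.sqrt() * z0 + (1 - alpha_bar).sqrt() * eps  # forward diffusion
    eps_hat = unet(zt)                                         # predict the noise
    loss_diff = F.mse_loss(eps_hat, eps)                       # standard 2D objective
    # Recover clean latents from the noise estimate, decode them back to a
    # splat grid, and render: pixel-space supervision ties every view to one
    # underlying 3D scene, which is what enforces 3D consistency.
    z0_hat = (zt - (1 - alpha_bar).sqrt() * eps_hat) / alpha_bar.sqrt()
    loss_render = F.mse_loss(render_views(vae_dec(z0_hat)), gt_views)
    return loss_diff + lam * loss_render

loss = diffsplat_step(torch.randn(2, C_SPLAT, 128, 128),   # random splat grids
                      torch.rand(2, 3, 128, 128),          # ground-truth renders
                      alpha_bar=torch.tensor(0.7))
loss.backward()
```

The rendering term is what distinguishes this from plain latent diffusion: decoded latents must splat into images that match ground-truth views, so the denoiser cannot produce latents that look plausible per-view but disagree in 3D.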
-----
Key Insights from this Paper 🤔:
→ Image diffusion models possess implicit 3D geometry understanding.
→ Gaussian Splat grids can be treated as a special image style for diffusion models (a layout sketched below this list).
→ Fine-tuning image diffusion models with splat latents and rendering loss enables direct 3DGS generation.
→ Leveraging 2D priors enhances 3D generation quality and consistency.
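The "special image style" insight can be pictured concretely: each pixel of each view stores one Gaussian's parameters, and the views are tiled into a single grid. The 14-channel split below (position, opacity, scale, quaternion, color) is a common 3DGS parameterization used here as an assumption; the paper's exact layout may differ.

```python
import torch

V, H, W = 4, 64, 64                        # views per object, per-view resolution

# One Gaussian per pixel of each view, in a common 3DGS parameterization.
xyz      = torch.randn(V, 3, H, W)         # center position
opacity  = torch.rand(V, 1, H, W)
scale    = torch.rand(V, 3, H, W)
rotation = torch.randn(V, 4, H, W)         # quaternion
rgb      = torch.rand(V, 3, H, W)
splats   = torch.cat([xyz, opacity, scale, rotation, rgb], dim=1)  # (V, 14, H, W)

# Tile the V views into one 2x2 "splat grid" image, so a 2D diffusion model
# sees the whole 3D object as a single structured, image-like tensor.
grid = (splats.reshape(2, 2, 14, H, W)     # (rows, cols, C, H, W)
              .permute(2, 0, 3, 1, 4)      # (C, rows, H, cols, W)
              .reshape(14, 2 * H, 2 * W))
print(grid.shape)                          # torch.Size([14, 128, 128])
```

Because the tensor behaves like an ordinary multi-channel image, the pre-trained VAE and diffusion UNet need only channel-count adjustments and fine-tuning rather than a new architecture.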
-----
Results 💪:
→ DiffSplat achieves superior performance in text-to-3D generation on T3Bench, outperforming GVGEN, LN3Diff, DIRECT-3D, and 3DTopia in CLIP similarity, CLIP R-Precision, and ImageReward.
→ In image-to-3D generation on GSO dataset, DiffSplat outperforms 3DTopia-XL, LN3Diff, LGM, GRM, LaRa, CRM and InstantMesh in PSNR, SSIM, and LPIPS metrics.
→ Ablation studies validate the effectiveness of rendering loss and geometric guidance.
-----