"DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation"

The podcast below was generated with Google's Illuminate.

DiffSplat efficiently generates 3D Gaussian Splats by repurposing 2D image diffusion models. It overcomes the limitations of prior 3D generative methods by leveraging web-scale 2D priors while ensuring 3D consistency.

-----

Paper - https://arxiv.org/abs/2501.16764

Original Problem 🙁:

→ Existing 3D generation methods struggle with limited 3D training data.

→ Methods trained natively on 3D data cannot benefit from large pre-trained 2D models.

→ Two-stage methods that first generate multi-view images and then reconstruct 3D from them suffer from multi-view inconsistencies.

-----

Solution in this Paper 💡:

→ DiffSplat framework directly generates 3D Gaussian Splats.

→ It fine-tunes image diffusion models for 3D generation.

→ A lightweight reconstruction model rapidly produces multi-view Gaussian Splat grids, providing scalable training data.

→ Image Variational Autoencoders are fine-tuned to encode splat grids into 'splat latents'.

→ A diffusion loss and a 3D rendering loss are used jointly during training (see the sketch after this list).

→ Rendering loss ensures 3D consistency across different views.

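To make the two-loss training concrete, here is a minimal PyTorch sketch of one training step. Every component is a toy stand-in: `denoiser`, `vae_decode`, `render_views`, the linear noise schedule, and the 0.1 loss weight are illustrative assumptions, not the paper's actual UNet, VAE, or differentiable 3DGS rasterizer.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the paper's components (assumptions, not the real models).
denoiser = torch.nn.Conv2d(8, 8, 3, padding=1)     # diffusion UNet over splat latents
vae_decode = torch.nn.Conv2d(8, 12, 3, padding=1)  # VAE decoder: latents -> Gaussian params

def render_views(splat_params, num_views=4):
    """Placeholder for a differentiable Gaussian-splat rasterizer."""
    b = splat_params.shape[0]
    return splat_params[:, :3].unsqueeze(1).expand(b, num_views, 3, 64, 64)

splat_latents = torch.randn(2, 8, 64, 64)  # encoded multi-view splat grid (batch of 2)
gt_images = torch.rand(2, 4, 3, 64, 64)    # ground-truth renderings for 4 views

# Add noise to the splat latents (toy linear schedule).
t = torch.rand(2)
alpha = (1 - t).view(-1, 1, 1, 1)
noise = torch.randn_like(splat_latents)
noisy = alpha * splat_latents + (1 - alpha) * noise

# Diffusion loss: predict the injected noise, as in standard image diffusion.
pred_noise = denoiser(noisy)
loss_diff = F.mse_loss(pred_noise, noise)

# Rendering loss: estimate clean latents, decode to Gaussians, render all
# views, and compare with ground truth -- this ties the views to one 3D scene.
latents_hat = (noisy - (1 - alpha) * pred_noise) / alpha.clamp(min=1e-3)
splats_hat = vae_decode(latents_hat)
loss_render = F.mse_loss(render_views(splats_hat), gt_images)

loss = loss_diff + 0.1 * loss_render  # the 0.1 weight is an arbitrary choice here
loss.backward()
```
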
-----

Key Insights from this Paper 🤔:

→ Image diffusion models possess implicit 3D geometry understanding.

→ Gaussian Splat grids can be treated as a special image style for diffusion models (see the packing sketch after this list).

→ Fine-tuning image diffusion models with splat latents and rendering loss enables direct 3DGS generation.

→ Leveraging 2D priors enhances 3D generation quality and consistency.

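The following sketch shows the splat-grid idea, assuming one Gaussian per pixel of each view; the attribute set, channel counts, and 2x2 tiling are illustrative assumptions rather than the paper's exact layout.

```python
import torch

views, h, w = 4, 32, 32
rgb      = torch.rand(views, 3, h, w)   # color
opacity  = torch.rand(views, 1, h, w)   # alpha
scale    = torch.rand(views, 3, h, w)   # anisotropic scale
rotation = torch.rand(views, 4, h, w)   # quaternion
depth    = torch.rand(views, 1, h, w)   # lifts to 3D positions via camera rays

# Stack the attributes channel-wise: each view becomes a 12-channel "image".
splat_maps = torch.cat([rgb, opacity, scale, rotation, depth], dim=1)  # (4, 12, 32, 32)

# Tile the 4 views into a 2x2 grid -- now it is literally one image-like
# tensor that a 2D diffusion model can learn to denoise.
grid = splat_maps.view(2, 2, 12, h, w).permute(2, 0, 3, 1, 4).reshape(12, 2 * h, 2 * w)
print(grid.shape)  # torch.Size([12, 64, 64])
```
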
-----

Results 💪:

→ DiffSplat achieves superior performance in text-to-3D generation on T3Bench, outperforming GVGEN, LN3Diff, DIRECT-3D, and 3DTopia in CLIP similarity, CLIP R-Precision, and ImageReward.

→ In image-to-3D generation on the GSO dataset, DiffSplat outperforms 3DTopia-XL, LN3Diff, LGM, GRM, LaRa, CRM, and InstantMesh in PSNR, SSIM, and LPIPS (a PSNR sketch follows this list).

→ Ablation studies validate the effectiveness of rendering loss and geometric guidance.

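For reference, PSNR, one of the reported image-to-3D metrics, can be computed as below; the function and tensor shapes are a generic sketch, not the paper's evaluation code.

```python
import torch

def psnr(rendered: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """PSNR in dB for images normalized to [0, 1] (peak value = 1.0)."""
    mse = torch.mean((rendered - target) ** 2)
    return 10 * torch.log10(1.0 / mse)

rendered = torch.rand(3, 256, 256)  # a rendered view
target = (rendered + 0.01 * torch.randn_like(rendered)).clamp(0, 1)
print(f"PSNR: {psnr(rendered, target).item():.2f} dB")
```
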
-----
