"Personalized Preference Fine-tuning of Diffusion Models"

A podcast on this paper was generated with Google's Illuminate.

PPD (Personalized Preference Fine-tuning of Diffusion Models) personalizes text-to-image diffusion models by conditioning on user embeddings learned from few-shot preference examples.

-----

https://arxiv.org/abs/2501.06655

Original Problem 🤔:

→ Current text-to-image models align with population-level preferences, neglecting individual user preferences.

→ Personalized alignment is challenging due to limited individual user data and the difficulty of expressing preferences via text or single images.

-----

Solution in this Paper 💡:

→ PPD leverages a Vision-Language Model (VLM) to extract user preference embeddings from few-shot pairwise preference examples.

→ These embeddings are then incorporated into a diffusion model (Stable Cascade) through cross-attention layers (see the sketch after this list).

→ The model is fine-tuned with a multi-reward Direct Preference Optimization (DPO) objective, aligning it with diverse user preferences simultaneously.

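To make the conditioning step concrete, here is a minimal PyTorch sketch: a stand-in `UserEncoder` aggregates VLM features of few-shot (preferred, rejected) pairs into a single user embedding, and a generic `UserCrossAttention` block injects that embedding into denoiser features. The module names, dimensions, and random features are illustrative assumptions, not the paper's Stable Cascade implementation.

```python
# Hedged sketch of PPD-style user conditioning (illustrative, not the paper's code).
# Assumptions: random tensors stand in for VLM features of preference pairs; the
# cross-attention block is a generic stand-in for the layers added to Stable Cascade.
import torch
import torch.nn as nn

class UserEncoder(nn.Module):
    """Aggregates per-example features of (preferred, rejected) pairs into one user embedding."""
    def __init__(self, vlm_dim: int, user_dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * vlm_dim, user_dim)

    def forward(self, preferred_feats: torch.Tensor, rejected_feats: torch.Tensor) -> torch.Tensor:
        # preferred_feats, rejected_feats: (num_examples, vlm_dim)
        pair_feats = torch.cat([preferred_feats, rejected_feats], dim=-1)   # (K, 2*vlm_dim)
        return self.proj(pair_feats).mean(dim=0, keepdim=True)              # (1, user_dim)

class UserCrossAttention(nn.Module):
    """Injects the user embedding into a denoiser feature map via cross-attention."""
    def __init__(self, feat_dim: int, user_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, kdim=user_dim,
                                          vdim=user_dim, batch_first=True)

    def forward(self, hidden: torch.Tensor, user_emb: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, tokens, feat_dim); user_emb: (batch, 1, user_dim)
        attended, _ = self.attn(query=hidden, key=user_emb, value=user_emb)
        return hidden + attended    # residual connection, as is typical for added conditioning

# Toy usage with random features in place of real VLM outputs.
encoder = UserEncoder(vlm_dim=768, user_dim=256)
xattn = UserCrossAttention(feat_dim=512, user_dim=256)
user_emb = encoder(torch.randn(4, 768), torch.randn(4, 768))   # 4 few-shot preference pairs
hidden = torch.randn(1, 64, 512)                               # denoiser tokens for one image
conditioned = xattn(hidden, user_emb.unsqueeze(0))             # user_emb -> (1, 1, 256)
print(conditioned.shape)  # torch.Size([1, 64, 512])
```
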
-----

Key Insights from this Paper 😲:

→ Few-shot preference examples effectively represent individual reward functions.

→ Conditioning diffusion models on user embeddings enables personalized generation.

→ Multi-reward optimization allows a single model to learn diverse preferences and generalize to new users (see the sketch after this list).

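As a rough illustration of the multi-reward idea, the sketch below averages a standard DPO loss over preference batches drawn from different rewards or users, so one conditioned model is optimized for all of them at once. The log-probability inputs are placeholders for the diffusion log-likelihood terms used in Diffusion-DPO-style training, and the function names are assumptions for this example.

```python
# Hedged sketch of a multi-reward DPO-style objective (not the paper's exact code).
import torch
import torch.nn.functional as F

def dpo_loss(model_logp_w, model_logp_l, ref_logp_w, ref_logp_l, beta: float = 0.1):
    """Standard DPO loss on log-probabilities of preferred (w) vs. rejected (l) samples."""
    margin = beta * ((model_logp_w - ref_logp_w) - (model_logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()

def multi_reward_dpo_loss(batches_per_reward, beta: float = 0.1):
    """Average the DPO loss over preference batches from different rewards/users,
    so a single conditioned model is trained against all of them simultaneously."""
    losses = [dpo_loss(*batch, beta=beta) for batch in batches_per_reward]
    return torch.stack(losses).mean()

# Toy example: two "rewards" (e.g., aesthetic vs. CLIP-score preferences), random log-probs.
batch_a = tuple(torch.randn(8) for _ in range(4))
batch_b = tuple(torch.randn(8) for _ in range(4))
print(multi_reward_dpo_loss([batch_a, batch_b]).item())
```
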
-----

Results ✨:

→ Achieves an average win rate of 76% over Stable Cascade in real-world user scenarios with as few as four preference examples.

→ Demonstrates effective optimization for multiple rewards (CLIP, Aesthetic, HPS) and smooth interpolation between them (see the sketch after this list).

→ A user classifier trained on the VLM embeddings achieves 90% top-16 accuracy on 300 users.

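The interpolation result can be pictured as blending reward-specific user embeddings before conditioning. The sketch below is a hypothetical linear blend (the embeddings and dimensions are made up for illustration); the blended embedding would then be supplied as the cross-attention conditioning at generation time.

```python
# Illustrative sketch of interpolating between user embeddings that encode different
# rewards (e.g., aesthetic preferences vs. CLIP/HPS preferences). Embeddings are
# random placeholders; dimensions match the earlier conditioning sketch.
import torch

def interpolate_user_embeddings(emb_a: torch.Tensor, emb_b: torch.Tensor, alpha: float) -> torch.Tensor:
    """Linearly blend two user embeddings; alpha=0 recovers emb_a, alpha=1 recovers emb_b."""
    return (1.0 - alpha) * emb_a + alpha * emb_b

emb_aesthetic = torch.randn(1, 256)
emb_clip = torch.randn(1, 256)
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    blended = interpolate_user_embeddings(emb_aesthetic, emb_clip, alpha)
    # `blended` would be passed as the user-embedding conditioning for generation.
    print(alpha, blended.norm().item())
```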