PPD (Personalized Preference Fine-tuning of Diffusion Models) personalizes text-to-image diffusion models by conditioning on user embeddings learned from few-shot preference examples.
-----
https://arxiv.org/abs/2501.06655
Original Problem 🤔:
→ Current text-to-image models align with population-level preferences, neglecting individual user preferences.
→ Personalized alignment is challenging due to limited individual user data and the difficulty of expressing preferences via text or single images.
-----
Solution in this Paper 💡:
→ PPD leverages a Vision-Language Model (VLM) to extract user preference embeddings from few-shot pairwise preference examples.
→ These embeddings are then incorporated into a diffusion model (Stable Cascade) through cross-attention layers (sketched below).
→ The model is fine-tuned with a multi-reward Direct Preference Optimization (DPO) objective, aligning it with diverse user preferences simultaneously.
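A minimal sketch of the conditioning path, assuming a generic VLM feature extractor and a toy cross-attention block. The class names, dimensions, and mean-pooling are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class UserPreferenceEncoder(nn.Module):
    """Toy stand-in for the VLM: maps few-shot (preferred, rejected) pairs of
    image features to a single user embedding. Shapes and pooling are assumptions."""
    def __init__(self, feat_dim=768, embed_dim=512):
        super().__init__()
        self.proj = nn.Linear(2 * feat_dim, embed_dim)

    def forward(self, preferred_feats, rejected_feats):
        # preferred_feats, rejected_feats: (num_examples, feat_dim)
        pairs = torch.cat([preferred_feats, rejected_feats], dim=-1)
        # Mean-pool the per-pair encodings into one user embedding
        return self.proj(pairs).mean(dim=0, keepdim=True)  # (1, embed_dim)

class UserConditionedCrossAttention(nn.Module):
    """Cross-attention block in which diffusion features attend to the text
    tokens plus the user embedding appended as one extra context token."""
    def __init__(self, model_dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(model_dim, num_heads, batch_first=True)

    def forward(self, image_tokens, text_tokens, user_embedding):
        # image_tokens: (B, N_img, D); text_tokens: (B, N_txt, D); user_embedding: (1, D)
        user_token = user_embedding.unsqueeze(0).expand(image_tokens.size(0), -1, -1)
        context = torch.cat([text_tokens, user_token], dim=1)
        attended, _ = self.attn(image_tokens, context, context)
        return image_tokens + attended
```

Appending the user embedding as an extra context token is one simple way to realize this kind of conditioning; the paper wires the embeddings into Stable Cascade's existing cross-attention layers, whose exact details differ.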
-----
Key Insights from this Paper 😲:
→ Few-shot preference examples effectively represent individual reward functions.
→ Conditioning diffusion models on user embeddings enables personalized generation.
→ Multi-reward optimization allows a single model to learn diverse preferences and generalize to new users (a simplified loss sketch follows this list).
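A simplified sketch of a DPO-style preference loss on denoising errors, conditioned on the user embedding. This follows the general Diffusion-DPO form rather than the paper's exact objective; the model call signature, `beta`, and the per-sample reduction are assumptions:

```python
import torch
import torch.nn.functional as F

def diffusion_dpo_loss(model, ref_model, xt_w, xt_l, noise, t, cond, user_emb, beta=0.1):
    """DPO-style loss on pre-noised preferred (xt_w) and rejected (xt_l) latents.
    `model` and `ref_model` are callables predicting noise from
    (noisy latent, timestep, text condition, user embedding)."""
    def denoise_err(net, x_t):
        eps_hat = net(x_t, t, cond, user_emb)
        # Per-sample MSE between predicted and true noise (latents assumed 4D: B, C, H, W)
        return F.mse_loss(eps_hat, noise, reduction="none").mean(dim=(1, 2, 3))

    err_w, err_l = denoise_err(model, xt_w), denoise_err(model, xt_l)
    with torch.no_grad():  # frozen reference model, as in standard DPO
        ref_w, ref_l = denoise_err(ref_model, xt_w), denoise_err(ref_model, xt_l)

    # Preferring the winner means its denoising error should improve more,
    # relative to the reference model, than the loser's.
    logits = -beta * ((err_w - ref_w) - (err_l - ref_l))
    return -F.logsigmoid(logits).mean()
```

Training on preference pairs drawn from many users, including synthetic users defined by reward models such as CLIP score, aesthetic score, or HPS, is what lets a single set of weights cover multiple reward functions.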
-----
Results ✨:
→ Achieves an average win rate of 76% over Stable Cascade in real-world user scenarios with as few as four preference examples.
→ Demonstrates effective optimization for multiple rewards (CLIP, Aesthetic, HPS) and smooth interpolation between them.
→ A user classifier trained on the VLM embeddings achieves 90% top-16 accuracy on 300 users.
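Because generation is conditioned on a continuous user embedding, interpolating between two embeddings at inference time is straightforward; a minimal sketch (the `pipeline.generate` call is a hypothetical sampler interface):

```python
import torch

def interpolate_user_embeddings(emb_a, emb_b, num_steps=5):
    """Linear interpolation between two user/reward embeddings."""
    weights = torch.linspace(0.0, 1.0, num_steps)
    return [(1 - w) * emb_a + w * emb_b for w in weights]

# Usage (hypothetical sampler interface):
# for emb in interpolate_user_embeddings(aesthetic_emb, hps_emb):
#     image = pipeline.generate(prompt="a mountain cabin at dusk", user_embedding=emb)
```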