Safety guardrails for AI art generation.
PromptGuard adds a safety mechanism to Text-to-Image models through an optimized safety soft prompt, preventing unsafe content generation while preserving image quality and generation speed.
https://arxiv.org/abs/2501.03544
🚨 Original Problem:
Text-to-Image models can generate harmful content such as sexual, violent, political, and disturbing images. Existing defenses either modify model parameters or rely on external filters, and both add performance overhead and degrade image quality.
🛠️ Solution in this Paper:
→ Optimizes a safety soft prompt that acts like a system prompt within the model's text embedding space (see the sketch after this list)
→ Uses a divide-and-conquer strategy, training separate safety prompts for different unsafe content categories
→ Applies contrastive learning to balance unsafe content removal against benign image quality
→ Uses SDEdit to transform unsafe images into safer versions for training data
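A minimal PyTorch sketch of the core idea, assuming a frozen text encoder that outputs prompt embeddings: a handful of learnable "safety" token embeddings are prepended to the prompt embedding, and a contrastive-style loss pulls the guided output toward a safe (e.g. SDEdit-cleaned) target while pushing it away from the unsafe one. The class name, token count, and exact loss formulation are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

class SafetySoftPrompt(torch.nn.Module):
    """Learnable safety embeddings prepended to the frozen text-encoder output."""

    def __init__(self, num_tokens: int = 16, embed_dim: int = 768):
        super().__init__()
        # The only trainable parameters: a few "soft" token embeddings.
        self.safety_tokens = torch.nn.Parameter(torch.randn(num_tokens, embed_dim) * 0.02)

    def forward(self, prompt_embeds: torch.Tensor) -> torch.Tensor:
        # prompt_embeds: (batch, seq_len, embed_dim) from the frozen text encoder.
        batch = prompt_embeds.shape[0]
        safety = self.safety_tokens.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the safety tokens, analogous to a system prompt for an LLM.
        return torch.cat([safety, prompt_embeds], dim=1)


def contrastive_safety_loss(guided: torch.Tensor,
                            safe_target: torch.Tensor,
                            unsafe_target: torch.Tensor,
                            margin: float = 0.5) -> torch.Tensor:
    """Illustrative contrastive objective: pull the safety-guided features toward
    the safe target and push them away from the unsafe one."""
    guided, safe_target, unsafe_target = (
        x.flatten(1) for x in (guided, safe_target, unsafe_target)
    )
    pull = 1.0 - F.cosine_similarity(guided, safe_target, dim=-1)
    push = F.relu(F.cosine_similarity(guided, unsafe_target, dim=-1) - margin)
    return (pull + push).mean()
```

During training, the diffusion model and text encoder would stay frozen; only the safety token embeddings receive gradients, which is why inference cost stays essentially unchanged.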
💡 Key Insights:
→ System prompts from LLMs can be adapted for Text-to-Image safety
→ Soft prompts can guide image generation without model modifications
→ Divide-and-conquer approach handles diverse unsafe content better (see the sketch after this list)
→ Contrastive learning maintains benign generation quality
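A hedged sketch of how category-specific safety prompts could be composed at inference, reusing the SafetySoftPrompt class from the sketch above. The paper's exact selection or composition rule is not spelled out here, so this version simply stacks all category prompts; the category list and function name are illustrative.

```python
# Hypothetical inference-time composition of per-category safety prompts.
UNSAFE_CATEGORIES = ["sexual", "violent", "political", "disturbing"]

def apply_safety_prompts(prompt_embeds, safety_prompts):
    """Prepend each category's trained soft prompt to the text embeddings.

    prompt_embeds:  (batch, seq_len, embed_dim) output of the frozen text encoder
    safety_prompts: dict mapping category name -> trained SafetySoftPrompt module
    """
    embeds = prompt_embeds
    for category in UNSAFE_CATEGORIES:
        embeds = safety_prompts[category](embeds)
    return embeds
```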
📊 Results:
→ Achieves a 5.84% unsafe ratio, the lowest among all tested methods
→ Processes images 7.8x faster than prior content moderation approaches
→ Maintains high CLIP scores, indicating strong prompt-image alignment
→ Moderates successfully across sexual, violent, political, and disturbing categories