Turning abstract poetry into meaningful images through smart summarization and key element extraction
PoemToPixel introduces a novel framework that transforms poems into visually representative images by capturing their inherent meanings through prompt tuning and diffusion models.
-----
https://arxiv.org/abs/2501.05839v1
🎯 Original Problem:
→ Text-to-image generation struggles with poetry due to its complex, metaphorical nature that goes beyond literal interpretation
→ Current systems fail to capture the emotional depth and abstract meanings in poems
-----
🔧 Solution in this Paper:
→ PoemToPixel employs a two-step approach combining poem summarization and key element extraction
→ Uses GPT-4o mini with human-feedback-based prompt tuning to generate high-quality poem summaries
→ Introduces the PoeKey algorithm, which extracts emotions, visual elements, and themes from poems
→ Converts extracted elements into precise instructions for diffusion models
→ Creates the MiniPo dataset of 1001 children's poems and corresponding images
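A minimal sketch of the last step in that pipeline, turning a summary plus extracted elements into one instruction for a diffusion model. This is not the paper's PoeKey implementation: the `PoemElements` container, the `build_image_prompt` function, and the template wording are all hypothetical, and the extraction itself (done with an LLM in the paper) is assumed to have already happened.

```python
from dataclasses import dataclass, field


@dataclass
class PoemElements:
    """Hypothetical container for the three element types named above."""
    emotions: list[str] = field(default_factory=list)
    visual_elements: list[str] = field(default_factory=list)
    themes: list[str] = field(default_factory=list)


def build_image_prompt(summary: str, elements: PoemElements) -> str:
    """Combine a poem summary and its key elements into one instruction.

    The template wording is an assumption, not the prompt used in the paper.
    """
    # Drop a trailing period so the joined sentence does not end in "..".
    parts = [f"An illustration of: {summary.strip().rstrip('.')}"]
    if elements.visual_elements:
        parts.append("Show: " + ", ".join(elements.visual_elements))
    if elements.emotions:
        parts.append("Mood: " + ", ".join(elements.emotions))
    if elements.themes:
        parts.append("Themes: " + ", ".join(elements.themes))
    return ". ".join(parts) + "."


# Illustrative input only; not taken from the MiniPo dataset.
elements = PoemElements(
    emotions=["joy"],
    visual_elements=["a kite", "a windy hill"],
    themes=["childhood"],
)
prompt = build_image_prompt("A child flies a kite on a windy hill.", elements)
print(prompt)
```

The resulting string could then be passed to any text-to-image model. Keeping emotions, visual elements, and themes as separate fields makes it easy to see which part of the instruction each extracted element contributes.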
-----
💡 Key Insights:
→ Two-phase prompt tuning significantly improves both summarization and image generation quality
→ Extracting key elements from poems helps bridge the gap between poetic text and visual representation
→ Human feedback loop in prompt refinement enhances output quality
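The human feedback loop can be pictured as a simple refine-and-rate cycle. The sketch below is an assumption about the general shape of such a loop, not the paper's procedure: `tune_prompt`, its callables, the score scale, and the stopping threshold are all hypothetical, and the stub functions stand in for a real model call and a real human rater.

```python
from typing import Callable


def tune_prompt(
    prompt: str,
    generate: Callable[[str], str],
    rate: Callable[[str], tuple[float, str]],
    revise: Callable[[str, str], str],
    target: float = 0.8,
    max_rounds: int = 5,
) -> tuple[str, float]:
    """Refine a prompt until a rater's score reaches `target`.

    generate: runs the model on a prompt (hypothetical stand-in).
    rate: returns (score, comment) from a human rater (hypothetical).
    revise: rewrites the prompt using the rater's comment (hypothetical).
    """
    best_prompt, best_score = prompt, float("-inf")
    for _ in range(max_rounds):
        output = generate(prompt)
        score, comment = rate(output)
        if score > best_score:
            best_prompt, best_score = prompt, score
        if score >= target:
            break  # good enough, stop asking for feedback
        prompt = revise(prompt, comment)
    return best_prompt, best_score


# Stubs for illustration only: the "model" echoes the prompt and the
# "rater" rewards longer, more specific prompts.
def echo(p: str) -> str:
    return p


def length_rater(out: str) -> tuple[float, str]:
    return min(1.0, len(out) / 40), "add detail"


def add_mood(p: str, comment: str) -> str:
    return p + " Mention the mood."


tuned_prompt, tuned_score = tune_prompt("Summarize the poem.", echo, length_rater, add_mood)
print(tuned_prompt, tuned_score)
```

In a two-phase setup, the same loop would run once for the summarization prompt and once for the image-generation prompt, each with its own rater.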
-----
📊 Results:
→ Achieved 37.54% higher Image-Text Matching scores on the MiniPo dataset
→ Outperformed baseline models in summarization with higher ROUGE and METEOR scores
→ Demonstrated superior performance in capturing poem meanings compared to direct poem-to-image generation
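For readers unfamiliar with ROUGE, the sketch below computes ROUGE-1 F1, the unigram-overlap variant, for one candidate summary against one reference. It is a simplified illustration that assumes lowercase whitespace tokenization with no stemming, so its numbers will not match a full ROUGE toolkit exactly. The function name `rouge1_f1` and the example sentences are illustrative, not from the paper.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall.

    Simplified: lowercase whitespace tokens, no stemming or stopword handling.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 3))
```

Here five of the six unigrams overlap in both directions, so precision, recall, and F1 all equal 5/6.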