Turn words into slides: AutoPresent bridges the gap between thoughts and visuals
AutoPresent automates slide creation from natural language instructions: it generates executable code that renders presentation slides, with quality comparable to GPT-4o.
-----
https://arxiv.org/abs/2501.00912
🎯 Original Problem:
Creating presentation slides requires both content creation and visual design skills, making it time-consuming even for experts. Current AI solutions excel at general image generation but struggle with structured visual content like slides.
-----
🔧 Solution in this Paper:
→ Introduces SlidesBench, a benchmark with 7k training and 585 test examples across 10 domains for evaluating slide generation
→ Proposes program generation over direct image generation, where models produce Python code that creates slides
→ Develops AutoPresent, an 8B parameter LLM trained on instruction-code pairs
→ Creates SlidesLib, a toolkit with high-level functions to simplify slide program generation
→ Implements iterative refinement where models self-improve their output
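To make the generate-then-refine idea concrete, here is a minimal sketch, not the paper's implementation. It assumes the model is any callable that takes the instruction, its previous program, and some textual feedback, and returns a new program. The names `run_slide_program`, `refine`, and `toy_model` are hypothetical, and the only feedback used here is whether the program executed, which is a simplification of whatever signal the paper's refinement step uses.

```python
import traceback
from typing import Callable, Optional

# Hypothetical model interface: (instruction, previous_code, feedback) -> new code.
Generator = Callable[[str, Optional[str], Optional[str]], str]


def run_slide_program(code: str) -> Optional[str]:
    """Execute generated code; return None on success, else the traceback text."""
    try:
        exec(code, {"__name__": "__generated__"})  # fresh namespace per run
        return None
    except Exception:
        return traceback.format_exc()


def refine(instruction: str, generate: Generator, rounds: int = 3) -> Optional[str]:
    """Run a fixed number of rounds and keep the latest program that executes."""
    best: Optional[str] = None
    code: Optional[str] = None
    feedback: Optional[str] = None
    for _ in range(rounds):
        code = generate(instruction, code, feedback)
        error = run_slide_program(code)
        if error is None:
            best = code
            feedback = "executed without error"
        else:
            feedback = error  # let the model see what went wrong
    return best


def toy_model(instruction: str, previous_code: Optional[str], feedback: Optional[str]) -> str:
    """Stand-in for an LLM: the first attempt is broken, later attempts run."""
    if previous_code is None:
        return "slides = [undefined_name]"  # raises NameError when executed
    return "slides = [{'title': " + repr(instruction) + "}]"


best_program = refine("Q3 results", toy_model)
```

In a real setup `toy_model` would be replaced by a call to the LLM, and the generated program would write an actual slide file instead of building a list.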
-----
🔍 Key Insights:
→ Program generation produces better slides than end-to-end image generation
→ Small models struggle with direct code generation but improve with SlidesLib
→ Iterative refinement enhances slide quality across all scenarios
→ Human-designed slides still outperform AI-generated ones
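Why would a helper library help small models? A rough illustration, assuming nothing about SlidesLib's real API: the `Slide` and `TextBox` classes and the `add_title` / `add_bullets` helpers below are hypothetical stand-ins. The point is that each high-level call hides layout decisions (margins, positions, font sizes) that the model would otherwise have to spell out line by line.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextBox:
    text: str
    left: float
    top: float
    width: float
    height: float
    font_size: int
    bold: bool = False


@dataclass
class Slide:
    width: float = 13.33   # inches, 16:9 layout (illustrative default)
    height: float = 7.5
    shapes: List[TextBox] = field(default_factory=list)


MARGIN = 0.5  # inches, shared by both helpers


def add_title(slide: Slide, text: str, font_size: int = 40) -> TextBox:
    """Place a bold title across the top of the slide."""
    box = TextBox(text, MARGIN, MARGIN, slide.width - 2 * MARGIN, 1.2, font_size, bold=True)
    slide.shapes.append(box)
    return box


def add_bullets(slide: Slide, items: List[str], font_size: int = 24) -> TextBox:
    """Place a bullet list in the body area below the title."""
    top = 2.0
    body = "\n".join("- " + item for item in items)
    box = TextBox(body, MARGIN, top, slide.width - 2 * MARGIN,
                  slide.height - top - MARGIN, font_size)
    slide.shapes.append(box)
    return box


# The "generated program" shrinks to a few calls:
slide = Slide()
add_title(slide, "AutoPresent")
add_bullets(slide, ["Program generation", "Iterative refinement"])
```

With helpers like these, the program a model has to write is three lines rather than a long sequence of positioning and styling calls, which leaves fewer places for it to make a mistake.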
-----
📊 Results:
→ AutoPresent achieves an 84.1% execution success rate
→ Matches GPT-4o's performance with an overall score of 55.0
→ SlidesLib reduces average program length from 170 to 13 lines
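For readers unfamiliar with the metric, execution success rate can be read as the share of generated programs that run to completion without raising an error. The sketch below assumes that reading; the post does not spell out the exact definition, and `execution_success_rate` is a hypothetical helper.

```python
from typing import Iterable


def execution_success_rate(programs: Iterable[str]) -> float:
    """Percentage of programs that run to completion without raising."""
    programs = list(programs)
    if not programs:
        return 0.0
    succeeded = 0
    for code in programs:
        try:
            exec(code, {})  # fresh, empty namespace for each program
            succeeded += 1
        except Exception:
            pass  # any error (syntax or runtime) counts as a failure
    return 100.0 * succeeded / len(programs)


samples = [
    "x = 1",
    "y = undefined_name",             # NameError
    "z = [i * i for i in range(3)]",
    "1 / 0",                          # ZeroDivisionError
]
rate = execution_success_rate(samples)
```

Note that a program can execute cleanly and still produce a poor slide, which is why the post reports an overall quality score alongside this number.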