
"AutoPresent: Designing Structured Visuals from Scratch"

The podcast below, which discusses this paper, was generated with Google's Illuminate.

Turn words into slides: AutoPresent bridges the gap between thoughts and visuals

AutoPresent automates slide creation from natural-language instructions: it generates executable code that renders presentation slides, with quality comparable to GPT-4's.

-----

https://arxiv.org/abs/2501.00912

🎯 Original Problem:

Creating presentation slides requires both content creation and visual design skills, making it time-consuming even for experts. Current AI solutions excel at general image generation but struggle with structured visual content like slides.

-----

🔧 Solution in this Paper:

→ Introduces SlidesBench, a benchmark with 7k training and 585 test examples across 10 domains for evaluating slide generation

→ Proposes program generation over direct image generation, where models produce Python code that creates slides

→ Develops AutoPresent, an 8B-parameter LLM trained on instruction-code pairs

→ Creates SlidesLib, a toolkit with high-level functions to simplify slide program generation

→ Implements iterative refinement where models self-improve their output
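
To make the program-generation idea concrete, here is a minimal sketch of what a short slide program built on high-level helpers could look like. The `Slide` class and the `add_title` and `add_bullets` helpers are hypothetical stand-ins written for this illustration; they are not SlidesLib's actual API, and a real program would render a slide file rather than fill an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Slide:
    """Toy in-memory slide: a list of (kind, text, position) elements."""
    elements: list = field(default_factory=list)


def add_title(slide: Slide, text: str) -> None:
    # Hypothetical helper: place the title at a fixed spot near the top.
    slide.elements.append(("title", text, (0.5, 0.3)))


def add_bullets(slide: Slide, items: list[str], top: float = 1.5) -> None:
    # Hypothetical helper: stack bullet items vertically, 0.5 units apart.
    for i, item in enumerate(items):
        slide.elements.append(("bullet", item, (0.8, top + 0.5 * i)))


# The kind of short program a model might emit for the instruction
# "Make a slide titled 'Key Insights' with two bullet points".
slide = Slide()
add_title(slide, "Key Insights")
add_bullets(slide, ["Programs beat pixels", "Helpers shorten code"])

for kind, text, (x, y) in slide.elements:
    print(f"{kind:<6} at ({x}, {y}): {text}")
```

The point of the sketch is the last few lines: once layout details live inside helpers, the generated program only has to name the content and call two functions, which is an easier target for a small model than spelling out every shape and coordinate.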

-----

🔍 Key Insights:

→ Program generation produces better slides than end-to-end image generation

→ Small models struggle with direct code generation but improve with SlidesLib

→ Iterative refinement enhances slide quality across all scenarios

→ Human-designed slides still outperform AI-generated ones
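
The refinement insight can be sketched as a simple loop: generate a program, check it, and feed the result back to the model for another attempt. This is only an illustration under stated assumptions. The `generate` callable, its three-argument signature, and the `stub_model` below are hypothetical, and the feedback used here is just the execution error, whereas the paper's refinement step may give the model a richer signal.

```python
from typing import Callable, Optional

# Hypothetical model interface: (instruction, previous_code, feedback) -> new code.
Generator = Callable[[str, Optional[str], Optional[str]], str]


def run_program(code: str) -> Optional[str]:
    """Execute slide code; return None on success, else an error message."""
    try:
        exec(code, {})
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


def refine(generate: Generator, instruction: str, max_rounds: int = 3) -> str:
    """Ask the model for code, then let it revise while feedback exists."""
    code = generate(instruction, None, None)
    for _ in range(max_rounds):
        feedback = run_program(code)
        if feedback is None:  # nothing left to fix in this toy setting
            break
        code = generate(instruction, code, feedback)
    return code


def stub_model(instruction: str, previous_code: Optional[str],
               feedback: Optional[str]) -> str:
    # Stand-in for an LLM: the first draft has a typo, the revision fixes it.
    if previous_code is None:
        return "slide = {'title': titel}"
    return "slide = {'title': 'Key Insights'}"


final = refine(stub_model, "Make a slide titled 'Key Insights'")
print(final)
```

Capping the loop with `max_rounds` keeps the cost bounded when the model cannot repair its own output.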

-----

📊 Results:

→ AutoPresent achieves an 84.1% execution success rate

→ Matches GPT-4's performance with an overall score of 55.0

→ SlidesLib reduces average program length from 170 to 13 lines
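
For readers who want the arithmetic behind these figures, the sketch below shows one plausible way to compute an execution success rate (the percentage of generated programs that run without raising an error) and the relative reduction in program length. The function name and the four toy programs are assumptions for illustration; the paper's exact evaluation protocol may differ.

```python
def execution_success_rate(programs: list[str]) -> float:
    """Percentage of programs that execute without raising an exception."""
    if not programs:
        raise ValueError("need at least one program")
    succeeded = 0
    for code in programs:
        try:
            exec(code, {})
            succeeded += 1
        except Exception:
            pass  # a failed execution simply does not count as a success
    return 100.0 * succeeded / len(programs)


# Toy batch: two programs run cleanly, two raise (NameError, ZeroDivisionError).
programs = [
    "x = 1 + 1",
    "y = undefined_name",
    "z = [i * i for i in range(3)]",
    "w = 1 / 0",
]
rate = execution_success_rate(programs)
print(f"execution success: {rate:.1f}%")

# Relative reduction implied by going from 170 lines to 13 lines on average.
reduction = 100.0 * (170 - 13) / 170
print(f"program length reduction: {reduction:.1f}%")
```

Going from 170 to 13 lines works out to a reduction of roughly 92%.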
