"Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step"

The podcast below was generated with Google's Illuminate.

Chain-of-thought reasoning enhances autoregressive image generation with specialized reward models.

This paper explores test-time verification, preference alignment, and reward model design.

-----

Paper - https://arxiv.org/abs/2501.13926

Original Problem 🤔:

→ Current autoregressive image generation models struggle with inconsistent decoding paths and blurry intermediate images.

→ Existing reward models (Outcome Reward Model (ORM) and Process Reward Model (PRM)) are not well-suited for evaluating autoregressive image generation.
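
To make that distinction concrete, here is a minimal sketch of how the two existing reward-model types score a generation. The `reward_model` callable and both function names are hypothetical placeholders for illustration, not the paper's API.

```python
def orm_score(reward_model, prompt, final_image):
    # Outcome Reward Model (ORM): a single judgment on the finished image only.
    return reward_model(prompt, final_image)

def prm_score(reward_model, prompt, step_images):
    # Process Reward Model (PRM): judge every intermediate decoding step,
    # then aggregate. In autoregressive image generation the early steps
    # are blurry, which makes these per-step judgments unreliable.
    return sum(reward_model(prompt, img) for img in step_images) / len(step_images)
```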

-----

Solution in this Paper 💡:

→ The paper investigates applying CoT reasoning strategies to autoregressive image generation. This includes scaling test-time computation with reward models and aligning model preferences using Direct Preference Optimization (DPO).
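
For the preference-alignment side, below is a minimal PyTorch sketch of the standard DPO objective applied to image-token log-likelihoods. Tensor names and the `beta` value are illustrative assumptions, not the paper's exact training setup.

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO objective on log-likelihoods of preferred vs. rejected
    generations (a sketch; shapes and beta are illustrative)."""
    chosen_ratio = logp_chosen - ref_logp_chosen      # log pi_theta / pi_ref (winner)
    rejected_ratio = logp_rejected - ref_logp_rejected  # log pi_theta / pi_ref (loser)
    # Push the policy to prefer the chosen generation over the rejected one.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```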

→ A novel Potential Assessment Reward Model (PARM) is introduced. PARM assesses each generation step by judging clarity, evaluating potential, and selecting the best among high-potential paths.
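
A rough sketch of how such a step-wise selector could work, following the three judgments described above. The `PARMScorer` interface and its method names are assumptions for illustration, not the paper's implementation.

```python
from typing import List, Protocol, Sequence

class PARMScorer(Protocol):
    """Hypothetical interface for a Potential Assessment Reward Model."""
    def is_clear(self, prompt: str, path: object) -> bool: ...
    def has_potential(self, prompt: str, path: object) -> bool: ...
    def score(self, prompt: str, path: object) -> float: ...

def parm_step(prompt: str, paths: Sequence[object], parm: PARMScorer) -> List[object]:
    # 1) Clarity judgment: drop steps whose intermediate image is still
    #    too blurry for a reliable judgment.
    clear = [p for p in paths if parm.is_clear(prompt, p)]
    # 2) Potential assessment: keep only paths likely to decode into a
    #    high-quality final image.
    promising = [p for p in clear if parm.has_potential(prompt, p)]
    if not promising:
        # Nothing is judged reliable yet; keep decoding all paths.
        return list(paths)
    # 3) Best-of-N selection among the surviving high-potential paths.
    return [max(promising, key=lambda p: parm.score(prompt, p))]
```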

→ An enhanced version, PARM++, adds a reflection mechanism for self-correction of generated images.
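
A hedged sketch of what a reflection loop of this kind might look like; `generate` and `parm_pp.evaluate` are hypothetical stand-ins for the image generator and the reward model's reflection judgment.

```python
def generate_with_reflection(prompt, generate, parm_pp, max_rounds=3):
    """Reflection-style self-correction in the spirit of PARM++ (a sketch)."""
    image = generate(prompt)
    for _ in range(max_rounds):
        # The reward model checks text-image alignment and, on failure,
        # describes what is wrong with the current image.
        aligned, feedback = parm_pp.evaluate(prompt, image)
        if aligned:
            break
        # Regenerate, conditioning on the critique to correct the mismatch.
        image = generate(prompt, feedback=feedback)
    return image
```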

-----

Key Insights from this Paper 🤯:

→ CoT reasoning strategies can be adapted for autoregressive image generation.

→ Test-time verification and preference alignment can significantly improve generation quality.

→ Specialized reward models like PARM and PARM++ are crucial for effective CoT reasoning in image generation.

-----

Results 💯:

→ Show-o, enhanced with the investigated strategies, improves by +24% on GenEval.

→ This surpasses Stable Diffusion 3 by +15%.

→ PARM++ achieves a +4% improvement over PARM on the baseline model.
