Chain-of-thought reasoning enhances autoregressive image generation with specialized reward models.
This paper explores test-time verification, preference alignment, and reward model design.
-----
Paper - https://arxiv.org/abs/2501.13926
Original Problem 🤔:
→ Autoregressive image generation decodes an image token by token, but intermediate results are often blurry and different decoding paths vary widely in quality, which makes step-level evaluation difficult.
→ Existing reward models are not well-suited to this setting: an Outcome Reward Model (ORM) only judges the final image, while a Process Reward Model (PRM) struggles to score early, still-blurry intermediate steps.
-----
Solution in this Paper💡:
→ The paper investigates applying CoT reasoning strategies to autoregressive image generation, scaling test-time computation with reward-model-guided verification and aligning model preferences through Direct Preference Optimization (DPO); a minimal DPO loss sketch follows this list.
→ A novel Potential Assessment Reward Model (PARM) is introduced. PARM assesses each generation step by judging clarity, evaluating the step's potential to yield a good final image, and selecting the best result among high-potential paths (sketched after this list).
→ An enhanced version, PARM++, adds a reflection mechanism that lets the model self-correct unsatisfactory generated images.
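To make the PARM procedure concrete, here is a minimal, framework-agnostic sketch of reward-guided decoding with the three PARM judgments. The `model` and `parm` objects and their methods (`generate_step`, `is_clear`, `potential`, `final_score`) are hypothetical stand-ins for illustration, not an API released with the paper.

```python
# Hedged sketch of PARM-style test-time selection for an autoregressive
# image generator. All object/method names below are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Path:
    tokens: list = field(default_factory=list)  # image tokens generated so far
    alive: bool = True                           # still considered high-potential

def parm_select(prompt, model, parm, n_paths=8, n_steps=16, keep_top=4):
    """Best-of-N decoding guided by a Potential Assessment Reward Model.

    Step 1 (clarity judgment): skip scoring steps whose intermediate image
        is still too blurry to evaluate.
    Step 2 (potential assessment): prune decoding paths the reward model
        judges unlikely to yield a good final image.
    Step 3 (best-of-N selection): pick the best final image among the
        surviving high-potential paths.
    """
    paths = [Path() for _ in range(n_paths)]

    for _ in range(n_steps):
        for p in paths:
            if not p.alive:
                continue
            p.tokens = model.generate_step(prompt, p.tokens)  # extend this path

            # Step 1: only evaluate steps that are clear enough to judge.
            if not parm.is_clear(prompt, p.tokens):
                continue

            # Step 2: drop paths with low potential to finish well.
            if parm.potential(prompt, p.tokens) < parm.threshold:
                p.alive = False

        # Keep at most `keep_top` live paths to bound test-time compute.
        live = sorted((p for p in paths if p.alive),
                      key=lambda p: parm.potential(prompt, p.tokens),
                      reverse=True)
        for p in live[keep_top:]:
            p.alive = False

    # Step 3: best-of-N over the surviving paths (fall back to all paths).
    finalists = [p for p in paths if p.alive] or paths
    return max(finalists, key=lambda p: parm.final_score(prompt, p.tokens))
```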
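The preference-alignment side uses the standard DPO objective. Below is a hedged PyTorch sketch of that loss applied to whole image-token sequences, assuming the chosen/rejected image pairs come from reward-model rankings; the function name and dummy batch are illustrative, not the authors' code.

```python
# Standard DPO loss on sequence-level log-probabilities (sketch).
import torch
import torch.nn.functional as F

def dpo_loss(policy_logps_chosen, policy_logps_rejected,
             ref_logps_chosen, ref_logps_rejected, beta=0.1):
    """Each argument is a tensor of per-image summed token log-probabilities
    under the trainable policy or the frozen reference model."""
    policy_margin = policy_logps_chosen - policy_logps_rejected
    ref_margin = ref_logps_chosen - ref_logps_rejected
    # Push the policy's chosen-vs-rejected margin above the reference's.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

if __name__ == "__main__":
    # Dummy log-probabilities for a batch of 4 preference pairs.
    b = 4
    loss = dpo_loss(torch.randn(b), torch.randn(b),
                    torch.randn(b), torch.randn(b))
    print(loss.item())
```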
-----
Key Insights from this Paper 🤯:
→ CoT reasoning strategies can be adapted for autoregressive image generation.
→ Test-time verification and preference alignment can significantly improve generation quality.
→ Specialized reward models like PARM and PARM++ are crucial for effective CoT reasoning in image generation.
-----
Results 💯:
→ Show-o, enhanced with the investigated strategies, improves by +24% on GenEval.
→ This surpasses Stable Diffusion 3 by +15%.
→ PARM++ yields a further +4% improvement over PARM when applied to the baseline model.