Sparse autoencoders (SAEs) crack open SDXL Turbo's black box, revealing how individual transformer blocks control image generation
Discover how SDXL Turbo's neural blocks collaborate to turn text into stunning images
📚 https://arxiv.org/abs/2410.22366
🤖 Original Problem:
Text-to-image models like SDXL Turbo are black boxes: we don't understand how they work internally. Sparse autoencoders (SAEs) have helped interpret LLMs by decomposing their internal representations into human-interpretable features, but no comparable analysis tools exist for image generation models.
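For readers new to SAEs, here is a minimal sketch of the idea the paper ports from LLM interpretability: an overcomplete ReLU encoder, a linear decoder, and an L1 penalty that keeps most features silent. The dimensions and `l1_coeff` are illustrative, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Overcomplete dictionary: d_hidden >> d_model (e.g. 4-16x wider).
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))   # sparse, non-negative feature activations
        x_hat = self.decoder(f)           # reconstruction of the input activation
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that encourages few active features.
    return ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().mean()
```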
-----
🔍 Solution in this Paper:
→ Applied SAEs to 4 key transformer blocks in SDXL Turbo's denoising U-Net
→ Created SDLens, a library for capturing and manipulating intermediate feature maps during image generation (a hook-based sketch of the capture step follows this list)
→ Trained SAEs on feature maps collected from 1.5M LAION-COCO prompts, decomposing them into interpretable components
→ Developed visualization techniques: spatial activation heatmaps, feature modulation, and empty-context activation
→ Built an automated feature annotation pipeline using GPT-4V
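SDLens's own API is not shown in this post, but the capture step it performs can be approximated with plain PyTorch forward hooks on the diffusers SDXL Turbo pipeline. The module path for "down.2.1" below is an assumption about how the block names map onto the diffusers U-Net; adjust it to the block you want to record.

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

captured = []

def save_output(module, args, output):
    # Transformer blocks may return a tuple; keep the hidden-state tensor.
    out = output[0] if isinstance(output, tuple) else output
    captured.append(out.detach().cpu())

# Assumption: "down.2.1" corresponds to the second transformer of the third
# down block; the paper's naming may map onto the modules differently.
handle = pipe.unet.down_blocks[2].attentions[1].register_forward_hook(save_output)

image = pipe(
    prompt="a cinematic photo of a fox in a forest",
    num_inference_steps=1,   # SDXL Turbo generates in a single step
    guidance_scale=0.0,
).images[0]
handle.remove()
```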
-----
💡 Key Insights:
→ Different transformer blocks have specialized roles:
- down.2.1: Controls overall image composition
- up.0.0: Adds fine-grained local details
- up.0.1: Manages color, illumination and style
- mid.0: Handles spatial relationships
→ The learned features are highly interpretable and causally influence generation (see the intervention sketch after this list)
→ Features show high specificity (0.71 for down.2.1) versus a random baseline of 0.50
→ up.0.1 features have the strongest texture effects (0.20 vs the 0.18 baseline)
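The causal claim can be made concrete with a hedged sketch of feature modulation: encode a captured feature map with the trained SAE, rescale a single feature, decode, and write the result back into the U-Net. The names `sae`, `feature_idx`, and `scale` are illustrative; the paper's exact modulation procedure may differ in detail.

```python
import torch

def modulate_feature(feature_map: torch.Tensor, sae, feature_idx: int, scale: float):
    # feature_map: (batch, channels, h, w) activations captured at one block.
    b, c, h, w = feature_map.shape
    # Treat each spatial position as one activation vector for the SAE.
    x = feature_map.permute(0, 2, 3, 1).reshape(-1, c)
    f = torch.relu(sae.encoder(x))
    f[:, feature_idx] *= scale        # amplify (>1) or suppress (<1) one feature
    x_hat = sae.decoder(f)
    return x_hat.reshape(b, h, w, c).permute(0, 3, 1, 2)
```

Under this scheme, amplifying an up.0.1 feature annotated with, say, "warm lighting" should shift color and illumination while leaving the composition, which is set earlier at down.2.1, intact.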
-----
📊 Results:
→ Feature specificity scores significantly higher than random baseline across all blocks
→ Causality analysis shows strong feature effects, with interventions reaching 0.19 CLIP similarity against a 0.21 ground-truth reference (the similarity computation is sketched below)
→ Local intervention tests confirm specialized roles of different blocks
→ Color sensitivity analysis validates style/color specialization of up.0.1 block
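The CLIP-similarity numbers above compare generated images against feature annotations. The core computation looks roughly like this; the paper's full scoring protocol is not reproduced here, and the checkpoint choice is an assumption.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(image, annotation: str) -> float:
    # Embed the image and the feature's text annotation in CLIP space.
    inputs = processor(text=[annotation], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img @ txt.T).item())   # cosine similarity in [-1, 1]
```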