
"Controlling Language and Diffusion Models by Transporting Activations"

The podcast on this paper is generated with Google's Illuminate.

Activation Transport (ACT) steers neural activations using optimal transport theory for precise model behavior control.

Like traffic control for AI thoughts - making models behave exactly how we want

📚 https://arxiv.org/abs/2410.23054

🎯 Original Problem:

Controlling the behavior of LLMs and diffusion models at inference time is difficult. Existing approaches such as fine-tuning or RLHF require significant compute and can degrade the model's performance on other tasks.

-----

🔧 Solution in this Paper:

→ Introduces Activation Transport (ACT) - a framework that steers model activations using optimal transport theory

→ Uses Linear-ACT: an inference-time linear (affine) intervention that preserves the shape of the internal activation distributions (see the sketch after this list)

→ Implements a transport strength parameter λ ∈ [0, 1] for precise control over the intervention level

→ Applies sequential iterative maps to handle causal relationships across layers

→ Uses transport support to prevent out-of-distribution activations
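
A minimal sketch of what such a linear transport intervention could look like in PyTorch. This is an illustrative assumption, not the paper's exact implementation: per-unit Gaussian statistics, the hook placement, and all variable names here are hypothetical.

```python
import torch

def estimate_linear_transport(src_acts: torch.Tensor, tgt_acts: torch.Tensor, eps: float = 1e-6):
    """Estimate a per-unit affine map T(a) = scale * a + shift that transports the
    (assumed unimodal, roughly Gaussian) source activation distribution onto the
    target distribution. src_acts / tgt_acts: [num_samples, hidden_dim]."""
    mu_s, sigma_s = src_acts.mean(dim=0), src_acts.std(dim=0)
    mu_t, sigma_t = tgt_acts.mean(dim=0), tgt_acts.std(dim=0)
    scale = sigma_t / (sigma_s + eps)
    shift = mu_t - scale * mu_s
    return scale, shift

def make_transport_hook(scale: torch.Tensor, shift: torch.Tensor, lam: float = 1.0):
    """Forward hook that blends the transported activation with the original one
    via the transport strength lam in [0, 1] (lam=0: no intervention, lam=1: full transport)."""
    def hook(module, inputs, output):
        transported = scale * output + shift
        return (1.0 - lam) * output + lam * transported
    return hook

# Usage sketch (layer choice, data collection, and module path are hypothetical):
# scale, shift = estimate_linear_transport(acts_undesired, acts_desired)
# handle = model.transformer.h[10].mlp.register_forward_hook(
#     make_transport_hook(scale, shift, lam=0.8))
```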

-----

💡 Key Insights:

→ Most existing activation steering methods are special cases of the mean transport map

→ Activation distributions in current models are largely unimodal, but their standard deviations differ across conditions

→ Causal estimation achieves more effective conditioning than simultaneous estimation

→ Linear-ACT can be folded into adjacent linear layers, resulting in zero computational overhead at inference (see the sketch after this list)
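
Two of these insights lend themselves to a short sketch. Because the intervention is affine, it can in principle be folded into the weight and bias of the linear layer that produces the intervened activation, so inference cost is unchanged; and setting the scale to 1 recovers a pure mean shift, i.e. the familiar "add a steering vector" trick. The snippet below illustrates the folding idea under those assumptions; the function and tensor names are hypothetical, not the paper's code.

```python
import torch
import torch.nn as nn

def fuse_transport_into_linear(linear: nn.Linear, scale: torch.Tensor,
                               shift: torch.Tensor, lam: float = 1.0) -> nn.Linear:
    """Fold the affine transport a -> (1 - lam) * a + lam * (scale * a + shift),
    applied to the OUTPUT of `linear`, into the layer's weight and bias.
    The returned layer computes the intervened output directly, so the
    intervention adds no extra cost at inference time."""
    eff_scale = (1.0 - lam) + lam * scale      # per-output-unit multiplier
    eff_shift = lam * shift                    # per-output-unit offset
    fused = nn.Linear(linear.in_features, linear.out_features,
                      bias=True, dtype=linear.weight.dtype)
    with torch.no_grad():
        fused.weight.copy_(eff_scale.unsqueeze(1) * linear.weight)
        old_bias = (linear.bias.detach() if linear.bias is not None
                    else torch.zeros(linear.out_features, dtype=linear.weight.dtype))
        fused.bias.copy_(eff_scale * old_bias + eff_shift)
    return fused

# Special case: scale = 1 gives eff_scale = 1, leaving only a bias offset --
# the mean-shift transport map underlying common steering-vector methods.
```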

-----

📊 Results:

→ Reduces toxicity by up to 7.5× in Gemma2-2B and 4.3× in Llama3-8B

→ Achieves fine-grained style control in text-to-image generation

→ Maintains model performance with minimal impact on perplexity and MMLU scores

→ Works effectively across both LLMs and diffusion models - the first method shown to do so
