Activation Transport (ACT) steers model activations using optimal transport theory, enabling precise control over model behavior at inference time.
Like traffic control for AI thoughts: nudging models toward the behavior we want without retraining
📚 https://arxiv.org/abs/2410.23054
🎯 Original Problem:
Controlling the behavior of LLMs and diffusion models is challenging and computationally expensive. Current methods like fine-tuning or RLHF require significant resources and can degrade model performance on unrelated tasks.
-----
🔧 Solution in this Paper:
→ Introduces Activation Transport (ACT), a framework that steers model activations using maps derived from optimal transport theory
→ Uses Linear-ACT, an inference-time intervention that preserves the shape of internal activation distributions
→ Exposes a transport strength parameter λ ∈ [0, 1] for fine-grained control over the intervention level
→ Estimates maps sequentially, layer by layer, to respect causal dependencies across layers
→ Restricts outputs to the observed transport support to prevent out-of-distribution activations (a minimal sketch follows this list)
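Below is a minimal sketch of what a Linear-ACT-style intervention could look like in PyTorch. All names here are hypothetical, and the map is simplified to a per-dimension affine transport that matches means and standard deviations between source and target activations; the paper's actual estimator may differ.

```python
import torch

def fit_affine_transport(src_acts, tgt_acts, eps=1e-8):
    """Fit a per-dimension affine map a*x + b that transports the
    source activation distribution onto the target one by matching
    means and standard deviations (a diagonal simplification of the
    paper's linear transport maps).

    src_acts, tgt_acts: (num_samples, hidden_dim) activations
    collected at the same layer under source / target prompts.
    """
    a = tgt_acts.std(0) / (src_acts.std(0) + eps)
    b = tgt_acts.mean(0) - a * src_acts.mean(0)
    # Transport support: record the observed target range so outputs
    # can be clamped, avoiding out-of-distribution activations.
    lo, hi = tgt_acts.min(0).values, tgt_acts.max(0).values
    return a, b, lo, hi

def act_hook(a, b, lo, hi, lam=1.0):
    """Forward hook interpolating between identity (lam=0) and the
    full transport map (lam=1), then clamping to the support."""
    def hook(module, inputs, output):
        transported = a * output + b
        mixed = (1.0 - lam) * output + lam * transported
        return torch.clamp(mixed, lo, hi)
    return hook
```

In practice each map would be attached with `module.register_forward_hook(act_hook(...))`, fitting layers one at a time so that later layers see already-transported activations, matching the causal, sequential estimation described above.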
-----
💡 Key Insights:
→ Most existing activation steering methods are special cases of the mean transport map
→ Activation distributions in current models tend to be unimodal, but with standard deviations that differ between source and target behaviors
→ Causal (sequential) estimation achieves more effective conditioning than estimating all layers simultaneously
→ Linear-ACT can be composed with linear layers, resulting in zero computational overhead at inference (see the weight-folding sketch after this list)
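The zero-overhead claim works because an affine intervention placed after a linear layer can be folded into that layer's weights ahead of time. Here is a hedged sketch of the algebra, reusing the hypothetical per-dimension `a`, `b` from the snippet above; note that the support clamp is nonlinear and would still need a hook, an assumption on my part about what can and cannot be folded.

```python
import torch

def fold_act_into_linear(linear: torch.nn.Linear, a, b, lam=1.0):
    """Compose the strength-lam transport with a preceding nn.Linear
    so the intervention adds no cost at inference time.

    For y = W x + c, interpolating with identity gives
        (1 - lam) * y + lam * (a * y + b) = a_eff * y + b_eff,
    with a_eff = 1 + lam * (a - 1) and b_eff = lam * b, which folds
    into W' = diag(a_eff) W and c' = a_eff * c + b_eff.
    """
    a_eff = 1.0 + lam * (a - 1.0)
    b_eff = lam * b
    with torch.no_grad():
        linear.weight.mul_(a_eff.unsqueeze(1))  # scale each output row
        if linear.bias is not None:
            linear.bias.mul_(a_eff).add_(b_eff)
        else:
            linear.bias = torch.nn.Parameter(b_eff.clone())
    return linear
```

A nice side effect of this formulation: since `a_eff` and `b_eff` absorb λ, sweeping the transport strength only requires re-folding the weights, not re-estimating the map.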
-----
📊 Results:
→ Reduces toxicity by up to 7.5× in Gemma2-2B and up to 4.3× in Llama3-8B
→ Achieves fine-grained style control in text-to-image generation
→ Maintains model performance with minimal impact on perplexity and MMLU scores
→ Works effectively across both LLMs and diffusion models, the first method shown to do so