Switch Sparse Autoencoders (Switch SAEs) offer compute-efficient scaling of sparse autoencoders for LLM feature extraction.
📚 https://arxiv.org/abs/2410.08201
Original Problem 🔍:
Scaling sparse autoencoders (SAEs), which decompose neural network activations into interpretable features, is computationally expensive, limiting their use on frontier language models.
-----
Solution in this Paper 🛠️:
• Introduces Switch Sparse Autoencoders (Switch SAEs)
• Combines a Switch-style routing layer with the TopK SAE
• Splits the SAE into many smaller expert SAEs plus a trainable routing network
• Routes each input activation to its single most probable expert
• Avoids the dense encoder matrix multiplication over all features; only the selected expert's encoder runs
• Trains the router and the expert SAEs simultaneously
• Balances reconstruction quality against even expert utilization via an auxiliary load-balancing loss (see the sketch after this list)
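A minimal sketch of the forward pass, to make the routing concrete. This is illustrative PyTorch code under assumed sizes and a Switch-Transformer-style load-balancing term, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchSAE(nn.Module):
    # Hypothetical sizes; the paper's exact widths and expert counts differ.
    def __init__(self, d_model=768, n_experts=8, features_per_expert=2048, k=32):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # routing network
        # Each expert is a small TopK SAE with its own encoder/decoder weights.
        self.W_enc = nn.Parameter(torch.randn(n_experts, d_model, features_per_expert) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(n_experts, features_per_expert, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x):  # x: (batch, d_model)
        probs = F.softmax(self.router(x - self.b_dec), dim=-1)
        expert = probs.argmax(dim=-1)        # route to the most probable expert
        recon = torch.zeros_like(x)
        for e in range(self.W_enc.shape[0]):  # only the chosen expert's encoder runs
            mask = expert == e
            if not mask.any():
                continue
            pre = (x[mask] - self.b_dec) @ self.W_enc[e]
            topk = torch.topk(pre, self.k, dim=-1)  # TopK activation
            acts = torch.zeros_like(pre).scatter_(-1, topk.indices, F.relu(topk.values))
            # Weight by the router probability so the router receives gradient.
            recon[mask] = probs[mask][:, e:e+1] * (acts @ self.W_dec[e]) + self.b_dec
        # Auxiliary load-balancing loss (Switch-Transformer style) to keep experts evenly used.
        frac_tokens = F.one_hot(expert, probs.shape[-1]).float().mean(0)
        frac_probs = probs.mean(0)
        aux_loss = probs.shape[-1] * (frac_tokens * frac_probs).sum()
        return recon, aux_loss
```

Training would then combine the reconstruction objective, e.g. `F.mse_loss(recon, x)`, with a small multiple of `aux_loss`.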
-----
Key Insights from this Paper 💡:
• Switch SAEs improve the FLOPs-vs-training-loss frontier over existing SAE methods
• Feature duplication across experts reduces SAE capacity
• Encoder features cluster by expert, while decoder features are more diffuse
• Switch SAEs are biased toward duplicating frequently activating features across experts
-----
Results 📊:
• Switch SAEs achieve better reconstruction than dense SAEs at a fixed compute budget
• FLOP-matched Switch SAEs Pareto-dominate TopK, Gated, and ReLU SAEs
• Width-matched Switch SAEs perform slightly worse than TopK SAEs but outperform ReLU SAEs
• Switch SAEs reduce FLOPs per activation by up to 128x while maintaining ReLU SAE performance (rough arithmetic after this list)
• Feature interpretability remains similar to TopK SAEs
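Where the up-to-128x figure comes from: with E experts, each activation touches only one expert of width d_sae/E, so encoder FLOPs fall roughly by a factor of E once the small router cost is added back. A back-of-envelope sketch with illustrative (assumed) sizes:

```python
# Encoder FLOP comparison; sizes are illustrative, not the paper's exact configuration.
d_model, d_sae, n_experts = 768, 2**17, 128

dense_encoder_flops = 2 * d_model * d_sae                    # full dense encoder matmul
switch_encoder_flops = (2 * d_model * (d_sae // n_experts)   # one expert's encoder
                        + 2 * d_model * n_experts)           # router logits
print(dense_encoder_flops / switch_encoder_flops)            # ~114x with these numbers
```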