
Efficient Dictionary Learning with Switch Sparse Autoencoders

This podcast was generated with Google's Illuminate.

Switch Sparse Autoencoders (Switch SAEs) offer compute-efficient scaling of sparse autoencoders for LLM feature extraction.

📚 https://arxiv.org/abs/2410.08201

Original Problem 🔍:

Scaling sparse autoencoders (SAEs) for decomposing neural network activations into interpretable features is computationally expensive, limiting their application to frontier language models.

-----

Solution in this Paper 🛠️:

• Introduces Switch Sparse Autoencoders (Switch SAEs)

• Combines Switch layer with TopK SAE

• Uses multiple expert SAEs and a routing network

• Routes input activations to the most probable expert

• Avoids dense matrix multiplication in encoder

• Trains router and expert SAEs simultaneously

• Balances reconstruction quality against even expert utilization (a minimal sketch follows this list)
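
A minimal sketch of what such a forward pass could look like, not the authors' implementation: it assumes top-1 routing, per-expert TopK encoders and decoders, a shared decoder bias, and scaling the chosen expert's output by its router probability so the router receives gradients (as in Switch Transformers). The class name `SwitchSAE` and all hyperparameters are illustrative.

```python
# Hypothetical Switch SAE forward pass: route each activation to one expert,
# run only that expert's TopK encoder/decoder, weight the output by the
# router probability. A sketch, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchSAE(nn.Module):
    def __init__(self, d_model: int, n_experts: int, features_per_expert: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # routing network
        self.enc = nn.Parameter(torch.randn(n_experts, d_model, features_per_expert) * 0.01)
        self.dec = nn.Parameter(torch.randn(n_experts, features_per_expert, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))  # shared decoder bias (assumption)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) residual-stream activations
        probs = F.softmax(self.router(x - self.b_dec), dim=-1)  # (batch, n_experts)
        top_prob, expert_idx = probs.max(dim=-1)                # route to most probable expert

        recon = torch.zeros_like(x)
        for e in range(self.enc.shape[0]):
            mask = expert_idx == e
            if not mask.any():
                continue
            xe = x[mask] - self.b_dec
            pre = xe @ self.enc[e]                              # only this expert's features
            # TopK: keep the k largest pre-activations, zero the rest
            vals, idx = pre.topk(self.k, dim=-1)
            acts = torch.zeros_like(pre).scatter_(-1, idx, F.relu(vals))
            recon[mask] = top_prob[mask].unsqueeze(-1) * (acts @ self.dec[e]) + self.b_dec
        return recon


# Illustrative usage on random activations
sae = SwitchSAE(d_model=768, n_experts=64, features_per_expert=1024, k=32)
x_hat = sae(torch.randn(8, 768))  # (8, 768) reconstruction
```

Because only one expert's encoder matrix multiplication runs per activation, encoder FLOPs drop roughly in proportion to the number of experts.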

-----

Key Insights from this Paper 💡:

• Switch SAEs improve the FLOPs-versus-training-loss frontier over existing SAE architectures

• Feature duplication across experts reduces SAE capacity

• Encoder features cluster by expert, while decoder features are more diffuse

• Switch SAEs are biased toward duplicating frequently activating features across experts

-----

Results 📊:

• Switch SAEs achieve better reconstruction than dense SAEs at fixed compute budget

• FLOP-matched Switch SAEs Pareto-dominate TopK, Gated, and ReLU SAEs

• Width-matched Switch SAEs perform slightly worse than TopK SAEs but outperform ReLU SAEs

• Switch SAEs reduce FLOPs per activation by up to 128x while maintaining ReLU SAE performance (see the FLOP note after this list)

• Feature interpretability remains similar to TopK SAEs
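
A rough FLOP accounting, assuming the savings come from running a single expert's encoder per activation; the dictionary width and the mapping of the 128x figure to E = 128 experts are illustrative assumptions, not numbers taken from the paper.

```python
# Illustrative encoder FLOP count for one activation: a dense SAE computes all
# F_total pre-activations, a Switch SAE only F_total / E of them.
# (Router cost, ~2 * d_model * E FLOPs, is negligible and omitted.)
d_model, F_total = 768, 3_145_728          # hypothetical model dim and dictionary width
dense_flops = 2 * d_model * F_total        # full encoder matmul per activation

for E in (16, 64, 128):
    switch_flops = 2 * d_model * (F_total // E)  # only one expert's encoder runs
    print(f"E={E:4d}: ~{dense_flops / switch_flops:.0f}x fewer encoder FLOPs")
```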
