Breaking down AI's black box: MONET creates specialized experts for every concept
MONET is a novel architecture that tackles polysemanticity in LLMs by scaling each layer to 262,144 specialized experts, enabling precise knowledge control while keeping performance competitive through parameter-efficient scaling.
-----
https://arxiv.org/abs/2412.04139
🤔 Original Problem:
→ LLMs suffer from polysemanticity, where individual neurons respond to multiple unrelated concepts, making them hard to interpret and control
→ Current fixes such as Sparse Autoencoders work post hoc, compromise performance, and scale poorly
-----
🔧 Solution in this Paper:
→ MONET introduces an expert decomposition that scales to 262,144 experts per layer while keeping the parameter count proportional to the square root of the number of experts
→ It integrates sparse dictionary learning directly into Mixture-of-Experts pretraining rather than relying on post-hoc reconstruction
→ Uses horizontal and vertical expert decomposition to stay within memory constraints
→ Implements adaptive routing with batch normalization for efficient expert selection (a rough sketch of the decomposition and routing follows below)
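The square-root parameter growth can be illustrated with a product-style composition: two banks of √N small sub-projections whose pairings act as N virtual experts. The sketch below is a simplified illustration under that assumption; the class name ProductExpertLayer, the chosen dimensions, and the dense double loop are illustrative, not the paper's implementation.

```python
# Illustrative sketch of product-style expert decomposition (not the paper's code).
# With n_sub = 512 down-projections and 512 up-projections, every (i, j) pair acts
# as one of 512 * 512 = 262,144 virtual experts, so parameters grow with sqrt(N).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProductExpertLayer(nn.Module):
    def __init__(self, d_model=512, d_expert=16, n_sub=512, top_k=8):
        super().__init__()
        self.n_sub = n_sub            # sqrt of the total virtual-expert count
        self.top_k = top_k
        # Sub-expert parameters: one bank of down-projections, one of up-projections.
        self.down = nn.Parameter(torch.randn(n_sub, d_model, d_expert) * 0.02)
        self.up = nn.Parameter(torch.randn(n_sub, d_expert, d_model) * 0.02)
        # One router per decomposition axis; batch norm keeps routing logits
        # well-scaled (the post mentions batch-normalized adaptive routing).
        self.router_h = nn.Linear(d_model, n_sub)
        self.router_v = nn.Linear(d_model, n_sub)
        self.bn_h = nn.BatchNorm1d(n_sub)
        self.bn_v = nn.BatchNorm1d(n_sub)

    def forward(self, x):                                      # x: (batch, d_model)
        gh = F.softmax(self.bn_h(self.router_h(x)), dim=-1)    # (batch, n_sub)
        gv = F.softmax(self.bn_v(self.router_v(x)), dim=-1)    # (batch, n_sub)
        # Sparse routing: keep only the top-k sub-experts on each axis.
        th, ih = gh.topk(self.top_k, dim=-1)
        tv, iv = gv.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):
            for wi, i in zip(th[b], ih[b]):
                hidden = x[b] @ self.down[i]                   # (d_expert,)
                for wj, j in zip(tv[b], iv[b]):
                    # Virtual expert (i, j) = down-projection i composed with
                    # up-projection j, weighted by the product of routing scores.
                    out[b] += (wi * wj) * F.relu(hidden) @ self.up[j]
        return out


layer = ProductExpertLayer()
y = layer(torch.randn(4, 512))   # 4 tokens routed through 262,144 virtual experts
```

The point of the composition is that only 2 * 512 sub-expert matrices are stored, even though 262,144 distinct expert pairings are reachable at routing time.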
-----
💡 Key Insights:
→ Experts demonstrate monosemantic specialization across different contexts and languages
→ Knowledge can be precisely manipulated by controlling specific experts (see the sketch after this list)
→ Parameter growth stays manageable even with large expert counts
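The knowledge-control claim follows from the routing structure: once the virtual experts that encode a concept are identified, their routing weights can be zeroed at inference time. Below is a minimal sketch of that idea, assuming the product routing shown earlier; mask_experts and blocked_pairs are hypothetical names, not the paper's API.

```python
# Illustrative expert-level knowledge control: zero the routing weights of
# specific virtual experts so they never fire; all other experts are untouched.
import torch


def mask_experts(gate_h, gate_v, blocked_pairs):
    """gate_h, gate_v: (batch, n_sub) routing scores from the two decomposition axes.
    blocked_pairs: iterable of (i, j) virtual-expert indices to suppress."""
    n_sub = gate_h.size(-1)
    mask = torch.ones(n_sub, n_sub)
    for i, j in blocked_pairs:
        mask[i, j] = 0.0
    # Joint weight of virtual expert (i, j) is gate_h[:, i] * gate_v[:, j];
    # multiplying by the mask removes only the blocked experts' contributions.
    joint = gate_h.unsqueeze(-1) * gate_v.unsqueeze(-2)   # (batch, n_sub, n_sub)
    return joint * mask


gh = torch.softmax(torch.randn(4, 512), dim=-1)
gv = torch.softmax(torch.randn(4, 512), dim=-1)
safe = mask_experts(gh, gv, blocked_pairs=[(12, 90), (300, 7)])  # (4, 512, 512)
```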
-----
📊 Results:
→ Maintains competitive performance with dense LLMs across benchmarks
→ Reduces toxicity by up to 36.2% without degrading general performance
→ Achieves domain-specific knowledge control with minimal impact on unrelated domains