SMoSE splits complex control tasks across simple, interpretable experts that cooperate through smart routing.
SMoSE introduces a sparse mixture-of-experts architecture that combines simple, interpretable decision-makers with a linear router, yielding transparent yet high-performing policies for continuous control tasks.
"Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks"
https://arxiv.org/abs/2412.13053
🤖 Original Problem:
→ Current state-of-the-art continuous control systems use complex black-box policies that are effective but lack transparency
→ Existing interpretable policies underperform compared to black-box models, creating a gap between performance and interpretability
-----
🔧 Solution in this Paper:
→ SMoSE uses a top-1 Mixture-of-Experts architecture with M interpretable shallow experts trained for different basic skills
→ Each expert is a linear policy specialized in a specific basic skill
→ An interpretable linear router selects which expert acts in each state
→ Only one expert is active per decision, for maximum interpretability (see the sketch after this list)
→ Training uses Soft Actor-Critic with a load-balancing objective to ensure fair expert usage
→ Decision trees are distilled from router weights to improve interpretability
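A minimal sketch of the top-1 routing idea in PyTorch (illustrative only, not the paper's code; `SMoSEPolicy`, `load_balance_loss`, and the dimensions are assumptions):

```python
# Minimal sketch of a top-1 mixture of linear experts (illustrative,
# not the paper's implementation).
import torch
import torch.nn as nn

class SMoSEPolicy(nn.Module):  # hypothetical name
    def __init__(self, state_dim: int, action_dim: int, num_experts: int):
        super().__init__()
        # Each expert is a single linear map: action = W_i @ state + b_i,
        # so its weights can be read directly as per-feature effects.
        self.experts = nn.ModuleList(
            [nn.Linear(state_dim, action_dim) for _ in range(num_experts)]
        )
        # The router is linear too, keeping the routing rule transparent.
        self.router = nn.Linear(state_dim, num_experts)

    def forward(self, state: torch.Tensor):
        logits = self.router(state)              # (batch, num_experts)
        expert_idx = logits.argmax(dim=-1)       # top-1: one expert per decision
        all_actions = torch.stack([e(state) for e in self.experts], dim=1)
        action = all_actions[torch.arange(state.shape[0]), expert_idx]
        return action, expert_idx, logits

def load_balance_loss(logits: torch.Tensor) -> torch.Tensor:
    # Auxiliary term pushing routing probabilities toward uniform expert
    # usage, preventing collapse onto one expert (a common MoE recipe;
    # the paper's exact formulation may differ).
    mean_probs = torch.softmax(logits, dim=-1).mean(dim=0)
    uniform = torch.full_like(mean_probs, 1.0 / mean_probs.numel())
    return ((mean_probs - uniform) ** 2).sum()

policy = SMoSEPolicy(state_dim=17, action_dim=6, num_experts=4)
obs = torch.randn(32, 17)                    # e.g. HalfCheetah-sized states
action, which_expert, logits = policy(obs)   # which_expert exposes the decision path
aux = load_balance_loss(logits)              # added to the SAC actor loss
```

Because both the experts and the router are linear, each action traces back to one expert's weight matrix and one routing rule.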
-----
🎯 Key Insights:
→ Sparse activation with single expert selection provides clear decision paths
→ Linear policies for both experts and router maintain full interpretability
→ Load balancing prevents expert collapse and ensures balanced skill distribution
→ Decision tree distillation creates a human-readable representation of the routing logic (see the sketch below)
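A hedged sketch of the distillation step using scikit-learn: fit a shallow `DecisionTreeClassifier` to imitate the router's expert choices on states visited by the trained policy (the file names and tree depth are illustrative assumptions):

```python
# Sketch of distilling the routing logic into a decision tree
# (illustrative; the paper's exact procedure may differ).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical rollout data: states visited by the trained policy and
# the router's top-1 expert choice at each of them.
states = np.load("visited_states.npy")       # shape (N, state_dim)
expert_ids = np.load("router_choices.npy")   # shape (N,)

# A shallow tree keeps the distilled routing rule human-readable.
tree = DecisionTreeClassifier(max_depth=3).fit(states, expert_ids)
print(export_text(tree, feature_names=[f"s{i}" for i in range(states.shape[1])]))
```

The printed tree reads as nested if/else conditions on state features, so a human can see exactly when control switches between experts.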
-----
📊 Results:
→ Tested on 6 MuJoCo continuous control benchmarks
→ Outperforms existing interpretable baselines on 5 out of 6 environments
→ Narrows performance gap with non-interpretable state-of-the-art methods
→ Maintains full interpretability while achieving competitive results