Neural nets just got a personal tailor - STAR crafts custom architectures that fit your exact needs.
STAR introduces automated architecture synthesis using linear input-varying systems theory and evolutionary optimization to create efficient, high-performing models that outperform standard Transformers.
-----
https://arxiv.org/abs/2411.17800
🤔 Original Problem:
Optimizing model architectures remains challenging and expensive. Current automated and manual approaches are limited by simplistic search spaces and basic design patterns.
-----
🔧 Solution in this Paper:
→ STAR introduces a hierarchical search space grounded in linear input-varying (LIV) systems theory.
→ It creates a numerical genome encoding that represents model architectures at three levels: featurization, operator structure, and backbone composition.
→ The system uses evolutionary algorithms to automatically refine and recombine architectures, optimizing jointly for multiple metrics such as quality and efficiency (a minimal sketch of this loop follows the list).
→ STAR enables sharing of components between layers through featurizer and feature group sharing mechanisms.
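
The genome and the evolution loop are the core mechanics here, so a minimal sketch helps make them concrete. The featurizer and structure vocabularies, the sharing-group field, the mutation rate, and the evaluate() scoring below are illustrative assumptions, not STAR's actual encoding or objectives; in the paper, evaluation is the expensive step (training and scoring each candidate) and selection is multi-objective rather than a single scalar.

```python
import random

# Illustrative sketch of a hierarchical architecture genome plus an evolution
# loop; the vocabularies, rates, and scoring are assumptions, not STAR's own.
FEATURIZERS = ["attention", "gated_conv", "recurrence"]  # assumed token-mixer choices
STRUCTURES = ["dense", "low_rank", "diagonal"]           # assumed operator structures
NUM_LAYERS, NUM_SHARE_GROUPS = 8, 4


def random_genome():
    # One gene per backbone layer: a featurizer choice, an operator structure,
    # and a sharing group (layers in the same group reuse one featurizer).
    return [
        (random.randrange(len(FEATURIZERS)),
         random.randrange(len(STRUCTURES)),
         random.randrange(NUM_SHARE_GROUPS))
        for _ in range(NUM_LAYERS)
    ]


def mutate(genome, rate=0.15):
    limits = (len(FEATURIZERS), len(STRUCTURES), NUM_SHARE_GROUPS)
    return [
        tuple(random.randrange(limits[i]) if random.random() < rate else v
              for i, v in enumerate(gene))
        for gene in genome
    ]


def crossover(a, b):
    cut = random.randrange(1, NUM_LAYERS)  # one-point crossover across layers
    return a[:cut] + b[cut:]


def evaluate(genome):
    # Stand-in for the expensive step: decode the genome into a model, train it,
    # and measure quality and efficiency (params, cache size). Here we simply
    # reward structural diversity and sharing so the sketch runs end to end.
    diversity = len({g[0] for g in genome}) + len({g[1] for g in genome})
    sharing = NUM_LAYERS - len({g[2] for g in genome})
    return diversity + sharing


def evolve(pop_size=16, generations=20):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        # Truncation selection for brevity; STAR itself optimizes several
        # metrics jointly (e.g. quality, parameter count, cache size).
        parents = ranked[: pop_size // 2]
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=evaluate)


if __name__ == "__main__":
    best = evolve()
    print([(FEATURIZERS[f], STRUCTURES[s], grp) for f, s, grp in best])
```

The point the sketch preserves is that mutation and crossover act only on the compact genome, never on model weights, which keeps the search itself cheap even when evaluating each candidate is not.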
-----
💡 Key Insights:
→ Hierarchical search spaces are more effective than flat ones for architecture optimization
→ Linear input-varying systems can generalize most computational units in deep learning (see the sketch after this list)
→ Evolutionary optimization with well-designed constraints leads to stable training
→ Component sharing between layers improves efficiency
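
The second insight is easier to see with the unifying form written out. The notation below is a schematic rendering of the linear input-varying (LIV) view, chosen for this post rather than taken verbatim from the paper, and the special cases are simplified.

```latex
% A linear input-varying (LIV) operator: linear in the input x, but with a
% matrix T that itself depends on x (schematic notation).
y_i = \sum_j T_{ij}(x)\, x_j

% Schematic special cases:
%   Self-attention:   T_{ij}(x) = \mathrm{softmax}_j(q_i^\top k_j), with q, k computed from x
%   Convolution:      T_{ij}(x) = w_{i-j}   (input-independent, Toeplitz-structured)
%   Gated recurrence: T_{ij}(x) = \prod_{k=j+1}^{i} g_k(x) for j \le i   (causal cumulative gating)
```

Seen this way, choosing an architecture amounts to choosing how T(x) is structured, featurized, and shared across layers, which is exactly what the three-level genome encodes.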
-----
📊 Results:
→ 7 of 8 STAR architectures outperformed Transformers while using 13% fewer parameters
→ Achieved 37% smaller cache sizes than hybrid models while maintaining quality
→ Successfully scaled from 125M to 1B parameters while preserving advantages
→ All evaluated architectures outperformed standard hybrids on downstream tasks