LAUREL, proposed in this paper, makes residual connections smarter by learning how to weight and mix each layer's output with its skip connection
https://arxiv.org/abs/2411.07501
🎯 Original Problem:
→ Traditional residual connections simply add a layer's output to its input (x + f(x)); this fixed rule is static, doesn't adapt to different layers' needs, and limits how much quality a model gets per parameter.
-----
🔧 Solution in this Paper:
→ LAUREL (Learned Augmented Residual Layer) introduces learnable parameters to the residual connection, making it dynamic and context-aware.
→ It comes in three versions:
→ LAUREL-RW adds learnable weights that scale both the function output and the residual connection
→ LAUREL-LR applies low-rank matrices to the residual path for richer transformations
→ LAUREL-PA incorporates information from previous layers' activations
→ The framework allows mixing these versions while keeping parameter overhead minimal (0.003% to 0.012%); a minimal sketch of the RW and LR variants follows.
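Below is a minimal PyTorch sketch of the RW and LR variants based on the description above; the class and parameter names (`LaurelResidual`, `alpha`, `beta`, `A`, `B`, `rank`) and the initialisation are illustrative assumptions, not the authors' implementation, and the PA variant is omitted for brevity.

```python
import torch
import torch.nn as nn


class LaurelResidual(nn.Module):
    """Sketch of a learned augmented residual connection (RW + LR).

    Replaces the plain residual  x + f(x)  with
        alpha * f(x) + beta * (I + B A) x
    where alpha, beta are learnable scalars and A, B are low-rank matrices.
    """

    def __init__(self, dim: int, rank: int = 4):
        super().__init__()
        # RW: learnable scalar weights on the two paths
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.ones(1))
        # LR: low-rank correction; B starts at zero so (I + B A) = I at init
        self.A = nn.Parameter(torch.randn(rank, dim) / dim ** 0.5)
        self.B = nn.Parameter(torch.zeros(dim, rank))

    def forward(self, x: torch.Tensor, fx: torch.Tensor) -> torch.Tensor:
        # fx is the sub-layer's output f(x); x is the skip-path input
        skip = x + (x @ self.A.t()) @ self.B.t()  # (I + B A) x, kept low-rank
        return self.alpha * fx + self.beta * skip


# Usage: wrap any sub-block f(.) whose input and output shapes match
block = nn.Linear(512, 512)             # stand-in for an attention/MLP block
laurel = LaurelResidual(dim=512, rank=4)
x = torch.randn(8, 512)
out = laurel(x, block(x))               # instead of the usual block(x) + x
```

Initialising alpha = beta = 1 and B = 0 makes the layer start out as an ordinary residual connection, which is one simple way to keep early training behaviour unchanged.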
-----
💡 Key Insights:
→ Residual connections can be made more expressive without significant parameter increase
→ Low-rank approximations effectively balance expressiveness and efficiency (see the parameter count after this list)
→ Reusing activations from previous layers helps build better feature representations
→ The framework remains stable despite fundamental changes to residual connections
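To see why the low-rank path stays cheap, here is a back-of-the-envelope count; the hidden size and rank are hypothetical, and the paper's 0.003%–0.012% figures are measured against total model parameters rather than a single layer:

```python
d, r = 4096, 4                     # hypothetical hidden size and rank
full_matrix = d * d                # dense learned mixing matrix: ~16.8M params
low_rank = 2 * d * r               # A (r x d) plus B (d x r): 32,768 params
print(f"low-rank is {low_rank / full_matrix:.2%} of a dense matrix per layer")
```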
-----
📊 Results:
→ On ResNet-50/ImageNet-1K: recovered 60% of the gain from adding an extra layer, with only 0.003% more parameters
→ LAUREL-RW+LR matched the performance of adding an extra layer while using 2.6x fewer extra parameters
→ Improved LLM performance across Q&A, NLU, math, and code tasks while adding only 0.012% more parameters