
"LAuReL: Learned Augmented Residual Layer"

The podcast on this paper is generated with Google's Illuminate.

LAUREL, proposed in this paper, makes residual connections smarter by learning how to combine layer outputs optimally

https://arxiv.org/abs/2411.07501

🎯 Original Problem:

→ Traditional residual connections in deep neural networks are static and don't adapt to different layers' needs, limiting model efficiency.

-----

🔧 Solution in this Paper:

→ LAUREL (Learned Augmented Residual Layer) introduces learnable parameters to the residual connection, making it dynamic and context-aware.

→ It comes in three versions: LAUREL-RW adds learnable weights that scale the layer output and the residual connection, LAUREL-LR uses low-rank matrices for a richer residual transformation, and LAUREL-PA mixes in activations from previous layers (a minimal sketch of all three follows this list).

→ The framework allows mixing these versions together while keeping parameter overhead minimal (0.003% to 0.012%).
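Below is a minimal sketch of how such a layer could look in PyTorch. The module name, initialization, and rank are illustrative assumptions rather than the authors' code, but it shows where the RW scalars, the LR low-rank matrices, and the PA previous-activation term enter the residual path.

```python
import torch
import torch.nn as nn

class LaurelResidual(nn.Module):
    """Sketch of a learned augmented residual (not the authors' code):
    x_{i+1} = alpha * f(x_i) + g(x_i, previous activations)."""

    def __init__(self, f: nn.Module, d: int, rank: int = 4, use_prev: bool = False):
        super().__init__()
        self.f = f
        # LAUREL-RW: learnable scalars re-weighting the two branches
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.ones(1))
        # LAUREL-LR: low-rank factors A (d x r) and B (r x d); the residual
        # branch becomes (beta * I + A B) x at only 2*d*rank extra parameters
        self.A = nn.Parameter(torch.randn(d, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(rank, d))  # zero init: starts as a plain residual
        # LAUREL-PA: a learnable weight for mixing in a previous layer's activation
        self.use_prev = use_prev
        self.gamma = nn.Parameter(torch.zeros(1)) if use_prev else None

    def forward(self, x, prev=None):
        # A standard residual layer would simply return f(x) + x
        residual = self.beta * x + (x @ self.A) @ self.B
        if self.use_prev and prev is not None:
            residual = residual + self.gamma * prev
        return self.alpha * self.f(x) + residual


# Usage: wrap any dimension-preserving block, e.g. a small MLP with hidden size 512
block = LaurelResidual(nn.Sequential(nn.Linear(512, 512), nn.GELU()), d=512)
y = block(torch.randn(8, 512))
```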

-----

💡 Key Insights:

→ Residual connections can be made more expressive without significant parameter increase

→ Low-rank approximations effectively balance expressiveness and efficiency (a quick parameter count follows this list)

→ Previous layer information helps in better feature representation

→ The framework remains stable despite fundamental changes to residual connections
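To make the low-rank insight concrete, here is a quick parameter count under assumed dimensions; the hidden size and rank below are illustrative, not taken from the paper.

```python
# Why low-rank residual transforms stay cheap (illustrative numbers)
d, r = 4096, 4
full_matrix = d * d    # a full learned d x d residual transform: ~16.8M params
low_rank = 2 * d * r   # LAUREL-LR's A (d x r) + B (r x d): ~32.8K params
print(f"low-rank uses {full_matrix / low_rank:.0f}x fewer parameters")  # -> 512x
```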

-----

📊 Results:

→ On ResNet-50/ImageNet-1K: achieved 60% of the accuracy gain of adding an extra layer, with only 0.003% more parameters

→ LAUREL-RW+LR matched the performance of adding an extra layer while using 2.6× fewer extra parameters

→ Improved performance across Q&A, NLU, math, and code tasks while adding only 0.012% more parameters
