Smart network reconstruction makes advanced optimization accessible for deep learning.
Natural Gradient Descent training becomes faster and more efficient through a novel network reconstruction approach that breaks the expensive global Fisher computation into simpler local computations.
https://arxiv.org/abs/2412.07441v1
🤔 Original Problem:
Natural Gradient Descent (NGD) offers superior optimization quality, but computing and inverting the Fisher information matrix makes it computationally expensive for deep neural networks.
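For context, the NGD update preconditions the gradient with the inverse Fisher information matrix (standard notation, not necessarily the paper's exact symbols):

```latex
% Natural gradient step: precondition the gradient with F^{-1}
\theta_{t+1} = \theta_t - \eta\, F^{-1} \nabla_\theta \mathcal{L}(\theta_t),
\qquad
F = \mathbb{E}\!\left[\nabla_\theta \log p(y \mid x;\theta)\,
                      \nabla_\theta \log p(y \mid x;\theta)^{\top}\right]
```

For a network with n parameters, F is an n × n matrix, so storing it costs O(n²) and inverting it O(n³), which is infeasible when n runs into the millions.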
-----
🔧 Solution in this Paper:
→ Introduces Structured Natural Gradient Descent (SNGD) that reconstructs networks with local Fisher layers
→ Decomposes global Fisher matrix calculations into efficient local computations
→ Transforms parameter matrices using G^(-1/2) normalization sub-layers
→ Optimizes the new weight parameters with ordinary gradient descent (see the sketch below)
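A minimal sketch of the reconstruction idea in PyTorch, assuming a KFAC-style local Fisher factor G estimated from layer inputs. `FisherNormalizedLinear`, `update_fisher`, and the `damping` parameter are illustrative names, not the paper's API, and only the input-side factor is shown:

```python
import torch
import torch.nn as nn

class FisherNormalizedLinear(nn.Module):
    """Linear layer preceded by a local G^(-1/2) normalization sub-layer.

    G is a per-layer (local) Fisher factor estimated from layer inputs, so
    plain SGD on `self.linear` approximates a natural-gradient step on the
    original parameters. Only the input-side factor is modeled here.
    """

    def __init__(self, in_features, out_features, damping=1e-3):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.damping = damping
        # Running estimate of the local Fisher factor and its inverse square root.
        self.register_buffer("G", torch.eye(in_features))
        self.register_buffer("G_inv_sqrt", torch.eye(in_features))

    @torch.no_grad()
    def update_fisher(self, x, momentum=0.95):
        """Update G from a mini-batch of layer inputs x: (batch, in_features)."""
        cov = x.t() @ x / x.shape[0]
        self.G.mul_(momentum).add_(cov, alpha=1.0 - momentum)
        # Damped eigendecomposition gives a numerically stable G^(-1/2).
        eye = torch.eye(self.G.shape[0], device=self.G.device)
        eigvals, eigvecs = torch.linalg.eigh(self.G + self.damping * eye)
        self.G_inv_sqrt.copy_(eigvecs @ torch.diag(eigvals.rsqrt()) @ eigvecs.t())

    def forward(self, x):
        # Normalization sub-layer: whiten inputs by G^(-1/2), then apply the
        # reconstructed weights, which are trained with an ordinary optimizer.
        return self.linear(x @ self.G_inv_sqrt)
```

In this sketch, `update_fisher` would be called periodically on the layer's current inputs while `self.linear` is trained with standard SGD; how often the local factors are refreshed is a design choice left to the caller. Since G^(-1/2) is symmetric, the equivalent original-space weights can be recovered as W @ G^(-1/2) if needed.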
-----
💡 Key Insights:
→ NGD on the original network is equivalent to plain gradient descent on the reconstructed network (see the derivation sketch below)
→ Local Fisher layers provide curvature signals and a regularization effect
→ The method applies uniformly across MLP, CNN, and LSTM architectures
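The first insight follows from a standard reparametrization argument (not the paper's exact notation), treating the Fisher matrix as locally constant:

```latex
% Reparametrize with theta' = F^{1/2} theta, F treated as locally constant.
\theta' = F^{1/2}\theta
\quad\Rightarrow\quad
\nabla_{\theta'}\mathcal{L} = F^{-1/2}\nabla_{\theta}\mathcal{L}
% A plain gradient-descent step in theta' ...
\theta'_{t+1} = \theta'_t - \eta\,\nabla_{\theta'}\mathcal{L}
% ... is exactly a natural-gradient step in the original parameters:
\quad\Longleftrightarrow\quad
\theta_{t+1} = \theta_t - \eta\,F^{-1}\nabla_{\theta}\mathcal{L}
```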
-----
📊 Results:
→ MNIST: 97.6% test accuracy vs 96.3% for KFAC and 94.8% for SGD
→ CIFAR-10: ResNet-18 achieves 94.44% vs SGD (93.02%) and Adam (92.93%)
→ ImageNet: 73.41% top-1 accuracy, beating SGD (70.23%) and Adam (63.79%)
→ Comparable training time to first-order optimizers despite better performance