Neural network training actually modifies parameters only within a small subspace, leaving most directions in parameter space essentially unchanged.
The paper introduces a novel way to understand neural network training by analyzing how final parameters relate to their initial values through the training Jacobian matrix.
-----
https://arxiv.org/abs/2412.07003
🤔 Original Problem:
How neural networks learn during training remains a black box, particularly which parameter changes matter most and how optimization behaves in high-dimensional parameter space.
-----
🔬 Solution in this Paper:
→ The researchers examine the training Jacobian: the Jacobian matrix of the trained network's parameters with respect to their initial values (a toy sketch of this computation appears after this list).
→ They discovered the singular value spectrum of this Jacobian has three distinct regions: chaotic (values > 1), bulk (values ≈ 1), and stable (values < 1).
→ The bulk, spanning about two-thirds of parameter space, consists of directions along which parameters remain virtually unchanged during training.
→ Perturbing parameters along bulk directions barely affects in-distribution predictions but significantly changes out-of-distribution behavior.
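Below is a minimal sketch (not the authors' code) of how such a training Jacobian could be computed for a toy MLP trained with full-batch gradient descent, then partitioned into the three spectral regions. The architecture, data, learning rate, step count, and the ±0.05 "close to 1" threshold are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
kx, ky, kp = jax.random.split(key, 3)
X = jax.random.normal(kx, (64, 8))    # toy inputs
Y = jax.random.normal(ky, (64, 1))    # toy regression targets

D_IN, D_HID, D_OUT = 8, 16, 1
N_PARAMS = D_IN * D_HID + D_HID + D_HID * D_OUT + D_OUT  # 161 parameters

def unflatten(theta):
    """Split a flat parameter vector into MLP weights and biases."""
    i = 0
    W1 = theta[i:i + D_IN * D_HID].reshape(D_IN, D_HID); i += D_IN * D_HID
    b1 = theta[i:i + D_HID]; i += D_HID
    W2 = theta[i:i + D_HID * D_OUT].reshape(D_HID, D_OUT); i += D_HID * D_OUT
    b2 = theta[i:i + D_OUT]
    return W1, b1, W2, b2

def loss(theta):
    W1, b1, W2, b2 = unflatten(theta)
    preds = jnp.tanh(X @ W1 + b1) @ W2 + b2
    return jnp.mean((preds - Y) ** 2)

def train(theta0, steps=500, lr=0.1):
    """Full-batch gradient descent; maps initial params to final params."""
    def step(theta, _):
        return theta - lr * jax.grad(loss)(theta), None
    theta_final, _ = jax.lax.scan(step, theta0, None, length=steps)
    return theta_final

theta0 = 0.1 * jax.random.normal(kp, (N_PARAMS,))

# Training Jacobian: derivative of final parameters w.r.t. initial parameters.
J = jax.jacfwd(train)(theta0)                 # shape (N_PARAMS, N_PARAMS)
svals = jnp.linalg.svd(J, compute_uv=False)

# Partition the spectrum into the three regions described above.
tol = 0.05                                    # "close to 1" threshold (assumption)
print("chaotic:", int((svals > 1 + tol).sum()),
      "bulk:",    int((jnp.abs(svals - 1) <= tol).sum()),
      "stable:",  int((svals < 1 - tol).sum()))
```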
-----
🎯 Key Insights:
→ Training is intrinsically low-dimensional, with most parameter changes happening in a small subspace
→ The bulk subspace is independent of initialization and labels but depends strongly on input data
→ Training linearization remains valid much longer along bulk directions than chaotic ones
→ The bulk overlaps significantly with the nullspace of the parameter-to-function Jacobian on test data (see the sketch after this list)
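To illustrate the nullspace claim in the last bullet, here is a rough continuation of the sketch above (it reuses `unflatten`, `train`, `theta0`, and `J`): it forms the parameter-to-function Jacobian on held-out inputs and checks that bulk directions are moved far less by it than the remaining directions. The test set and the ±0.05 threshold are again assumptions.

```python
import jax
import jax.numpy as jnp

X_test = jax.random.normal(jax.random.PRNGKey(1), (32, 8))   # held-out inputs

def f(theta):
    """Network outputs on the test inputs, flattened: the parameter-to-function map."""
    W1, b1, W2, b2 = unflatten(theta)
    return (jnp.tanh(X_test @ W1 + b1) @ W2 + b2).ravel()

theta_final = train(theta0)
F = jax.jacrev(f)(theta_final)                # parameter-function Jacobian, (32, N_PARAMS)

# Right singular vectors of the training Jacobian, split into bulk vs. the rest.
_, s, Vt = jnp.linalg.svd(J)
bulk_mask = jnp.abs(s - 1) <= 0.05
V_bulk = Vt[bulk_mask].T                      # columns spanning the bulk
V_rest = Vt[~bulk_mask].T                     # chaotic + stable directions

# If the bulk lies (mostly) in F's nullspace, F should barely move bulk vectors.
resp = lambda V: jnp.linalg.norm(F @ V, axis=0).mean()
print("mean |F v| over bulk directions:     ", resp(V_bulk))
print("mean |F v| over remaining directions:", resp(V_rest))
```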
-----
📊 Results:
→ ~3000 out of 4810 singular values are extremely close to one
→ Bulk directions show near-perfect linear behavior across 7 orders of magnitude of perturbation size
→ Training restricted to the bulk's complement performs similarly to unconstrained training
→ Bulk subspaces from different random seeds show high similarity, far above the overlap expected between random subspaces (a rough check follows below)
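As a rough check on the seed-similarity result, the continuation below (reusing `train`, `theta0`, and `N_PARAMS` from the first sketch) recomputes the bulk subspace from a second random initialization and compares the two subspaces via the mean squared cosine of their principal angles, against a random-subspace baseline. The seeds and the overlap metric are my assumptions, not necessarily the paper's exact measure.

```python
import jax
import jax.numpy as jnp

def bulk_basis(theta_init, tol=0.05):
    """Bulk right-singular vectors of the training Jacobian at a given init."""
    _, s, Vt = jnp.linalg.svd(jax.jacfwd(train)(theta_init))
    return Vt[jnp.abs(s - 1) <= tol].T        # shape (N_PARAMS, k)

theta0_b = 0.1 * jax.random.normal(jax.random.PRNGKey(42), (N_PARAMS,))
V_a = bulk_basis(theta0)                      # bulk subspace from the first init
V_b = bulk_basis(theta0_b)                    # bulk subspace from a second init

def overlap(A, B):
    """Mean squared cosine of principal angles between span(A) and span(B)."""
    return float(jnp.linalg.norm(A.T @ B) ** 2 / min(A.shape[1], B.shape[1]))

# Baseline: a random subspace of the same dimension as V_b.
R, _ = jnp.linalg.qr(jax.random.normal(jax.random.PRNGKey(7), (N_PARAMS, V_b.shape[1])))

print("bulk(init A) vs bulk(init B):", overlap(V_a, V_b))
print("bulk(init A) vs random:      ", overlap(V_a, R))
```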