Cross-Entropy Is All You Need To Invert the Data Generating Process
Cross-entropy: The hidden key to understanding how deep learning works
Cross-entropy loss alone can recover the true underlying structure of data
🤔 Original Problem:
Despite the success of supervised learning in deep learning, we lack a comprehensive theory of why it learns interpretable and transferable representations. Trained models exhibit intriguing phenomena such as neural analogy-making and linear representations, but we do not understand why these properties emerge.
🔧 Solution in this Paper:
→ Extends Independent Component Analysis (ICA) theory to parametric instance discrimination via the DIET method (see the sketch after this list)
→ Models data generation in a cluster-centric way, showing that learned representations are linearly related to the ground-truth ones
→ Proves that cross-entropy-based supervised learning can recover the ground-truth latent variables up to a linear transformation
→ Provides a theoretical framework connecting instance discrimination to supervised classification
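For concreteness, the sketch below shows the kind of DIET-style setup the theory covers: a non-linear encoder, a linear classification head, and plain cross-entropy over per-instance labels. It is a minimal illustration in PyTorch with assumed architecture sizes and synthetic placeholder data, not the authors' implementation.

```python
# Minimal DIET-style instance discrimination sketch (illustrative, not the
# paper's code): every training example gets its own label, and a linear
# classifier on top of a non-linear encoder is trained with cross-entropy.
import torch
import torch.nn as nn

num_samples, input_dim, latent_dim = 10_000, 50, 10

encoder = nn.Sequential(                     # f_theta: non-linear feature extractor
    nn.Linear(input_dim, 256), nn.ReLU(),
    nn.Linear(256, latent_dim),
)
head = nn.Linear(latent_dim, num_samples, bias=False)  # one logit per training instance

x = torch.randn(num_samples, input_dim)      # placeholder data
labels = torch.arange(num_samples)           # each datum is its own class

opt = torch.optim.Adam([*encoder.parameters(), *head.parameters()], lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(1_000):
    idx = torch.randint(0, num_samples, (256,))   # minibatch of instance indices
    logits = head(encoder(x[idx]))
    loss = loss_fn(logits, labels[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The theoretical claim is that, under the paper's assumptions on the data-generating process, the features produced by `encoder` after such training are a linear transformation of the ground-truth latents.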
💡 Key Insights:
→ Cross-entropy loss alone can yield linearly identifiable features, regardless of whether the labels are semantically meaningful
→ Models can recover ground-truth latent variables even in standard classification tasks
→ Supervised learning implicitly performs non-linear ICA
→ The success of transfer learning and of neural analogy-making can be explained within this framework
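"Linear identifiability" can be checked empirically on synthetic data where the true latents are known. The helper below is an assumed evaluation protocol (a common one in the identifiability literature), not the paper's exact script: fit a linear map from learned features to ground-truth latents and report the explained variance.

```python
# Hedged sketch of a linear-identifiability check (assumed protocol): a high
# R^2 from a *linear* regression of true latents on learned features means the
# features recover the true latents up to a linear transformation.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def linear_identifiability_score(features: np.ndarray, true_latents: np.ndarray) -> float:
    """Average R^2 of a linear regression from learned features to ground-truth latents."""
    reg = LinearRegression().fit(features, true_latents)
    return r2_score(true_latents, reg.predict(features), multioutput="uniform_average")

# Usage with arrays of shape (num_samples, feature_dim) and (num_samples, latent_dim):
# score = linear_identifiability_score(encoder_outputs, ground_truth_z)
```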
📊 Results:
→ Demonstrated disentanglement of latent factors on simulated data matching the theoretical assumptions
→ Validated the theory on the DisLib disentanglement benchmark, showing that classification tasks recover latent structure
→ Showed that models trained on ImageNet encode representations from which proxy factors of variation can be linearly decoded
👨‍🔧 The core purpose of the paper
The paper proves that cross-entropy-based supervised learning can recover the ground-truth latent variables up to a linear transformation.
It shows how standard classification with cross-entropy loss can learn interpretable and transferable representations, explaining phenomena such as neural analogy-making and linear representations in deep learning.
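Stated schematically, the identifiability claim takes the following form; this is a paraphrase for intuition, not the paper's exact theorem statement or assumptions.

```latex
% Data-generating process: latents z are mapped to observations x by an
% unknown, invertible mixing function g.
x = g(z)
% Claim (schematic): an encoder f trained by minimizing cross-entropy on the
% classification task satisfies, for some invertible matrix A and offset b,
f(x) = f(g(z)) = A z + b
% i.e. the learned representation recovers the ground-truth latents up to an
% invertible linear (affine) transformation.
```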


