Information-based clustering helps models adapt effectively when test data shifts in unexpected ways.
This paper introduces ClusT3, a novel Test-Time Training approach that maximizes mutual information between feature maps and discrete representations to improve model adaptation to domain shifts. The method outperforms existing approaches while being computationally efficient and problem-agnostic.
-----
https://arxiv.org/abs/2412.03933
🤔 Original Problem:
Deep Learning models often struggle with domain shifts at test time, where test data differs significantly from training data. Current Test-Time Training methods rely on complex self-supervised tasks that are computationally expensive.
-----
🔧 Solution in this Paper:
→ ClusT3 uses a clustering strategy based on Mutual Information maximization between multi-scale feature maps and discrete latent representations.
→ The method employs a shallow projector that maps features into cluster probability distributions, maximizing mutual information between the features and their discrete encoding (see the sketch after this list).
→ Multiple projectors are placed on different CNN layers to capture multi-scale information, with the first two layers providing the most effective results.
→ The approach requires minimal architectural changes and is more efficient than existing methods that need multiple complex steps.
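A minimal PyTorch sketch of the training-time objective follows. This is not the authors' code: the 1x1-conv projector, the cluster count, and all names are illustrative assumptions. It computes the standard decomposition I(X; Z) = H(Z) − H(Z|X) over per-location cluster assignments, which is largest when individual assignments are confident but the clusters stay balanced overall.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mutual_information(probs, eps=1e-8):
    """I(X; Z) = H(Z) - H(Z|X) for a batch of per-location cluster
    distributions `probs` of shape (N, K)."""
    marginal = probs.mean(dim=0)                                # p(z), shape (K,)
    h_z = -(marginal * (marginal + eps).log()).sum()            # H(Z)
    h_z_given_x = -(probs * (probs + eps).log()).sum(1).mean()  # H(Z|X)
    return h_z - h_z_given_x

class Projector(nn.Module):
    """Shallow projector: a 1x1 conv mapping C-channel feature maps to
    K cluster logits per spatial location (the exact projector design
    here is an assumption)."""
    def __init__(self, in_channels, num_clusters=10):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, num_clusters, kernel_size=1)

    def forward(self, feats):                        # feats: (B, C, H, W)
        probs = F.softmax(self.proj(feats), dim=1)   # (B, K, H, W)
        k = probs.size(1)
        # Treat every spatial location as one sample: (B*H*W, K).
        return probs.permute(0, 2, 3, 1).reshape(-1, k)

# Example: one projector per equipped layer, e.g. the first two
# ResNet-50 stages (256 and 512 channels; the backbone is an assumption).
projectors = [Projector(256), Projector(512)]
```

In the paper's setup, this MI term is maximized jointly with the usual supervised loss during source training.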
-----
💡 Key Insights:
→ Mutual information between features and their discrete representation should remain constant across domains (see the sketch after this list)
→ Early layers contain most domain-related information
→ Using multiple projectors per layer improves performance
→ The method is self-sufficient and incurs less computational overhead than prior Test-Time Training approaches
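Because the mutual-information objective needs no labels, the same loss can drive adaptation on an unlabeled test batch. A minimal sketch follows, reusing `mutual_information` and the projectors from the earlier block; `extract_features` is a hypothetical helper, and the step count and learning rate are assumptions:

```python
import torch

def test_time_adapt(encoder, projectors, test_batch, steps=1, lr=1e-3):
    """Adapt the encoder on one unlabeled test batch by maximizing the
    same mutual-information objective used during training."""
    optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(steps):
        # `extract_features` is a hypothetical helper returning one
        # feature map per projector-equipped layer.
        feats_per_layer = encoder.extract_features(test_batch)
        loss = torch.zeros((), device=test_batch.device)
        for feats, projector in zip(feats_per_layer, projectors):
            # Minimizing negative MI maximizes I(X; Z) at each layer.
            loss = loss - mutual_information(projector(feats))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

After one or a few such steps, the adapted model makes its predictions on that batch.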
-----
📊 Results:
→ Outperformed the ResNet-50 baseline by 28.26% on CIFAR-10-C
→ Achieved 82.08% accuracy on CIFAR-10-C at corruption level 5
→ Reached 56.70% accuracy on CIFAR-100-C
→ Gained a 15.6% improvement on sim-to-real domain shift tasks