Paper - https://arxiv.org/abs/2411.17299
Train once, embed anywhere: 2D Matryoshka lets models generate flexible-sized embeddings from any layer.
2D Matryoshka Training enables models to generate embeddings of varying dimensions from different layers simultaneously, reducing computational costs while maintaining effectiveness.
-----
🤔 Original Problem:
→ Modern transformer models require significant computational resources for text encoding, especially during query processing in search scenarios.
→ Current approaches either sacrifice model performance for speed or maintain high computational costs.
-----
🔧 Solution in this Paper:
→ 2D Matryoshka Training comes in two versions, both training embeddings jointly across layers and dimensions.
→ Version 1 combines last-layer embeddings with embeddings from a randomly sampled sub-layer, using KL divergence to align their score distributions.
→ Version 2 applies logarithmic weighting across all layers, with PCA alignment and dimension-specific losses.
→ The approach enables flexible embedding generation from any layer and dimension configuration (see the sketch below).
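A minimal PyTorch sketch of the Version 1 idea, under assumptions: the loss function, layer indexing, and sub-dimension set here are illustrative, not the authors' exact implementation. Each step pairs the last-layer embedding with one randomly sampled sub-layer embedding, computes retrieval losses at the full and a truncated Matryoshka dimension, and adds a KL term aligning the sub-layer score distribution with the last layer's.

```python
# Hypothetical sketch of 2D Matryoshka training, Version 1 (names/shapes assumed).
import random
import torch
import torch.nn.functional as F

def info_nce(q, d, temperature=0.05):
    """In-batch contrastive loss over normalized query/document embeddings."""
    q = F.normalize(q, dim=-1)
    d = F.normalize(d, dim=-1)
    scores = q @ d.T / temperature                    # (B, B) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(scores, labels), scores

def two_d_matryoshka_step(layers_q, layers_d, sub_dims=(64, 128, 256)):
    """layers_*: list of per-layer CLS embeddings; index -1 is the last layer."""
    last_q, last_d = layers_q[-1], layers_d[-1]
    layer = random.randrange(1, len(layers_q) - 1)    # random intermediate layer
    sub_q, sub_d = layers_q[layer], layers_d[layer]
    dim = random.choice(sub_dims)                     # random Matryoshka dimension

    # Retrieval loss on the last layer: full dimension + truncated dimension.
    loss_full, scores_full = info_nce(last_q, last_d)
    loss_trunc, _ = info_nce(last_q[:, :dim], last_d[:, :dim])

    # Same two losses on the sampled sub-layer.
    loss_sub_full, scores_sub = info_nce(sub_q, sub_d)
    loss_sub_trunc, _ = info_nce(sub_q[:, :dim], sub_d[:, :dim])

    # KL divergence aligns the sub-layer score distribution with the last layer's.
    kl = F.kl_div(F.log_softmax(scores_sub, dim=-1),
                  F.softmax(scores_full.detach(), dim=-1),
                  reduction="batchmean")

    return loss_full + loss_trunc + loss_sub_full + loss_sub_trunc + kl
```

At inference time the same model can then serve a cheap configuration (e.g. an early layer truncated to a small dimension) or the full last-layer embedding, without retraining.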
-----
💡 Key Insights:
→ Smaller-dimension embeddings (below 128) show a significant performance drop in retrieval tasks
→ Fixed document encoders don't consistently improve performance across all setups
→ Training on multiple target dimensions smooths the effectiveness curve but trades off performance at higher dimensions
-----
📊 Results:
→ Outperforms traditional Matryoshka training on sub-dimensions
→ Version 2 achieves higher effectiveness than Version 1 across most configurations
→ Training requires 600+ H100 GPU-hours