Fine-tune encoders, freeze decoders: A practical approach to optimize LLM training
A novel approach that freezes the LLM decoder while fine-tuning task-specific encoders, improving multilingual performance and reducing deployment overhead.
-----
https://arxiv.org/abs/2501.07818
Original Problem 🔍:
→ Fine-tuning a complete LLM for each task requires massive computational resources and often leads to catastrophic forgetting in multilingual scenarios.
→ Parameter-sharing across tasks offers less control over individual task capabilities and requires complete retraining whenever a new task is added.
-----
Solution in this Paper 🛠️:
→ The paper introduces a multi-encoder, frozen-decoder approach: only task-specific encoders are fine-tuned while the decoder stays frozen (see the sketch after this list).
→ This setup allows independent training of encoders for different tasks while sharing a common pre-trained decoder.
→ The method is evaluated on the AlexaTM model across diverse tasks including summarization, translation, and question answering.
→ For larger models, frozen decoders actually enhance performance in structured tasks.
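To make the setup concrete, here is a minimal sketch of freezing the decoder and training only a per-task encoder. It uses a generic HuggingFace seq2seq checkpoint ("t5-small") as a stand-in for AlexaTM; the task names and toy batch are illustrative placeholders, not details from the paper.

```python
# Minimal sketch of the multi-encoder / frozen-decoder setup.
# "t5-small", the task names, and the toy batch are illustrative stand-ins.
import copy
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

base = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
tokenizer = AutoTokenizer.from_pretrained("t5-small")

# Freeze the entire pretrained model; the shared decoder is never updated.
for p in base.parameters():
    p.requires_grad = False

# One independently trainable encoder copy per task.
task_encoders = {}
for task in ["summarization", "translation", "qa"]:
    enc = copy.deepcopy(base.get_encoder())
    for p in enc.parameters():
        p.requires_grad = True  # only encoder weights receive gradients
    task_encoders[task] = enc

# Fine-tune one task encoder against the frozen shared decoder.
task = "summarization"
optimizer = torch.optim.AdamW(task_encoders[task].parameters(), lr=1e-4)

batch = tokenizer(["summarize: a long input document ..."], return_tensors="pt")
labels = tokenizer(["a short summary"], return_tensors="pt").input_ids

optimizer.zero_grad()
enc_out = task_encoders[task](**batch)        # task-specific encoder
out = base(encoder_outputs=enc_out,           # shared, frozen decoder
           attention_mask=batch["attention_mask"],
           labels=labels)
out.loss.backward()                           # gradients reach only the task encoder
optimizer.step()
```

Because the decoder never changes, adding a new task means training and shipping one more encoder, nothing else.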
-----
Key Insights 💡:
→ Freezing decoders significantly reduces deployment overhead and improves cross-lingual transfer (see the parameter-count sketch after this list)
→ Tasks with natural language outputs show minimal performance loss with frozen decoders
→ Larger models with frozen decoders can match or exceed full fine-tuning performance
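The deployment saving can be sanity-checked with a quick parameter count, continuing the sketch above: serving N tasks needs N encoders plus one shared decoder instead of N full model copies. The numbers reflect the stand-in checkpoint, not AlexaTM itself.

```python
# Rough accounting of the deployment footprint, reusing `base` and
# `task_encoders` from the sketch above.
def n_params(module):
    return sum(p.numel() for p in module.parameters())

enc_params = n_params(base.get_encoder())
dec_params = n_params(base) - enc_params    # everything outside the encoder
n_tasks = len(task_encoders)

full_finetune  = n_tasks * n_params(base)            # one full model per task
shared_decoder = n_tasks * enc_params + dec_params   # N encoders + 1 frozen decoder

print(f"full fine-tuning : {full_finetune / 1e6:.1f}M params served")
print(f"frozen decoder   : {shared_decoder / 1e6:.1f}M params served")
```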
-----
Results 📊:
→ 8-10% performance improvement in non-English languages using frozen decoders
→ 14% performance gain in structured tasks when using larger frozen decoders
→ 2x increase in training efficiency due to reduced parameter updates