Fine-tune encoders, freeze decoders: A practical approach to optimize LLM training
A novel approach that freezes the LLM decoder while fine-tuning task-specific encoders, improving multilingual performance and reducing deployment overhead.
-----
https://arxiv.org/abs/2501.07818
Original Problem 🔍:
→ Fine-tuning a complete LLM for each task requires massive computational resources and often leads to catastrophic forgetting in multilingual scenarios.
→ Parameter-sharing across tasks offers less control over individual task capabilities and requires complete retraining whenever a new task is added.
-----
Solution in this Paper 🛠️:
→ The paper introduces a multi-encoder, frozen-decoder approach: only task-specific encoders are fine-tuned while the decoder stays frozen (see the sketch after this list).
→ This setup allows independent training of encoders for different tasks while sharing a common pre-trained decoder.
→ The method is evaluated on the AlexaTM model across diverse tasks including summarization, translation, and question answering.
→ For larger models, frozen decoders actually enhance performance in structured tasks.
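To make the setup concrete, here is a minimal sketch of freezing the decoder and training only a per-task encoder. It uses a generic HuggingFace seq2seq checkpoint ("t5-small") as a stand-in for AlexaTM; the task names and toy batch are illustrative placeholders, not details from the paper.

```python
# Minimal sketch of the multi-encoder / frozen-decoder setup.
# "t5-small", the task names, and the toy batch are illustrative stand-ins.
import copy
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

base = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
tokenizer = AutoTokenizer.from_pretrained("t5-small")

# Freeze the entire pretrained model; the shared decoder is never updated.
for p in base.parameters():
    p.requires_grad = False

# One independently trainable encoder copy per task.
task_encoders = {}
for task in ["summarization", "translation", "qa"]:
    enc = copy.deepcopy(base.get_encoder())
    for p in enc.parameters():
        p.requires_grad = True  # only encoder weights receive gradients
    task_encoders[task] = enc

# Fine-tune one task encoder against the frozen shared decoder.
task = "summarization"
optimizer = torch.optim.AdamW(task_encoders[task].parameters(), lr=1e-4)

batch = tokenizer(["summarize: a long input document ..."], return_tensors="pt")
labels = tokenizer(["a short summary"], return_tensors="pt").input_ids

optimizer.zero_grad()
enc_out = task_encoders[task](**batch)        # task-specific encoder
out = base(encoder_outputs=enc_out,           # shared, frozen decoder
           attention_mask=batch["attention_mask"],
           labels=labels)
out.loss.backward()                           # gradients reach only the task encoder
optimizer.step()
```

Because the decoder never changes, adding a new task means training and shipping one more encoder, nothing else.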
-----
Key Insights 💡:
→ Freezing decoders significantly reduces deployment overhead and improves cross-lingual transfer (see the parameter-count sketch after this list)
→ Tasks with natural language outputs show minimal performance loss with frozen decoders
→ Larger models with frozen decoders can match or exceed full fine-tuning performance
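The deployment saving can be sanity-checked with a quick parameter count, continuing the sketch above: serving N tasks needs N encoders plus one shared decoder instead of N full model copies. The numbers reflect the stand-in checkpoint, not AlexaTM itself.

```python
# Rough accounting of the deployment footprint, reusing `base` and
# `task_encoders` from the sketch above.
def n_params(module):
    return sum(p.numel() for p in module.parameters())

enc_params = n_params(base.get_encoder())
dec_params = n_params(base) - enc_params    # everything outside the encoder
n_tasks = len(task_encoders)

full_finetune  = n_tasks * n_params(base)            # one full model per task
shared_decoder = n_tasks * enc_params + dec_params   # N encoders + 1 frozen decoder

print(f"full fine-tuning : {full_finetune / 1e6:.1f}M params served")
print(f"frozen decoder   : {shared_decoder / 1e6:.1f}M params served")
```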
-----
Results 📊:
→ 8-10% performance improvement in non-English languages using frozen decoders
→ 14% performance gain in structured tasks when using larger frozen decoders
→ 2x increase in training efficiency due to reduced parameter updates