Control LLM tackles catastrophic forgetting in LLMs during continuous learning by using parallel transformer blocks and hidden-state alignment through interpolation.
This method allows LLMs to learn new tasks without losing existing knowledge.
-----
Paper - https://arxiv.org/abs/2501.10979
Original Problem 🤔:
→ LLMs require vast computational resources, making full retraining impractical.
→ Enhancing LLMs with new skills often leads to catastrophic forgetting.
→ Catastrophic forgetting causes LLMs to lose previously learned abilities when trained on new data.
-----
Solution in this Paper 💡:
→ This paper proposes Control LLM, a novel architecture to mitigate catastrophic forgetting.
→ Control LLM expands the LLM with parallel transformer blocks: a frozen pre-trained block and a trainable expanded block.
→ It aligns the hidden states of these blocks using interpolation strategies such as Linear Interpolation and Dynamic Linear Interpolation.
→ This alignment lets the model learn new tasks while retaining old knowledge.
→ Control LLM also uses a divergence loss to keep the hidden states of the pre-trained and expanded blocks consistent, as sketched below.
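To make the idea concrete, here is a minimal PyTorch sketch of one such parallel layer. It assumes a generic pre-trained block that maps hidden states to hidden states; the class name `ControlLLMLayer`, the fixed `alpha` weight, and the cosine-based divergence term are illustrative assumptions, not the paper's exact implementation.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class ControlLLMLayer(nn.Module):
    """Illustrative Control-LLM-style layer: a frozen pre-trained block runs in
    parallel with a trainable expanded copy, and their hidden states are fused
    by linear interpolation."""

    def __init__(self, pretrained_block: nn.Module, alpha: float = 0.5):
        super().__init__()
        # Make the trainable copy before freezing the original block.
        self.expanded_block = copy.deepcopy(pretrained_block)
        self.frozen_block = pretrained_block
        for p in self.frozen_block.parameters():
            p.requires_grad = False          # preserve pre-trained knowledge
        self.alpha = alpha                   # fixed interpolation weight

    def forward(self, hidden_states: torch.Tensor):
        h_frozen = self.frozen_block(hidden_states)      # pre-trained path
        h_expanded = self.expanded_block(hidden_states)  # new-task path
        # Linear interpolation fuses the two hidden states.
        h_fused = (1 - self.alpha) * h_frozen + self.alpha * h_expanded
        # Divergence term nudging the expanded states to stay aligned with
        # the frozen ones (cosine-based here; an illustrative choice).
        div_loss = 1 - F.cosine_similarity(h_expanded, h_frozen, dim=-1).mean()
        return h_fused, div_loss
```

During fine-tuning, `div_loss` would be added to the task loss so the expanded path learns new skills without drifting far from the frozen representation.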
-----
Key Insights from this Paper 🧠:
→ Hidden-state alignment in transformer layers is crucial for mitigating catastrophic forgetting.
→ Maintaining this alignment prevents hidden states from drifting when the model learns new tasks.
→ Interpolation strategies effectively fuse knowledge from the pre-trained and expanded blocks; a dynamic variant is sketched after this list.
→ Control LLM achieves a "learn more, forget less" outcome, outperforming traditional fine-tuning methods.
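As a rough illustration of the dynamic variant, the sketch below predicts the interpolation weight from the hidden states themselves using a small gating layer. The gate design (a single linear layer plus sigmoid, applied per token) is an assumption for illustration, not the paper's exact Dynamic Linear Interpolation module.

```python
import torch
import torch.nn as nn


class DynamicInterpolation(nn.Module):
    """Sketch of a dynamic fusion weight: alpha is predicted per token from the
    two hidden states instead of being a fixed hyperparameter."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Small gate mapping the concatenated hidden states to a value in (0, 1).
        self.gate = nn.Linear(2 * hidden_size, 1)

    def forward(self, h_frozen: torch.Tensor, h_expanded: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.gate(torch.cat([h_frozen, h_expanded], dim=-1)))
        # Tokens where the gate opens (alpha -> 1) lean on the expanded block;
        # otherwise the frozen, pre-trained representation dominates.
        return (1 - alpha) * h_frozen + alpha * h_expanded
```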
-----
Results 📊:
→ Control LLM improves Math-Hard accuracy by 14.4% on Llama3.1-8B-Instruct.
→ It enhances MBPP-PLUS coding performance by 10% on Llama3.1-8B-Instruct.
→ Control LLM boosts C-Eval multilingual capabilities by 10.6% on Llama3.1-8B.
→ It limits MMLU degradation to less than 4.3%, compared to >35% in other methods.