Giving LLMs the ability to consult both their old and new experiences.
IMSM introduces a Siamese LLM architecture that mitigates catastrophic forgetting by interweaving memories produced by the original and the fine-tuned parameters during inference.
https://arxiv.org/abs/2412.17383v1
🤔 Original Problem:
→ Parameter-efficient fine-tuning (PEFT) methods often lead to catastrophic forgetting in LLMs, where models lose their original world knowledge while learning new task-specific information
→ Existing solutions like data replay are expensive and complex to implement
-----
🔧 Solution in this Paper:
→ IMSM uses a siamese LLM architecture where one copy keeps original parameters frozen while another gets fine-tuned
→ For each input query, it generates two distinct memories (final hidden states): one from the original parameters and one from the fine-tuned parameters
→ A query-aware gate mechanism dynamically weights and combines these memories when generating outputs
→ The gate uses low-rank matrices to keep the added parameter count minimal while enabling flexible memory fusion (a minimal sketch of this gate follows below)
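To make the gating concrete, here is a minimal PyTorch sketch of query-aware memory fusion under stated assumptions: the module names (`QueryAwareGate`, `fuse_memories`), the rank, and the use of the fine-tuned hidden state as the query representation are illustrative choices, not the paper's reference implementation.

```python
# Minimal sketch of IMSM-style query-aware memory fusion (illustrative, not
# the paper's reference code).
import torch
import torch.nn as nn


class QueryAwareGate(nn.Module):
    """Low-rank gate deciding how much to trust the frozen-model memory
    versus the fine-tuned-model memory for a given query."""

    def __init__(self, hidden_size: int, rank: int = 8):
        super().__init__()
        # Low-rank factorization keeps the extra parameter count tiny.
        self.down = nn.Linear(hidden_size, rank, bias=False)
        self.up = nn.Linear(rank, hidden_size, bias=False)

    def forward(self, query_state: torch.Tensor) -> torch.Tensor:
        # Gate values in (0, 1): near 1 -> favor the original memory,
        # near 0 -> favor the fine-tuned memory.
        return torch.sigmoid(self.up(self.down(query_state)))


def fuse_memories(mem_frozen: torch.Tensor,
                  mem_tuned: torch.Tensor,
                  query_state: torch.Tensor,
                  gate: QueryAwareGate) -> torch.Tensor:
    """Interweave the two memories (final hidden states) before the LM head."""
    g = gate(query_state)                      # (batch, seq, hidden)
    return g * mem_frozen + (1.0 - g) * mem_tuned


if __name__ == "__main__":
    batch, seq, hidden = 2, 16, 4096
    gate = QueryAwareGate(hidden_size=hidden, rank=8)

    # In practice these would be the final hidden states of the same query
    # run through (a) the frozen backbone and (b) the PEFT-tuned backbone.
    mem_frozen = torch.randn(batch, seq, hidden)
    mem_tuned = torch.randn(batch, seq, hidden)
    query_state = mem_tuned                    # query representation (assumed)

    fused = fuse_memories(mem_frozen, mem_tuned, query_state, gate)
    print(fused.shape)                                 # torch.Size([2, 16, 4096])
    print(sum(p.numel() for p in gate.parameters()))   # 2 * hidden * rank params
```

The fused hidden state then replaces the fine-tuned model's final hidden state as input to the (shared) LM head, so only the gate adds trainable parameters on top of the PEFT weights.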
-----
💡 Key Insights:
→ Final hidden states can serve as "memories" of an LLM's understanding
→ Combining memories from both original and fine-tuned parameters helps balance old and new knowledge
→ Query-aware gating allows dynamic control over which knowledge source to prioritize
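Written as an equation (notation assumed here, not taken verbatim from the paper): with $h^{o}$ and $h^{t}$ the final hidden states of a query under the frozen and fine-tuned parameters,

$$
g = \sigma\big(W_{\mathrm{up}} W_{\mathrm{down}}\, h^{t}\big), \qquad
\tilde{h} = g \odot h^{o} + (1 - g) \odot h^{t},
$$

where $W_{\mathrm{down}} \in \mathbb{R}^{r \times d}$ and $W_{\mathrm{up}} \in \mathbb{R}^{d \times r}$ are the low-rank gate matrices with $r \ll d$, $\sigma$ is the sigmoid, $\odot$ is elementwise multiplication, and $\tilde{h}$ is the fused memory fed to the LM head.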
-----
📊 Results:
→ Outperforms baseline PEFT methods while adding only 0.164M parameters
→ Improves average performance by 2.19% compared to LoRA across tasks
→ Reduces catastrophic forgetting by maintaining 95% of original performance on non-target tasks