
"Interweaving Memories of a Siamese Large Language Model"

Podcast on this paper generated with Google's Illuminate.

Giving LLMs the ability to consult both their old and new experiences.

IMSM introduces a siamese LLM architecture that prevents catastrophic forgetting by interweaving memories from original and fine-tuned parameters during inference.

https://arxiv.org/abs/2412.17383v1

🤔 Original Problem:

→ Parameter-efficient fine-tuning (PEFT) methods often lead to catastrophic forgetting in LLMs, where models lose their original world knowledge while learning new task-specific information

→ Existing solutions like data replay are expensive and complex to implement

-----

🔧 Solution in this Paper:

→ IMSM uses a siamese LLM architecture in which one copy keeps the original parameters frozen while the other is fine-tuned with a PEFT method (e.g., LoRA)

→ For each input query, it generates two distinct memories (final hidden states): one from the original parameters and one from the fine-tuned parameters

→ A query-aware gate mechanism dynamically weights and combines these memories when generating outputs

→ The gate uses low-rank matrices to keep parameter count minimal while enabling flexible memory fusion
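
Conceptually, the fusion could look like the following PyTorch sketch. The module name, dimensions, and the sigmoid-interpolation form are illustrative assumptions, not the paper's exact formulation; the point is that a low-rank, query-conditioned gate adds very few parameters while blending the two memories.

```python
import torch
import torch.nn as nn

class QueryAwareGate(nn.Module):
    """Hypothetical sketch of a query-aware, low-rank gate that fuses the
    frozen ("old") and fine-tuned ("new") final hidden states."""

    def __init__(self, hidden_size: int, rank: int = 8):
        super().__init__()
        # Low-rank factorization keeps the added parameter count tiny.
        self.down = nn.Linear(hidden_size, rank, bias=False)
        self.up = nn.Linear(rank, hidden_size, bias=False)

    def forward(self, query_state, frozen_memory, tuned_memory):
        # All inputs: (batch, seq, hidden).
        # Gate in [0, 1], conditioned on the query representation.
        gate = torch.sigmoid(self.up(self.down(query_state)))
        # Interpolate between the original and the fine-tuned memory.
        return gate * tuned_memory + (1.0 - gate) * frozen_memory


# Toy usage with random tensors standing in for the two LLM memories.
hidden, rank = 4096, 8
gate = QueryAwareGate(hidden, rank)
q = torch.randn(1, 16, hidden)
fused = gate(q, torch.randn(1, 16, hidden), torch.randn(1, 16, hidden))
print(fused.shape)  # torch.Size([1, 16, 4096])
```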

-----

💡 Key Insights:

→ Final hidden states can serve as "memories" of an LLM's understanding

→ Combining memories from both original and fine-tuned parameters helps balance old and new knowledge

→ Query-aware gating allows dynamic control over which knowledge source to prioritize
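
For intuition on how the two memories might be collected in practice, here is a hedged sketch using Hugging Face transformers and peft. The model name and adapter path are placeholders, and this is an illustration rather than the paper's implementation: with a LoRA adapter, the frozen-parameter memory can be read out by temporarily disabling the adapter on the same backbone.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "meta-llama/Llama-2-7b-hf"   # placeholder base model
adapter_path = "path/to/lora-adapter"    # placeholder fine-tuned LoRA adapter

tokenizer = AutoTokenizer.from_pretrained(base_name)
model = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()

inputs = tokenizer("What is the capital of France?", return_tensors="pt")

with torch.no_grad():
    # "New" memory: final hidden states with the LoRA adapter active.
    tuned_out = model(**inputs, output_hidden_states=True)
    tuned_memory = tuned_out.hidden_states[-1]        # (batch, seq, hidden)

    # "Old" memory: same backbone with the adapter disabled,
    # i.e. the original frozen parameters.
    with model.disable_adapter():
        frozen_out = model(**inputs, output_hidden_states=True)
        frozen_memory = frozen_out.hidden_states[-1]

# These two tensors are the "memories" a gate like the one sketched above could fuse.
```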

-----

📊 Results:

→ Outperforms baseline PEFT methods while adding only 0.164M parameters

→ Improves average performance by 2.19% compared to LoRA across tasks

→ Reduces catastrophic forgetting by maintaining 95% of original performance on non-target tasks
