
"Interweaving Memories of a Siamese Large Language Model"

Podcast on this paper generated with Google's Illuminate.

Giving LLMs the ability to consult both their old and new experiences.

IMSM introduces a siamese LLM architecture that prevents catastrophic forgetting by interweaving memories from original and fine-tuned parameters during inference.

https://arxiv.org/abs/2412.17383v1

🤔 Original Problem:

→ Parameter-efficient fine-tuning (PEFT) methods often lead to catastrophic forgetting in LLMs, where models lose their original world knowledge while learning new task-specific information

→ Existing solutions like data replay are expensive and complex to implement

-----

🔧 Solution in this Paper:

→ IMSM uses a siamese LLM architecture in which one copy keeps the original parameters frozen while the other is fine-tuned with a PEFT method (e.g., LoRA)

→ For each input query, it generates two distinct memories (final hidden states): one from the original parameters and one from the fine-tuned parameters

→ A query-aware gate mechanism dynamically weights and combines these memories when generating outputs

→ The gate uses low-rank matrices to keep parameter count minimal while enabling flexible memory fusion
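
Conceptually, the fusion could look like the following PyTorch sketch. The module name, dimensions, and the sigmoid-interpolation form are illustrative assumptions, not the paper's exact formulation; the point is that a low-rank, query-conditioned gate adds very few parameters while blending the two memories.

```python
import torch
import torch.nn as nn

class QueryAwareGate(nn.Module):
    """Hypothetical sketch of a query-aware, low-rank gate that fuses the
    frozen ("old") and fine-tuned ("new") final hidden states."""

    def __init__(self, hidden_size: int, rank: int = 8):
        super().__init__()
        # Low-rank factorization keeps the added parameter count tiny.
        self.down = nn.Linear(hidden_size, rank, bias=False)
        self.up = nn.Linear(rank, hidden_size, bias=False)

    def forward(self, query_state, frozen_memory, tuned_memory):
        # All inputs: (batch, seq, hidden).
        # Gate in [0, 1], conditioned on the query representation.
        gate = torch.sigmoid(self.up(self.down(query_state)))
        # Interpolate between the original and the fine-tuned memory.
        return gate * tuned_memory + (1.0 - gate) * frozen_memory


# Toy usage with random tensors standing in for the two LLM memories.
hidden, rank = 4096, 8
gate = QueryAwareGate(hidden, rank)
q = torch.randn(1, 16, hidden)
fused = gate(q, torch.randn(1, 16, hidden), torch.randn(1, 16, hidden))
print(fused.shape)  # torch.Size([1, 16, 4096])
```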

-----

💡 Key Insights:

→ Final hidden states can serve as "memories" of an LLM's understanding

→ Combining memories from both original and fine-tuned parameters helps balance old and new knowledge

→ Query-aware gating allows dynamic control over which knowledge source to prioritize
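
For intuition on how the two memories might be collected in practice, here is a hedged sketch using Hugging Face transformers and peft. The model name and adapter path are placeholders, and this is an illustration rather than the paper's implementation: with a LoRA adapter, the frozen-parameter memory can be read out by temporarily disabling the adapter on the same backbone.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "meta-llama/Llama-2-7b-hf"   # placeholder base model
adapter_path = "path/to/lora-adapter"    # placeholder fine-tuned LoRA adapter

tokenizer = AutoTokenizer.from_pretrained(base_name)
model = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()

inputs = tokenizer("What is the capital of France?", return_tensors="pt")

with torch.no_grad():
    # "New" memory: final hidden states with the LoRA adapter active.
    tuned_out = model(**inputs, output_hidden_states=True)
    tuned_memory = tuned_out.hidden_states[-1]        # (batch, seq, hidden)

    # "Old" memory: same backbone with the adapter disabled,
    # i.e. the original frozen parameters.
    with model.disable_adapter():
        frozen_out = model(**inputs, output_hidden_states=True)
        frozen_memory = frozen_out.hidden_states[-1]

# These two tensors are the "memories" a gate like the one sketched above could fuse.
```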

-----

📊 Results:

→ Outperforms baseline PEFT methods while adding only 0.164M parameters

→ Improves average performance by 2.19% compared to LoRA across tasks

→ Reduces catastrophic forgetting by maintaining 95% of original performance on non-target tasks
