"SCHRODINGER’S MEMORY: LARGE LANGUAGE MODELS"

This podcast was generated with Google's Illuminate.

LLM memory is like Schrödinger's cat: observable only when queried.

LLMs exhibit memory-like capabilities, similar to human cognitive processes, by dynamically approximating outputs from their inputs.

📚 https://arxiv.org/pdf/2409.10482

Original Problem 🤔:

Understanding the memory mechanism of LLMs is challenging. Existing research lacks a theoretical framework that explains how these models exhibit memory-like behavior and how that behavior differs from human memory.

-----

Solution in this Paper 🔧:

- Utilizes the Universal Approximation Theorem (UAT) to explain LLM memory (the classical statement is recalled after this list).

- Proposes that LLMs' memory operates like "Schrödinger's memory," observable only when queried.

- Compares LLM memory with human memory, suggesting both rely on dynamic fitting of outputs based on inputs.

- Conducts experiments using CN Poems and ENG Poems datasets to assess memory capabilities.
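
For reference, a minimal sketch of the classical single-hidden-layer UAT that the paper builds on (the paper's own formulation is multi-layer and Transformer-oriented; this is only the textbook version): for any continuous function $f$ on a compact set $K \subset \mathbb{R}^d$, any non-polynomial activation $\sigma$, and any $\varepsilon > 0$, there exist $N$ and parameters $v_i, w_i, b_i$ such that

$$
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon .
$$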

-----

Key Insights from this Paper 💡:

- LLMs can dynamically approximate functions, exhibiting a form of memory.

- Memory in LLMs is not static but inferred from input cues (see the probe sketch after this list).

- Longer texts challenge LLMs' memory capacity.

- Human and LLM memories share dynamic response mechanisms.
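
To make "observable only when queried" concrete, here is a minimal probing sketch in Python using the Hugging Face transformers library. The model id, prompt wording, and decoding settings are illustrative assumptions, not the paper's exact protocol:

```python
# Probe "Schrödinger's memory": the model's memory of a poem only becomes
# observable once we query it with a cue (title + author).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2-1.5B-Instruct"  # one of the models evaluated in the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def probe_memory(title: str, author: str, max_new_tokens: int = 128) -> str:
    """Cue the model with a poem's title and author, return its continuation."""
    prompt = f"Recite the poem '{title}' by {author}:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding, so recall is deterministic
    )
    # Keep only the newly generated tokens, dropping the prompt.
    generated = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

print(probe_memory("静夜思", "李白"))
```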

-----

Results 📊:

- Qwen2-1.5B-Instruct achieved 96.9% accuracy on CN Poems.

- Bloom-1b4-zh reached 96.6% accuracy on CN Poems.

- Nearly all models achieved 99.9% accuracy on ENG Poems.

- Memory performance decreases with longer text outputs.
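
A hedged sketch of how per-poem recall accuracy like the numbers above could be aggregated; the paper's exact scoring rule may differ, and exact substring matching after whitespace normalization is an assumption made here for illustration:

```python
# Aggregate recall accuracy over (reference_poem, generated_poem) pairs.
def recall_accuracy(pairs):
    def normalize(text: str) -> str:
        return "".join(text.split())  # ignore whitespace and line breaks
    hits = sum(normalize(ref) in normalize(gen) for ref, gen in pairs)
    return hits / len(pairs)

# Example: two poems recalled verbatim, one not -> 0.667
print(recall_accuracy([
    ("床前明月光", "床前明月光"),
    ("举头望明月", "举头望明月"),
    ("低头思故乡", "窗前明月光"),
]))
```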
