
"Should We Really Edit Language Models? On the Evaluation of Edited Language Models"

This podcast was generated from the paper using Google's Illuminate, a tool that creates podcasts from arXiv papers.

Multiple knowledge edits in LLMs cause severe performance drops, similar to memory overload in humans

Edited LLMs keep their general capabilities stable for only about 20 sequential knowledge updates.

📚 https://arxiv.org/abs/2410.18785

Original Problem 🎯:

Current model editing methods for LLMs focus on reliability, generalization, and locality for small-scale knowledge updates. However, how multiple edits affect a model's general capabilities remains unexplored.

-----

Solution in this Paper 🛠️:

• Comprehensive evaluation framework testing edited models across multiple benchmarks (MMLU, BBH, GSM8K, CSQA)

• Analysis of various editing methods (ROME, MEMIT, PMET, MEND, KN) on different model scales

• Testing both base and instruction-tuned models

• Evaluation of safety metrics post-editing

• Sequential editing assessment from dozens to thousands of edits (see the workflow sketch after this list)
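
A minimal sketch of the sequential edit-then-evaluate loop described above, assuming hypothetical placeholder functions: `apply_edit` stands in for an editing method such as ROME or MEMIT, and `score_benchmark` stands in for a benchmark harness (MMLU, BBH, GSM8K, CSQA). These names are not from the paper; the stubs only illustrate the evaluation protocol, not the paper's actual code.

```python
from typing import Dict, List


def apply_edit(model: dict, fact: Dict[str, str]) -> dict:
    """Placeholder: apply one knowledge edit to the model.

    A real editing method (ROME, MEMIT, PMET, MEND, KN) would rewrite model
    weights here; this stub just records the edit so the loop is runnable.
    """
    return {**model, "edits": model.get("edits", []) + [fact]}


def score_benchmark(model: dict, benchmark: str) -> float:
    """Placeholder: return the model's accuracy on a general-capability benchmark."""
    return 0.0  # stub value; a real harness would evaluate MMLU, BBH, GSM8K, or CSQA


def sequential_edit_and_evaluate(
    model: dict,
    facts: List[Dict[str, str]],
    benchmarks: List[str],
    eval_every: int = 10,
) -> List[dict]:
    """Apply edits one at a time and track benchmark scores as the edit count grows."""
    history = []
    for i, fact in enumerate(facts, start=1):
        model = apply_edit(model, fact)  # sequential: each edit builds on the previous model
        if i % eval_every == 0:
            scores = {b: score_benchmark(model, b) for b in benchmarks}
            history.append({"num_edits": i, "scores": scores})
    return history


if __name__ == "__main__":
    # Toy facts; the paper scales this from dozens up to thousands of edits.
    facts = [{"subject": f"entity_{k}", "relation": "capital_of", "target": "Paris"}
             for k in range(100)]
    trace = sequential_edit_and_evaluate({}, facts, ["MMLU", "BBH", "GSM8K", "CSQA"], eval_every=20)
    for point in trace:
        print(point["num_edits"], point["scores"])
```

Plotting the recorded scores against the number of edits is how the degradation pattern summarized in the results below would surface.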

-----

Key Insights from this Paper 💡:

• Performance deteriorates significantly after dozens of edits

• Instruction-tuned models show better resilience to editing

• Larger models demonstrate stronger resistance to editing effects

• Safety capabilities weaken even in safety-aligned models

• Complete knowledge disruption occurs at scale (10k edits)

-----

Results 📊:

• Model performance maintains stability only up to ~20 edits

• After 50 edits, ROME shows a 30% performance drop and MEMIT a 15% drop

• PMET and MEND maintain performance up to several hundred edits

• Safety metrics decline: TruthfulQA scores drop by 20% after 100 edits

• Instruction-tuned models show 15% better retention of capabilities
