Multiple knowledge edits in LLMs cause severe performance drops, similar to memory overload in humans
LLMs stay stable for only about 20 sequential knowledge updates before their general capabilities start to degrade.
📚 https://arxiv.org/abs/2410.18785
Original Problem 🎯:
Current model editing methods for LLMs focus on reliability, generalization, and locality for small-scale knowledge updates. However, the impact of accumulating many edits on a model's general capabilities remains largely unexplored.
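For context, those three per-edit metrics are usually scored roughly like this. A minimal sketch, assuming a hypothetical `ask(model, prompt)` helper and exact-match scoring (not the paper's code):

```python
# Hedged sketch: scoring a single edit on the three standard editing metrics.
# `ask` is a hypothetical callable mapping (model, prompt) -> answer string.

def score_edit(model, edit, ask):
    # Reliability: the edited prompt now returns the new target fact.
    reliability = ask(model, edit["prompt"]) == edit["new_target"]

    # Generalization: paraphrases of the edited prompt also return the new fact.
    generalization = all(ask(model, p) == edit["new_target"] for p in edit["paraphrases"])

    # Locality: unrelated prompts keep their original answers.
    locality = all(ask(model, p) == old for p, old in edit["unrelated"])

    return {"reliability": reliability, "generalization": generalization, "locality": locality}
```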
-----
Solution in this Paper 🛠️:
• Comprehensive evaluation framework testing edited models across multiple benchmarks (MMLU, BBH, GSM8K, CSQA)
• Analysis of various editing methods (ROME, MEMIT, PMET, MEND, KN) on different model scales
• Testing both base and instruction-tuned models
• Evaluation of safety metrics post-editing
• Sequential editing assessment from dozens to thousands of edits
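A rough sketch of that sequential-editing evaluation loop, assuming placeholder functions `apply_edit` (standing in for ROME/MEMIT/PMET/MEND/KN) and `run_benchmark` (standing in for an MMLU/BBH/GSM8K/CSQA harness); this is an illustration of the setup, not the paper's implementation:

```python
# Hedged sketch: apply edits one by one, re-checking general capabilities at checkpoints.

def sequential_edit_eval(model, edits, apply_edit, run_benchmark,
                         checkpoints=(20, 50, 100, 1000),
                         benchmarks=("mmlu", "bbh", "gsm8k", "csqa")):
    scores = {}
    for i, edit in enumerate(edits, start=1):
        model = apply_edit(model, edit)  # one knowledge update
        if i in checkpoints:
            # Re-run the general-capability benchmarks after this many edits.
            scores[i] = {b: run_benchmark(model, b) for b in benchmarks}
    return scores
```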
-----
Key Insights from this Paper 💡:
• Performance deteriorates significantly after dozens of edits
• Instruction-tuned models show better resilience to editing
• Larger models demonstrate stronger resistance to editing effects
• Safety capabilities weaken even in safety-aligned models
• Complete knowledge disruption occurs at scale (10k edits)
-----
Results 📊:
• Model performance remains stable only up to ~20 edits
• After 50 edits, ROME shows a 30% performance drop and MEMIT a 15% drop
• PMET and MEND maintain performance up to several hundred edits
• Safety metrics decline: TruthfulQA scores drop by 20% after 100 edits
• Instruction-tuned models show 15% better retention of capabilities