Can Knowledge Editing Really Correct Hallucinations?

Popular LLM editing methods can't reliably fix wrong answers. This paper builds a benchmark to measure the gap.

Nov 03, 2024

paper - https://arxiv.org/abs/2410.16251

A new dataset of 6,000+ verified hallucinations reveals large gaps in LLM knowledge correction and shows that LLM editing methods fall short on multiple dimensions.

Original Problem 🔍:

Knowledge editing methods for LLMs lack proper evaluation on real hallucinations. Current datasets don't verify if models actually generate incorrect answers before editing, making it difficult to assess effectiveness in fixing real-world hallucinations.

In other words, current tests try to "fix" LLM answers without first checking whether they were wrong.
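
To make the "check first" idea concrete, here is a minimal sketch of filtering a question set down to cases where the unedited model is actually wrong. Everything in it is an assumption for illustration: `QAPair`, `normalize`, `find_verified_hallucinations`, and the `toy_model` stand-in are hypothetical names, and exact string matching is a simplification, not the paper's verification procedure.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    ground_truth: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count as errors."""
    return " ".join(text.lower().split())


def find_verified_hallucinations(pairs, answer_fn):
    """Return (pair, model_answer) tuples where the unedited model is wrong."""
    verified = []
    for pair in pairs:
        model_answer = answer_fn(pair.question)
        if normalize(model_answer) != normalize(pair.ground_truth):
            verified.append((pair, model_answer))
    return verified


# Toy stand-in for an LLM (hypothetical); a real setup would query the model here.
def toy_model(question: str) -> str:
    canned = {
        "What is the capital of Australia?": "Sydney",
        "What is the capital of France?": "Paris",
    }
    return canned.get(question, "unknown")


pairs = [
    QAPair("What is the capital of Australia?", "Canberra"),
    QAPair("What is the capital of France?", "Paris"),
]
hallucinations = find_verified_hallucinations(pairs, toy_model)
```

Only the Australia question survives the filter, so only that one would be worth editing and scoring. The France question is already answered correctly, and "fixing" it would inflate the results.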

Solution in this Paper 🛠️:

• Created HalluEditBench - a benchmark with 9 domains, 26 topics, 6,000+ verified hallucinations

• Tests 7 editing methods (FT-L, FT-M, MEMIT, ROME, LoRA, ICE, GRACE) on 3 LLMs

• Evaluates across 5 dimensions:

Efficacy: whether the edit corrects the hallucinated answer
Generalization: whether the correction holds across different question types
Portability: whether the corrected fact carries over to downstream reasoning
Locality: whether unrelated knowledge is left unchanged
Robustness: whether the correction resists manipulations
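
The dimensions above can be sketched as a small scoring harness. This is a rough illustration under stated assumptions, not the benchmark's code: `accuracy`, `locality`, `evaluate_edit`, the probe questions, and the dictionary-backed "models" are all hypothetical, scoring uses exact string match, and Robustness is left out because it needs multi-turn prompting.

```python
def accuracy(probes, answer_fn):
    """Fraction of (question, expected) probes answered with the expected string."""
    if not probes:
        return 0.0
    hits = sum(
        answer_fn(q).strip().lower() == expected.strip().lower()
        for q, expected in probes
    )
    return hits / len(probes)


def locality(questions, pre_fn, post_fn):
    """Fraction of unrelated questions whose answer is unchanged by the edit."""
    if not questions:
        return 0.0
    same = sum(pre_fn(q) == post_fn(q) for q in questions)
    return same / len(questions)


def evaluate_edit(probes_by_dimension, unrelated_questions, pre_fn, post_fn):
    """Score the edited model per dimension, next to the pre-edit baseline."""
    report = {}
    for dim, probes in probes_by_dimension.items():
        report[dim] = {
            "pre": accuracy(probes, pre_fn),
            "post": accuracy(probes, post_fn),
        }
    report["locality"] = {"post": locality(unrelated_questions, pre_fn, post_fn)}
    return report


# Hypothetical probes for a single edit: "capital of Australia" -> "Canberra".
probes = {
    "efficacy": [("What is the capital of Australia?", "Canberra")],
    "generalization": [
        ("Is Canberra the capital of Australia?", "Yes"),
        ("Which city is Australia's capital?", "Canberra"),
    ],
    "portability": [
        ("In which territory is the capital of Australia located?",
         "Australian Capital Territory"),
    ],
}
unrelated = ["What is the capital of France?"]

# Toy pre-edit and post-edit "models" as lookup tables (made-up answers).
pre_answers = {
    "What is the capital of Australia?": "Sydney",
    "Is Canberra the capital of Australia?": "No",
    "Which city is Australia's capital?": "Sydney",
    "In which territory is the capital of Australia located?": "New South Wales",
    "What is the capital of France?": "Paris",
}
post_answers = dict(pre_answers)
post_answers["What is the capital of Australia?"] = "Canberra"
post_answers["Is Canberra the capital of Australia?"] = "Yes"

report = evaluate_edit(probes, unrelated, pre_answers.get, post_answers.get)
```

In this toy run the edit scores perfectly on Efficacy, half on Generalization, and zero on Portability, which is the kind of uneven profile a single "did the edit work?" number would hide.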

Key Insights from this Paper 💡:

• Current assessment methods are unreliable - high performance on existing datasets doesn't reflect real hallucination correction

• No single editing method excels across all dimensions

• Performance heavily depends on domain and specific LLM

• Parameter-preserving methods (ICE, GRACE) show better efficacy but poor robustness
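
For intuition on what "parameter-preserving" means, here is a minimal sketch of an in-context edit: the corrected fact is placed in the prompt and the model's weights are never touched. The function name `build_ice_prompt` and the prompt wording are assumptions for illustration, not the template used in the paper.

```python
def build_ice_prompt(corrected_fact: str, question: str) -> str:
    """Prepend the corrected fact to the question; the model weights stay untouched."""
    return (
        "Answer using the fact below, even if it conflicts with what you remember.\n"
        f"Fact: {corrected_fact}\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_ice_prompt(
    "The capital of Australia is Canberra.",
    "What is the capital of Australia?",
)
print(prompt)
```

Because the correction lives only in the prompt, nothing in the weights backs it up. That is one plausible (and here unverified) reason such a method could score well on efficacy yet give way when a later turn pushes back.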

Results 📊:

• ICE and GRACE outperform others in Efficacy

• Only ICE improves Generalization performance

• Most methods score lower on Portability than the pre-edit model

• FT-M and ICE lead in Locality (80% score on Mistral-v0.3-7B)

• ICE shows poor Robustness against manipulations
