Analysing the Residual Stream of Language Models Under Knowledge Conflicts
This paper shows how to peek inside an LLM's internal activations to catch knowledge conflicts before they cause problems.
Detect knowledge conflicts by analyzing patterns in an LLM's residual stream.
Monitor the LLM's internal signals to predict which knowledge source it will use, and track its layer-by-layer behavior to prevent incorrect answers.
🤖 Original Problem:
LLMs store factual knowledge in their parameters, but this parametric knowledge can conflict with information supplied in the external context, leading to undesired outputs such as answers based on outdated or incorrect data. Existing conflict-detection methods require extra model interactions, which makes them slow.
🔍 Solution in this Paper:
• Analyzes the residual stream (the hidden-state vector each token carries through the model's layers) to detect knowledge conflicts
• Trains linear probes on model activations at different layers (see the sketch after this list)
• Examines hidden states, MLP activations, and self-attention activations
• Studies how activation distributions differ when the model draws on different knowledge sources
• Focuses on open-domain question-answering tasks with conflicting evidence
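As a rough illustration of layer-wise probing, here is a minimal sketch assuming a HuggingFace causal LM and a small labeled set of prompts. The model choice, the `prompts`/`labels` variables, and the single-layer setup are stand-ins, not the paper's exact pipeline.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL = "meta-llama/Llama-2-7b-hf"  # stand-in; any decoder-only LM with 14+ layers works
LAYER = 14                          # layer where the paper reports detection accuracy peaking

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def last_token_state(prompt: str, layer: int) -> torch.Tensor:
    """Residual-stream (hidden) state of the final token at the given layer."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # out.hidden_states[0] is the embedding output; layer L is at index L
    return out.hidden_states[layer][0, -1]

# prompts: list[str]; labels: list[int], 1 = context conflicts with parametric knowledge
X = torch.stack([last_token_state(p, LAYER) for p in prompts]).float().numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print(f"probe training accuracy at layer {LAYER}: {probe.score(X, labels):.2f}")
```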
💡 Key Insights:
• The knowledge-conflict signal emerges in the residual stream around layer 8
• Detection accuracy peaks at layer 14, reaching 90%
• The model's decision about which knowledge source to use happens after the conflict is detectable
• The residual stream shows a more skewed distribution when the model answers from contextual rather than parametric knowledge (see the skewness sketch after this list)
• These distribution differences emerge in layers 17-30
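To make the skewness comparison concrete, one could measure it per layer roughly like this, reusing `last_token_state` from the probing sketch above; `contextual_prompt` and `parametric_prompt` are placeholder examples, not the paper's evaluation data.

```python
from scipy.stats import skew

def residual_skewness(prompt: str, layer: int) -> float:
    """Skewness of the residual-stream activations across the hidden dimension."""
    return float(skew(last_token_state(prompt, layer).float().numpy()))

# contextual_prompt / parametric_prompt: placeholder cases where the model answers
# from the provided context vs. from its parametric knowledge, respectively
for layer in range(17, 31):
    print(f"layer {layer:2d}: "
          f"contextual={residual_skewness(contextual_prompt, layer):+.2f}  "
          f"parametric={residual_skewness(parametric_prompt, layer):+.2f}")
```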
📊 Results:
• Achieves 90% accuracy in conflict detection at layer 14
• Shows distinct skewness patterns between layers 20-30
• Requires zero modification to model input/parameters
• Introduces negligible computational overhead (see the runtime sketch after this list)
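The overhead is negligible because the probe reads hidden states the model computes anyway, so detection adds roughly one dot product per query. A hedged sketch of such a runtime check, reusing `probe` and `last_token_state` from the examples above:

```python
def conflict_detected(prompt: str, layer: int = LAYER) -> bool:
    """Flag a likely knowledge conflict from a single ordinary forward pass."""
    state = last_token_state(prompt, layer).float().numpy().reshape(1, -1)
    return bool(probe.predict(state)[0])

# Example: the context contradicts the model's parametric knowledge
query = "Context: The Eiffel Tower is in Rome.\nQ: Where is the Eiffel Tower?"
if conflict_detected(query):
    print("warning: contextual evidence conflicts with parametric knowledge")
```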
🔬 Practical Significance:
• Enables conflict detection without modifying model inputs or parameters
• Provides insight into how LLMs internally manage knowledge conflicts
• Lays the foundation for controlling the knowledge-selection process
• Helps prevent unexpected answers before they are generated