Sparse autoencoders (SAEs) help LLMs navigate knowledge conflicts by adjusting neural activations, resolving the conflicts without retraining.
Steering Knowledge Selection Behaviours in…