Sparse autoencoders (SAEs) can steer how an LLM resolves knowledge conflicts by editing its internal activations at inference time, with no retraining required.
Steering Knowledge Selection Behaviours in…