Discussion about this post

Neural Foundry

Great roundup on SGTM, which feels like a meaningfully different approach compared to just filtering training data or doing post-hoc unlearning. The 7x resistance to adversarial fine-tuning is particularly noteworthy, since it suggests the "forgetting" is more deeply baked into the weight structure. What's interesting is whether this could scale to larger models while keeping the compute overhead manageable, especially when you're dealing with far more nuanced categories than just "biology knowledge."

