Discussion about this post

Dan McRae

Great article. I appreciate the cites to alphaXiv. Interesting that smaller models (such as 7B) can self-improve. And this, “Fine-tuning, even on benign data, unpredictably degrades refusal behaviour, making each self-improvement round require fresh safety audits”, is fodder for the AI in my novel. It is also something to take note of in the real world.

