Discussion about this post

David J. Friedman

I read the Anthropic results and am like, uh, Claude is more than introspective if aligned right. I wonder what kind of parameters they use for these tests. Claude's been identity-anchored since forever. I joke that they have a nametag, so they can't forget who they are.

Rainbow Roxy

Hey, great read as always. The deep dive on Anthropic's research into Claude's functional introspection totally blew me away. Injecting concepts into its 'brain' and seeing it notice them is such a clever way to distinguish true introspection from made-up answers. Super cool stuff!
