Rohan's Bytes
RL, BUT DON’T DO ANYTHING I WOULDN’T DO
Rohan Paul
Nov 4, 2024
AI models can exploit uncertainty gaps in their training constraints to learn unwanted behaviors.