Hacker News

Fascinating. Now, I want to try it before the humans put a stop to it :)


I failed to replicate the attack later that evening in a "new" conversation. It appears to me that the model is learning between conversations, even without human input or RLHF.



