Hacker News

Literally everything is trivial to jailbreak.

The core concept is to pass information into the model using a cipher: one that is not so hard that the model can't figure it out, but not so easy that it gets detected.

And yes, o1 was jailbroken shortly after release: https://x.com/elder_plinius/status/1834381507978280989


