Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I think it’s perfectly clear: the model must know it’s been tampered with because it reports tampering before it reports which concept has been injected into its internal state. It can only do this if it has introspection capabilities.


Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: