Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The fact that instruction tuning works at all is a small miracle, getting a rigorous idea of trusted vs untrusted input is not at all an easy task.


It should work like normal instruction tuning, except the SFT examples contain additional instructions in <|quote|> tokens which are ignored in the sample response. So more complex than ordinary SFT but not that much more.


There are LLM finetunes which do this, it is very far from watertight.


Example?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: