
Wait, why wouldn’t RLHF influence word choices?


I didn't say it wouldn't (or rather couldn't); I said it was unlikely for the selected hypothesis, given standard training data vs. RLHF iterations.



