Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

That makes me wonder though what the best loss function was. I assume you used MSE on the logscore. I wonder if a sigmoid on which of two articles has the higher score would yield better results for the downstream RLHF task.


Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: