
Yes it can! That's the whole point of RL! It generates slightly out-of-distribution rollouts and rewards the good ones, which shifts the distribution of the output.


That's not out of distribution, that's inside the distribution of the rollouts. If you don't create rollouts for the game of chess, then it doesn't know how to play chess, no matter how smart it is at the tasks you did create rollouts for. It's structurally stuck in its distribution.
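
A toy sketch of the point, not anyone's real training setup: a three-action softmax policy trained with plain REINFORCE. Everything here (the `train` function, the reward values, the learning rate) is a made-up assumption for illustration. Action 2 stands in for "chess": it pays the most, but the policy starts with exactly zero probability on it, which is my simplification of "no rollouts exist for it".

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax; a logit of -inf maps to probability 0."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=2000, lr=0.1, seed=0):
    """REINFORCE on a 3-armed bandit (hypothetical rewards, no baseline)."""
    rng = random.Random(seed)
    rewards = [0.1, 1.0, 10.0]            # action 2 is best but never sampled
    logits = [2.0, 0.0, float("-inf")]    # assumption: zero mass on action 2
    before = softmax(logits)
    for _ in range(steps):
        probs = softmax(logits)
        # Rollout: sample an action from the current policy.
        action = rng.choices(range(3), weights=probs)[0]
        r = rewards[action]
        # Policy-gradient step: d log pi(action) / d logit_i = 1[i == action] - p_i
        for i in range(3):
            indicator = 1.0 if i == action else 0.0
            logits[i] += lr * r * (indicator - probs[i])
    return before, softmax(logits)


if __name__ == "__main__":
    before, after = train()
    print("before:", [round(p, 3) for p in before])
    print("after: ", [round(p, 3) for p in after])
```

Under these assumptions the update moves probability mass from action 0 toward action 1, because both get sampled and action 1 is rewarded more. Action 2 never receives any mass: it is never sampled, and its gradient term is multiplied by its own probability, which is zero. A real model rarely puts exactly zero probability on anything, so treat this as the limiting case of the argument rather than a proof of it.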



