Hacker News
espadrine
on Oct 29, 2024
on: Using reinforcement learning and $4.80 of GPU time...
That makes me wonder what the best loss function would be. I assume you used MSE on the log score. I wonder whether a pairwise loss, a sigmoid on which of two articles has the higher score, would yield better results for the downstream RLHF task.
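A minimal Python sketch of the two losses being compared. It assumes the reward model outputs one scalar per article and takes "log score" to mean `log1p` of the HN score, which is our own choice so that zero scores don't break the log. The pairwise variant is a Bradley-Terry-style logistic loss on prediction differences. The function names are hypothetical and not taken from the linked post.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mse_log_loss(preds, true_scores):
    # Regression: each prediction should match log1p(score) of its article.
    n = len(preds)
    return sum((p - math.log1p(s)) ** 2 for p, s in zip(preds, true_scores)) / n


def pairwise_loss(preds, true_scores):
    # Pairwise: P(article i beats article j) = sigmoid(preds[i] - preds[j]).
    # Average the negative log-likelihood over all pairs whose true scores differ.
    total, count = 0.0, 0
    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            if true_scores[i] == true_scores[j]:
                continue  # ties carry no ordering information
            margin = preds[i] - preds[j]
            if true_scores[i] < true_scores[j]:
                margin = -margin  # orient so a positive margin means "correct order"
            total += softplus(-margin)  # equals -log(sigmoid(margin))
            count += 1
    return total / count if count else 0.0


scores = [120, 3, 45]
preds = [2.0, 0.5, 1.0]
print(mse_log_loss(preds, scores))
print(pairwise_loss(preds, scores))
# Shifting every prediction by a constant changes the MSE but not the pairwise loss.
print(pairwise_loss([p + 10.0 for p in preds], scores))
```

The design difference is that the pairwise loss only constrains the ordering of predictions, since adding the same constant to every output leaves it unchanged, whereas MSE also forces the model to match absolute log-score levels. Whether an ordering-only objective gives a better reward signal for RLHF is the open question the comment raises, not something this sketch settles.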