
I thought about this a while back. My concept was to use RLHF to train an LLM to extract a paper's key points and their premises and to generate counter-questions about them. A human would filter the questions, and that feedback would become training material.
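
Roughly, the human-filtering loop might look like the sketch below, where generate_questions is a stand-in for whatever model call you'd actually use, and each accept/reject verdict gets logged as a preference record for a later reward model:

  import json

  def generate_questions(paper_text: str) -> list[str]:
      # Stub: in practice, prompt the LLM to extract key points,
      # their premises, and counter-questions about each.
      return ["Does the baseline control for dataset size?"]

  def collect_feedback(paper_text: str, path: str = "feedback.jsonl") -> None:
      # Each verdict becomes one preference record, usable later
      # as RLHF reward-model training data.
      with open(path, "a") as f:
          for q in generate_questions(paper_text):
              verdict = input(f"Keep this question? [y/n] {q} ")
              record = {"question": q, "accepted": verdict.strip().lower() == "y"}
              f.write(json.dumps(record) + "\n")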

Once models get better with numbers, maybe one could also have them spot statistical errors. For that, though, I think a constantly updated, field-specific checklist for human reviewers makes more sense.
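
For illustration, such a checklist could start as a simple field-keyed table; the entries here are made up, not a vetted list:

  # Illustrative, field-keyed checklist; the entries are examples only.
  STATS_CHECKLIST = {
      "ml": [
          "Are results averaged over multiple random seeds, with variance reported?",
          "Was the test set used only once, after all tuning?",
      ],
      "medicine": [
          "Is the sample size justified by a power analysis?",
          "Are effect sizes and confidence intervals reported, not just p-values?",
      ],
  }

  def checklist_for(field: str) -> list[str]:
      # Fall back to an empty list for fields not yet covered.
      return STATS_CHECKLIST.get(field, [])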

As a data source, I thought OpenReview.net would be a good starting point.
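
Pulling public submissions is already straightforward with the openreview-py client. A minimal sketch, assuming that client; the invitation string is just an example, since IDs differ by venue and year:

  # pip install openreview-py
  import openreview

  # Guest client; public notes need no credentials.
  client = openreview.Client(baseurl="https://api.openreview.net")

  notes = client.get_notes(
      invitation="ICLR.cc/2019/Conference/-/Blind_Submission", limit=5
  )
  for note in notes:
      # Content keys vary by venue; title and abstract are common.
      print(note.content.get("title"))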


