I thought about this a while back. My concept was to use RLHF to train an LLM to extract a paper's key points and their premises, and to generate counter-questions. A human could filter those questions, and that feedback becomes the training material.
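Roughly what I had in mind for the feedback step, as a sketch (everything named here is hypothetical, and the actual LLM call that proposes the questions is left out): the human's keep/discard decisions on generated counter-questions become chosen/rejected pairs, the kind of data a reward model or DPO-style objective can train on.

```python
from dataclasses import dataclass


@dataclass
class ReviewSample:
    paper_excerpt: str   # key point + premise pulled from the paper
    question: str        # counter-question proposed by the model
    kept_by_human: bool  # did the human reviewer keep or discard it?


def to_preference_pairs(samples):
    """Turn human keep/discard decisions on counter-questions into
    chosen/rejected pairs for a reward model or DPO-style step."""
    by_excerpt = {}
    for s in samples:
        by_excerpt.setdefault(s.paper_excerpt, {"kept": [], "dropped": []})
        key = "kept" if s.kept_by_human else "dropped"
        by_excerpt[s.paper_excerpt][key].append(s.question)

    pairs = []
    for excerpt, groups in by_excerpt.items():
        for good in groups["kept"]:
            for bad in groups["dropped"]:
                pairs.append({"prompt": excerpt, "chosen": good, "rejected": bad})
    return pairs


if __name__ == "__main__":
    samples = [
        ReviewSample("Claim: method X beats the baseline by 2%.",
                     "Is the 2% gap within the reported variance?", True),
        ReviewSample("Claim: method X beats the baseline by 2%.",
                     "Why is the paper about method X?", False),
    ]
    print(to_preference_pairs(samples))
```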
Once models get better with numbers, maybe have one spot statistical errors too. For that, though, I think a constantly updated, field-specific checklist for human reviewers makes more sense.
For a data source, I thought OpenReview.net would be a nice start.
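Something like this is what I meant by starting from OpenReview, as a rough sketch using the openreview-py client (API v1). The venue/invitation strings and the review-filtering logic are assumptions to check against OpenReview's docs, and there's a newer API v2 client as well.

```python
import openreview

# Guest access should be enough for public notes; log in for anything gated.
client = openreview.Client(baseurl="https://api.openreview.net")

# Grab a few of a venue's submissions, then walk each paper's forum for its reviews.
submissions = client.get_notes(
    invitation="ICLR.cc/2022/Conference/-/Blind_Submission", limit=5
)
for sub in submissions:
    title = sub.content.get("title", "")
    replies = client.get_notes(forum=sub.forum)  # submission + reviews + comments
    reviews = [r for r in replies if r.invitation.endswith("Official_Review")]
    print(title, "-", len(reviews), "reviews")
```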