
I thought about this a while back. My concept was to use RLHF to train an LLM to extract a paper's key points and their premises and to generate counter-questions about them. A human would filter the questions, and that feedback would become training material.
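
Roughly, the human-filtering loop might look like the sketch below, where generate_questions is a stand-in for whatever model call you'd actually use, and each accept/reject verdict gets logged as a preference record for a later reward model:

  import json

  def generate_questions(paper_text: str) -> list[str]:
      # Stub: in practice, prompt the LLM to extract key points,
      # their premises, and counter-questions about each.
      return ["Does the baseline control for dataset size?"]

  def collect_feedback(paper_text: str, path: str = "feedback.jsonl") -> None:
      # Each verdict becomes one preference record, usable later
      # as RLHF reward-model training data.
      with open(path, "a") as f:
          for q in generate_questions(paper_text):
              verdict = input(f"Keep this question? [y/n] {q} ")
              record = {"question": q, "accepted": verdict.strip().lower() == "y"}
              f.write(json.dumps(record) + "\n")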

Once models get better with numbers, maybe one could also have them spot statistical errors. For that, though, I think a constantly updated, field-specific checklist for human reviewers makes more sense.
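
For illustration, such a checklist could start as a simple field-keyed table; the entries here are made up, not a vetted list:

  # Illustrative, field-keyed checklist; the entries are examples only.
  STATS_CHECKLIST = {
      "ml": [
          "Are results averaged over multiple random seeds, with variance reported?",
          "Was the test set used only once, after all tuning?",
      ],
      "medicine": [
          "Is the sample size justified by a power analysis?",
          "Are effect sizes and confidence intervals reported, not just p-values?",
      ],
  }

  def checklist_for(field: str) -> list[str]:
      # Fall back to an empty list for fields not yet covered.
      return STATS_CHECKLIST.get(field, [])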

As a data source, I thought OpenReview.net would be a good starting point.
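
Pulling public submissions is already straightforward with the openreview-py client. A minimal sketch, assuming that client; the invitation string is just an example, since IDs differ by venue and year:

  # pip install openreview-py
  import openreview

  # Guest client; public notes need no credentials.
  client = openreview.Client(baseurl="https://api.openreview.net")

  notes = client.get_notes(
      invitation="ICLR.cc/2019/Conference/-/Blind_Submission", limit=5
  )
  for note in notes:
      # Content keys vary by venue; title and abstract are common.
      print(note.content.get("title"))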


