
LLMs already have problems with fact vs fiction. I don't see how Reddit of all places has "valuable data" in that regard.


I think the value is in the examples it provides of language.


Filtering for the top-upvoted comments can strip out the useless information, and then the model can be trained on the actual data and refined.


Except when the top-voted comments are hivemind-approved 'funny' quips, or are replies to exercises in creative writing, like half the posts in relationshipadvice, iwantthemanager, nuclear/pettyrevenge, etc.


Is this a joke that I'm missing? Top Reddit posts are frequently trash, filled with misinformation.


Many popular LLMs already include large amounts of Reddit comment data, which is (usually) cited in their respective papers.


Reddit also has a problem with fact vs fiction.



