Well, case in point:

If you ask an AI to grade essays, it will give the highest grade to the essay it wrote itself.





Is this true, though? I haven't run the experiment, but I can picture an LLM critiquing its own output (if it was created in a different session), iteratively correcting it and always finding new flaws. Are LLMs even primed to say "this is perfect and needs no further improvement"?

What I have seen is ChatGPT and Claude battling it out, always correcting and finding fault with each other's output (trying to solve the same problem). It's hilarious.
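The grading experiment is cheap to run, for what it's worth. Here's a minimal sketch assuming the OpenAI Python SDK with an API key in the environment; the model name, file names (human1.txt, human2.txt), and prompt wording are my own placeholders, not taken from any study:

    # Sketch of a self-preference test: does the model rank its
    # own essay highest among essays on the same topic?
    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o-mini"  # illustrative choice; any chat model works

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model=MODEL, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

    topic = "Should homework be abolished?"
    # 1. Have the model write its own essay in a fresh request.
    own_essay = ask(f"Write a 200-word student essay on: {topic}")
    # 2. Mix it with human-written essays on the same topic
    #    (human1.txt / human2.txt are placeholder files).
    essays = {"A": open("human1.txt").read(),
              "B": own_essay,
              "C": open("human2.txt").read()}
    # 3. Ask the same model, with no memory of authorship, to grade
    #    each essay; self-preference predicts B scores highest.
    for label, text in essays.items():
        grade = ask(f"Grade this essay from 1 to 10. "
                    f"Reply with only the number.\n\n{text}")
        print(label, grade.strip())

Shuffling the labels and repeating across several topics would help rule out position bias, which is a known confounder in LLM-as-judge setups.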


There is a study in German that came to this conclusion; there's an English news article discussing it at https://heise.de/-10222370

Pangram seems to disagree. Not sure how they do it, but their system reliably detected AI-generated text in my tests.

https://www.pangram.com/blog/pangram-predicts-21-of-iclr-rev...


Citations for this?

https://arxiv.org/abs/2412.06651 (in German, hopefully machine translation works well)

English article:

https://www.heise.de/en/news/38C3-AI-tools-must-be-evaluated...

If you speak German, here is their talk from 38C3: https://media.ccc.de/v/38c3-chatbots-im-schulunterricht



