Is this true though? I haven't done the experiment, but I can envision the LLM critiquing its own output (if it was created in a different session), iteratively correcting it, and always finding flaws in it. Are LLMs even primed to say "this is perfect and it needs no further improvements"?
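Roughly the experiment I have in mind, as a minimal sketch using the OpenAI Python SDK (the model name, prompts, and round limit are just placeholders I made up, not anything tested):

```python
# Sketch: have a model draft text, then repeatedly critique and revise its own
# output in fresh calls, and see whether it ever declares the result "perfect".
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    # Single-turn call so each critique sees the text "cold", as if from another session
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; swap in whatever you have access to
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

draft = ask("Write a short paragraph explaining recursion.")
for i in range(5):
    critique = ask(
        "Critique this paragraph. If it needs no changes, reply only 'PERFECT'.\n\n" + draft
    )
    if critique.strip() == "PERFECT":
        print(f"Declared perfect after {i} critique rounds.")
        break
    draft = ask(
        "Revise the paragraph to address this critique.\n\n"
        f"Paragraph:\n{draft}\n\nCritique:\n{critique}"
    )
else:
    print("Never declared its own output perfect within 5 rounds.")
```

My hunch is the loop rarely hits the break: the critique prompt itself invites the model to keep finding something to fix.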
What I have seen is ChatGPT and Claude battling it out when given the same problem, each always correcting and finding fault with the other's output. It's hilarious.
If you ask an AI to grade a set of essays, it will give the highest grade to the essay it wrote itself.