Why would anyone trust the output of an LLM if it is barely better than guessing and much, much worse than humans?
GPT-5 shows more impressive numbers, but for that particular task the precision should be 100%, always, no matter how large the data set is or what format it comes in.
Why are we doing this?