
> This will certainly require a lot of reworking of these traditional curriculums that consist heavily of asking the student to generate bullshit, but maybe that wasn't the best way to educate students this whole time.

Just curious which discipline you have a grudge against here. Presumably a discipline is actually a discipline because someone who has worked in the field for their entire career can spot BS.



GP did not mention disciplines, and I don't think individual disciplines are to blame.

The approach of evaluating students based on "text generation" is very boring to study for, easy to fool (a parent/guardian can do it, last year's students can pass you an A-grade answer, ChatGPT can generate it), and doesn't prepare students for reality (making new things, solving new problems).


How exactly do you teach "making new things, solving new problems"?

How can you solve a math problem if you can't do basic algebra, even if you can run an algebraic statement through Wolfram and get a result?

In life, you learn problem-solving largely by solving problems that have already been solved. They're new problems to the student, not the teacher.


What if the thing you are supposed to be making is written communication in the form of a document, or a book, or a paragraph explaining why production was down? What if the goal of these things was to make a student literate?


Not all of us can be making new things and solving new problems, however.


Not the parent poster, but I suspect this may be pointing towards humanities-based subjects, where I've seen regular assignments in the form of essays on fairly subjective topics.

I do think there will be a challenge in handling the use of GPT in some kinds of essay questions, even in objective, "right/wrong" type disciplines. I asked a friend to grade some answers to an essay-style exam question, and they felt the GPT output was considerably better than that of many students in the class, particularly the weaker ones. This wasn't a properly blinded or double-blinded test, but the GPT output was so much better than the weaker students' work that there was little need: you could tell at a glance that it was better.

ChatGPT had a far better grasp of the English language, and the language model meant that what it wrote was more coherent, better structured, and better flowing than what a weaker student would write. It had a significantly larger vocabulary than many students, and it appeared to always use words correctly. Sentences were fully formed and coherent: they had proper grammatical structure, made proper use of punctuation, and advanced a cohesive idea (as the essay asked for). This wasn't true of weaker (human) students' attempts.

The content ChatGPT produced was not always factually perfect, but compared with anyone who wasn't top of the class, there were fewer factual errors in the ChatGPT output, since it was mostly regurgitating fairly simple material. A paraphrase of part of the Wikipedia article would probably have beaten many students, though.

Where GPT output falls apart is references. Even when asked to produce references, they are generally non-existent (a plausible paper name, a plausible author, a plausible journal name, except that the paper doesn't exist and that author never published anything in that journal), and they do not actually support the statements that cite them. Checking references still seems to be a fairly robust way of detecting non-human output at present.
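Not from the comment above, but here's a minimal sketch of that reference-checking idea, assuming Python with the requests library and Crossref's public search API. The title/author below are placeholders, and a miss is only a heuristic signal, since plenty of genuine references aren't indexed there.

    # Heuristic check: does a cited paper actually turn up in Crossref?
    # A miss is only a signal, not proof that the reference was fabricated.
    import requests

    def citation_seems_real(title, author):
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": title, "query.author": author, "rows": 3},
            timeout=10,
        )
        resp.raise_for_status()
        items = resp.json()["message"]["items"]
        # Loose match: any returned title that contains the queried title.
        return any(
            title.lower() in t.lower()
            for item in items
            for t in item.get("title", [])
        )

    # Example with a placeholder citation:
    print(citation_seems_real("Attention Is All You Need", "Vaswani"))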

It would be interesting to see this kind of experiment carried out at scale in a controlled environment, where the examiners are unaware that GPT outputs are interspersed with real students' work, in order to make it a fair assessment.



