I have explained this many times. [0] Fundamentally, GPT-4, ChatGPT, and other LLMs belong to the same family of black-box deep neural networks which, decades after their invention, still cannot reason or explain their own decisions and outputs, and can only spit out what they have been trained on.
Researchers have simply trained these LLMs on more data, and they understand even less of what the models do internally, since the architectures sit inside a massive black box of unexplainable numbers and operations.
That isn't helpful to researchers or to serious professionals in high-risk industries. It makes LLMs less trustworthy for them and largely unsuitable for their use cases.
This may have been true elsewhere, but I don't think it holds for GPT-4.
I suspect that complex intelligence has emerged that cannot be directly attributed to the structure of the underlying LLM. My guess is that it has to do with the use of language itself, and that at sufficient scale this property exists in both humans and models.
A lot of the experimentation I've done is too long and complex to fit nicely in an Ask HN post. People have a tendency to move the bar when assigning intelligence to AI. GPT-4 is different. Here is a post from earlier today that might be more convincing.
GPT-4 is no different from any other deep neural network: fundamentally, they are black boxes with no capability for reasoning. What we are seeing in GPT-4 is regurgitation of text it has been trained on.
Not even the researchers who created it can get it to transparently explain its decisions.