It did, actually. The model was trained with multiple rounds of reinforcement learning from human feedback: first the human judges wrote out full answers, and then they ranked the model's candidate answers from most to least relevant.
So the model in production is probably frozen, but before that it went through multiple rounds of interaction with the world.
The reinforcement learning was on giving the right answer, not on interacting with the world. But there is movement in the right direction with https://ai.googleblog.com/2022/12/rt-1-robotics-transformer-...
and related work. (RT-1 itself isn't RL, but there is related work that is.)
Oh, you meant interaction as joint training on images, actions, feedback, etc. That would be the next generation, I guess.
I am simply thinking of interaction here as similar to learning a language in a classroom: first the teacher provides sample questions and answers, then the teacher asks the students to come up with answers themselves and tells them which one is better. The end result is that ChatGPT is quite good at answering questions and can pass as a human, especially if it's augmented with a fact database so that obviously wrong answers can be pruned.
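To make the "tell them which one is better" step concrete: the standard way ranking feedback is turned into a training signal is a pairwise (Bradley-Terry style) loss on a reward model. This is a toy sketch with made-up reward scores, not ChatGPT's actual code:

```python
import math

def pairwise_ranking_loss(reward_chosen, reward_rejected):
    """Loss for one human judgment: the judge preferred 'chosen' over 'rejected'.

    -log(sigmoid(r_chosen - r_rejected)): small when the reward model
    already scores the preferred answer higher, large when it doesn't.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical reward-model scores for two candidate answers:
print(pairwise_ranking_loss(2.0, 0.5))  # model agrees with the judge -> small loss
print(pairwise_ranking_loss(0.5, 2.0))  # model disagrees -> large loss
```

The reward model trained this way is then what the RL step optimizes against, so the human rankings shape the final model only indirectly.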