All of the Internet still does not include everything you can extrapolate from it. When I ask a model to help with my code or writing, I am not asking it to reproduce anything.
This sort of gets at the crux of it: aren’t you? You’re asking for the most probable sequence of tokens in a formalized language (code or English) that communicates the idea, where the entire language, its rules, and many examples of it are in the training set. And the probability that your question, or one much like it, has been asked before is actually quite high. Performance tanks drastically when you ask for APL or MUMPS code instead of JavaScript or Python; the model produces Ol Chiki with significantly less proficiency than English. Does this mean these models are drastically overfit to English and Python? And if so, so what?
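To make the “most probable sequence of tokens” framing concrete, here is a minimal sketch that scores the same function written in Python and in APL under GPT-2, via the Hugging Face transformers library. The model choice and the two snippets are my own illustrative assumptions, not a rigorous benchmark:

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def avg_neg_log_likelihood(text: str) -> float:
        # Average per-token negative log-likelihood: lower means the
        # model finds the sequence more probable.
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss
        return loss.item()

    python_snippet = "def mean(xs):\n    return sum(xs) / len(xs)\n"
    apl_snippet = "mean ← {(+/⍵)÷≢⍵}\n"  # the same function in APL

    # Expect a markedly lower value for the Python snippet: its tokens
    # and idioms dominate the training distribution.
    print("python:", avg_neg_log_likelihood(python_snippet))
    print("apl:   ", avg_neg_log_likelihood(apl_snippet))

The gap in scores is the same effect you see in generation quality: the model is sampling from whatever distribution it was able to estimate.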
The fact that the model learns to generate samples from a distribution, be it a natural language like English or a computer language like Python, does not mean it is overfitting. The fact that it can't generate good APL, MUMPS, or whatever means it did not have enough data to learn those languages' distributions. Knowing so much about some languages like Python is not a hindrance but likely a boon, because it teaches the model what programming languages look like in general.
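The distinction between overfitting and an under-trained distribution shows up even in a toy example: a character bigram model estimated from a large corpus scores held-out text far better than one estimated from a tiny corpus, even though neither has memorized the test string. The corpora here are stand-ins I made up; the point is the estimate, not the model class:

    import math
    from collections import Counter

    def bigram_model(corpus: str):
        counts = Counter(zip(corpus, corpus[1:]))
        context = Counter(corpus[:-1])
        vocab_size = len(set(corpus))
        def avg_logprob(text: str) -> float:
            # Add-one smoothing so unseen bigrams get nonzero probability.
            total = 0.0
            for a, b in zip(text, text[1:]):
                total += math.log((counts[(a, b)] + 1) /
                                  (context[a] + vocab_size + 1))
            return total / max(len(text) - 1, 1)
        return avg_logprob

    big = bigram_model("the quick brown fox jumps over the lazy dog " * 200)
    small = bigram_model("the quick brown fox")

    held_out = "the lazy dog jumps over the quick fox"
    # The large-corpus model scores held-out text better: it learned
    # more of the distribution, it did not memorize the test string.
    print("large corpus:", big(held_out))
    print("small corpus:", small(held_out))

Failing at APL is the “small corpus” case: the estimate is starved, not overfit.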
If it were smarter, it could figure things out just from the language specification, but we're not there yet.