
It doesn't really matter whether you do this experiment with two training sets created independently or one training set split in half. As long as both are representative of the underlying population, you would get roughly the same results. In the case of human faces, as long as the faces are drawn from roughly similar population distributions (age, race, sex), you'll get similar results. There's only so much variation in human faces.

If the populations are different, then you'll just get two models that have representations of the two different populations. For example, if you trained a model on a sample of all old people and separately on a sample of all young people, obviously those would not be expected to converge, because they're not drawing from the same population.

But that experiment of splitting one training set in half does tell you something: the model is building some sort of representation of the underlying distribution, not just overfitting and spitting out chunks of copy-pasted faces stitched together.
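A rough way to see this concretely (a minimal sketch, not the experiment discussed here): draw one synthetic "population", split it in half, fit a simple model to each half, and check that both halves recover essentially the same structure. PCA on made-up data stands in for a trained generative face model; numpy and scikit-learn are assumed.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)

    # Synthetic population: 64-dim samples with a few dominant directions
    # of variation, loosely analogous to "only so much variation in faces".
    basis = rng.normal(size=(5, 64))
    data = rng.normal(size=(10_000, 5)) @ basis + 0.1 * rng.normal(size=(10_000, 64))

    half_a, half_b = data[:5_000], data[5_000:]

    # Fit the same simple model independently on each half.
    pca_a = PCA(n_components=5).fit(half_a)
    pca_b = PCA(n_components=5).fit(half_b)

    # Singular values of the cross-projection are the cosines of the
    # principal angles between the two learned subspaces: values near 1
    # mean both halves found the same directions of variation.
    overlap = np.linalg.svd(pca_a.components_ @ pca_b.components_.T, compute_uv=False)
    print(overlap)  # typically all close to 1.0

If the two halves instead came from genuinely different populations (the old-faces/young-faces case above), the recovered directions would differ and those overlap values would drop.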



That's an illustration of the central limit theorem in statistics. And any language is mostly statistics, so models are good at statistically guessing the next word or token.
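A toy version of that statistical point (a sketch with a made-up token distribution, not real corpus data): frequency estimates computed from two disjoint halves of the same sample converge to the same underlying distribution, with the sampling error shrinking like 1/sqrt(n).

    import numpy as np

    rng = np.random.default_rng(1)
    vocab_probs = np.array([0.5, 0.3, 0.15, 0.05])  # hypothetical 4-token distribution

    # One "corpus" of 100k tokens, split into two disjoint halves.
    tokens = rng.choice(len(vocab_probs), size=100_000, p=vocab_probs)
    half_a, half_b = tokens[:50_000], tokens[50_000:]

    # Per-token frequency estimates from each half.
    freq_a = np.bincount(half_a, minlength=4) / len(half_a)
    freq_b = np.bincount(half_b, minlength=4) / len(half_b)

    print(freq_a)  # ~ [0.5, 0.3, 0.15, 0.05]
    print(freq_b)  # ~ the same, up to sampling noise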


If both are sampled from the same population, then they're not really independent, even if they're totally disjoint.


They are sourced mostly from the same population, crawled from everything that can be crawled.



