Hacker News

We may enter a vicious loop where writing is increasingly generated by LLMs. LLMs then have to train on their own output, leading to model collapse.

Hence, the models depend on human writing.



This intuitively makes sense (like deep-frying a JPEG), but it doesn't seem to happen in practice, as modern models are frequently trained on text that is both generated by other models and curated by other models.

Realistically, going forward, model training will just need to incorporate a step that removes data below some quality threshold, LLM-generated or otherwise.
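A minimal sketch of what such a quality-threshold step could look like, assuming simple heuristics (word count and word-repetition ratio, both illustrative choices, not any real lab's pipeline):

```python
def repetition_ratio(text: str) -> float:
    """Fraction of non-unique words; high values suggest degenerate text."""
    words = text.split()
    if not words:
        return 1.0
    return 1.0 - len(set(words)) / len(words)

def passes_quality_threshold(text: str,
                             max_repetition: float = 0.5,
                             min_words: int = 5) -> bool:
    """Keep a document only if it is long enough and not too repetitive."""
    return (len(text.split()) >= min_words
            and repetition_ratio(text) <= max_repetition)

corpus = [
    "The quick brown fox jumps over the lazy dog near the river.",
    "spam spam spam spam spam spam spam spam",  # degenerate repetition
    "ok",                                        # too short
]
filtered = [t for t in corpus if passes_quality_threshold(t)]
```

Real pipelines would use stronger signals (e.g. perplexity under a reference model, classifier scores), but the structure is the same: score each document, drop everything below the threshold regardless of whether a human or an LLM wrote it.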





