
This is similar to the problem observed in diffusion models going "MAD" when trained on synthetic data: https://arxiv.org/abs/2307.01850. Going forward, AI companies will therefore find it increasingly difficult to get their data by scraping the web, because the web will be full of synthetically generated data.

