

6gvONxR4sf7o on June 29, 2021 | on: GitHub Copilot

Unfortunately, in ML "public data" typically means available to the public, even if it's pirated, like much of the data in the Books3 dataset, which is a big part of some other very prominent datasets.

kzrdude on June 29, 2021

So basically YouTube all over again? I.e., bootstrap and become popular by using whatever media is widely available (pirated through crowdsourced piracy), and then many years later, once it is popular and dominant, it has to turn around and "do things right" and guard copyrights.
