
Public content is still subject to copyright, and I doubt that AppleBot only scrapes content carrying a suitable license. And "fair use", in case you want to invoke it (it's unclear whether it even applies), is a notion limited to the US and a handful of other countries.


All you have to do is drop a token swear word into your content and they remove it from the dataset. Easy.


Why would they? From the moderate amount of testing I've done of their handwriting recognition on an iPad, they seem to have everything risqué/offensive I could think of in there, even if you have to write it more clearly than other words. I don't expect this to be much different, other than a word filter on the output.


I mean for their large language model training. They said they don't include low-quality data or swearing. This means you can get out of it by swearing.



