> In addition to following all robots.txt rules and directives, Apple has a secondary user agent, Applebot-Extended, that gives web publishers additional controls over how their website content can be used by Apple.
> With Applebot-Extended, web publishers can choose to opt out of their website content being used to train Apple’s foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools.
But the docs also say that Applebot-Extended doesn't crawl webpages; instead, this marker is only used to determine what can be done with pages that Applebot has already visited.
Not that I like an opt-out system, but based on the wording of the docs, it's true that if you've blocked Applebot, blocking Applebot-Extended isn't necessary.
Yeah, that's true, but I suspect most publishers who want their content to appear in search but not be used for model training won't have blocked Applebot to date (hence the original commenter's argument).
It's still true that there's no Applebot-Extended crawler, since it doesn't crawl pages. Rather, it's a marker asking Applebot to limit what it does with your pages.
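A minimal sketch of this reading, using Python's standard `urllib.robotparser`. The `apple_usage` function and its rule (a page is used for training only if Applebot may crawl it *and* Applebot-Extended is not disallowed) are hypothetical, based on this thread's interpretation of Apple's docs rather than on Apple's actual implementation. Note that the stdlib parser matches user-agent names by substring, so the Applebot-Extended group is listed first; otherwise a `User-agent: Applebot` group would also match it.

```python
from urllib.robotparser import RobotFileParser


def apple_usage(robots_txt: str, url: str) -> dict:
    """Hypothetical model of the behavior discussed in this thread.

    Assumption: Applebot-Extended never fetches pages itself; it only
    restricts training use of pages that Applebot was allowed to crawl.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawled = parser.can_fetch("Applebot", url)
    # Training use requires the page to have been crawled at all AND
    # not be disallowed for the Applebot-Extended marker.
    used_for_training = crawled and parser.can_fetch("Applebot-Extended", url)
    return {"crawled": crawled, "used_for_training": used_for_training}


# Blocking Applebot alone: nothing is crawled, so nothing can be trained on.
BLOCK_APPLEBOT = """\
User-agent: Applebot
Disallow: /
"""

# Appear in search but opt out of training. The Applebot-Extended group
# comes first because the stdlib parser matches user agents by substring.
SEARCH_BUT_NO_TRAINING = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: Applebot
Allow: /
"""

if __name__ == "__main__":
    url = "https://example.com/article"
    for name, rules in [("no rules", ""),
                        ("block Applebot", BLOCK_APPLEBOT),
                        ("search but no training", SEARCH_BUT_NO_TRAINING)]:
        print(name, apple_usage(rules, url))
```

The sketch only evaluates the robots.txt as it stands now; it can't account for pages Applebot crawled before an Applebot-Extended rule was added.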
Isn't it still true that if people wanted their websites to show up in search in the past (so they didn't block Applebot), it's too late to mark them as "no training" now, since they've already been scraped?
I guess it can be useful for data published in the future.