
> I don't even know why I should care. If MS crawl some web pages I've written and AI gets slightly smarter by reading them

Crawling public web pages is a separate issue⁰ – by putting something online you aren't explicitly agreeing to any of MS's policies, at least in the eyes of the law. This is the same for anyone crawling public content, not just MS.

This privacy policy covers all the content you might use MS apps and services for, i.e. where you are¹ automatically agreeing to MS's policies: OneDrive, potentially any local-only documents in Office, code in VS and other tools, perhaps anything stored on your PC running Windows.

> I don't even know why I should care.

If you don't use any MS products or services, and no products/services you do use are backed by MS's services, then you don't need to care personally. Or indeed if you do use them but consider everything you output or otherwise work on to be public domain. Otherwise, maybe it is something you should form an opinion on?

----

[0] time to switch my robots.txt files to “User-agent: * Disallow: /” – though it is very likely already too late for any existing content
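For reference, the blanket opt-out mentioned above is just these two lines in a robots.txt file at the site root (note that compliance is voluntary – well-behaved crawlers honour it, others may not):

    User-agent: *
    Disallow: /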

[1] except where limited by law that you can afford to argue with MS's legal team over



I do use MS services. I still don't understand why I should care unless the AI starts simply repeating my private data in response to questions.

Now you could argue, what if I have documents with secret ideas or valuable IP that I don't want the AI to helpfully explain to others? That's definitely a valid concern! But for consumer uses, if it learns to draw better hands by looking at my holiday photos or whatever, then I don't see the problem.


> unless the AI starts simply repeating my private data in response to questions

That is a concern some have, particularly around CoPilot and the fact that it has been trained on a great deal of copyleft-licensed code from public repositories.

They assure us that it is not possible for blocks of code to be regurgitated in a way that would break licences like *GPL, but they have yet to explain why, if that assurance is 100% definitely true, they have not included any of their own private code in the training set. Surely they consider their code to be of good quality and valuable to include in the model.

> if it learns to draw better hands by looking at my holiday photos or whatever, then I don't see the problem

And if it gives an advertising firm working for a product you'd rather not be associated with an image of a family that looks _very_ much like yours? Again, the same assurance is given as with CoPilot, but again not everyone is assured by the assurance.

And of course it could happen anyway by chance even if your family is not in the training set. But I don't skip locking my doors just because someone with a good lock-pick could get in anyway.

And they are not doing it for a great communal benefit (well, their individual coders may be, but the company certainly isn't); they are doing it for commercial benefit. I'd prefer they didn't do it with my data, or if they do, I'd like my slice, however small, thankyouverymuch.


> If you don't use any MS products or services, and no products/services you do use are backed by MS's services, then you don't need to care personally.

I beg to differ: wouldn't they be more inclined to care if their data was being used in a product they do not interact with, rather than in one they do use and in some way benefit from?


That is a huge grey area of indirect use/agreement. If they don't interact with those services, then someone else has given MS the data: from MS's PoV someone else has agreed to the policy, and from the user's PoV someone else has perhaps handed over their data without permission. So yes, a concern, but not necessarily one relating to this policy, except for any clauses it has about removing data, and its use, once MS is informed they shouldn't have it.



