
> I don't even know why I should care. If MS crawl some web pages I've written and AI gets slightly smarter by reading them

Crawling public web pages is a separate issue⁰ – by putting something online you aren't explicitly agreeing to any of MS's policies, at least in the eyes of the law. This is the same for anyone crawling public content, not just MS.

This privacy policy covers all the content you might use MS apps and services for, i.e. where you are¹ automatically agreeing to MS's policies: OneDrive, potentially any local-only documents in Office, code in VS and other tools, perhaps anything stored on your PC running Windows.

> I don't even know why I should care.

If you don't use any MS products or services, and no products/services you do use are backed by MS's services, then you don't need to care personally. Or indeed if you do use them but consider everything you output or otherwise work on to be public domain. Otherwise, maybe it is something you should form an opinion on?

----

[0] time to switch my robots.txt files to “User-agent: * Disallow: /” – though it is very likely already too late for any existing content
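For reference, the blanket opt-out mentioned above is just these two lines in a robots.txt file at the site root (note that compliance is voluntary – well-behaved crawlers honour it, others may not):

    User-agent: *
    Disallow: /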

[1] except where limited by law that you can afford to argue with MS's legal team over



I do use MS services. I still don't understand why I should care unless the AI starts simply repeating my private data in response to questions.

Now you could argue, what if I have documents with secret ideas or valuable IP that I don't want the AI to helpfully explain to others? That's definitely a valid concern! But for consumer uses, if it learns to draw better hands by looking at my holiday photos or whatever, then I don't see the problem.


> unless the AI starts simply repeating my private data in response to questions

That is a concern some have, particularly around CoPilot and the fact that it has been trained on a great deal of copyleft-licensed code from public repositories.

They assure us that it is not possible for blocks of code to be regurgitated in a way that would break licences like *GPL, but they have yet to explain why, if that assurance is 100% definitely true, they have not included any of their own private code in the training set. Surely they consider their code to be of good quality and valuable to include in the model.

> if it learns to draw better hands by looking at my holiday photos or whatever, then I don't see the problem

And if it gives an advertising firm working for a product you'd rather not be associated with an image of a family that looks _very_ much like yours? Again, the same assurance is given as with CoPilot, but again not everyone is assured by the assurance.

And of course it could happen anyway by chance even if your family is not in the training set. But I don't skip locking my doors just because someone with a good lock-pick could get in anyway.

And they are not doing it for a great communal benefit (well, their individual coders may be, but the company certainly isn't); they are doing it for commercial benefit. I'd prefer they didn't do it with my data, or if they do, I'd like my slice, however small, thankyouverymuch.


> If you don't use any MS products or services, and no products/services you do use are backed by MS's services, then you don't need to care personally.

I beg to differ: wouldn't they be more inclined to care if their data was being used in a product they do not interact with, rather than in one they do use and in some way benefit from?


That is a huge grey area of indirect use/agreement. If they don't interact with those services, then someone else has given MS the data: from MS's PoV someone else has agreed to the policy, and from the user's PoV someone else has perhaps handed over their data without permission. So yes, a concern, but not necessarily one relating to this policy, except for any clauses it has about removing data, and its use, once MS is informed they shouldn't have it.



