> But like I can see your side of the argument, you have to be able to see that some other people want personalization and learning and all that. Pandora and Apple Music are both heavily tailored that way. Google Now on your phone knows everything you do. Netflix can find videos for you to watch based on what you've watched before. Amazon will recommend purchases to you based on what you like. Hell, half the people on this site build these systems. You know how many machine learning articles there are on the front page every week?
But that's the thing, right? People want their computers to be more intelligent, reactive, and adapted to their needs. They don't want Google, MS, or Apple to know everything about them. How did the first come to automatically imply the second?
Apple, Google, MS and others could deliver the same products (software that learns user behaviour and adapts accordingly) without sacrificing privacy, invading personal space, and storing private documents in the cloud in order to parse them and deliver relevant ads.
Machine learning should keep trying to be machine learning, not merely data scraping for marketing tuning and exploitation.
What does it bring me that MS or Google knows my search terms of the day? I want my quad-core CPU to know that when I browse HN it should automatically split the screen in half and open my media player to play radio music, because that's what I do most mornings. Why do I have to do that by hand? Can't it know or guess my routine by now?
Or is all this tech just a glorified lexical parser for fine-tuning ads to increase their efficiency?
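For what it's worth, that kind of routine detection doesn't obviously need a server at all. Here's a toy sketch of purely local habit learning: count (hour, activity) co-occurrences in an on-device log and suggest the habitual companion activity. All the names and thresholds here are made up for illustration, not any real API.

```python
# Toy sketch of purely local routine learning: tally (hour, activity)
# pairs from a local log, then suggest the most habitual other activity
# for the current hour. Nothing leaves the machine.
from collections import Counter

class RoutineLearner:
    def __init__(self, min_support=3):
        self.counts = Counter()         # (hour, activity) -> occurrences
        self.min_support = min_support  # ignore habits seen fewer times

    def observe(self, hour, activity):
        self.counts[(hour, activity)] += 1

    def suggest(self, hour, current_activity):
        # Most frequent other activity at this hour, if habitual enough.
        candidates = [(n, act) for (h, act), n in self.counts.items()
                      if h == hour and act != current_activity
                      and n >= self.min_support]
        return max(candidates)[0:2][1] if candidates else None

learner = RoutineLearner()
for _ in range(5):                      # five mornings of the same routine
    learner.observe(9, "browse_hn")
    learner.observe(9, "play_radio")

print(learner.suggest(9, "browse_hn"))  # -> play_radio
```

A real system would need noise tolerance and decay for stale habits, but none of that requires uploading the log anywhere.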
> Apple, Google, MS and others could deliver the same products [...] without sacrificing privacy [...]
Could they? My amateur understanding is that a lot of today's success in machine learning is due mainly to having enormous amounts of data to work with.
When I look at Google Now, for example, I can't imagine a way to build it without collecting an ocean of detailed personal data. Or your example of finding common behaviors and having computers do the right thing: that gets much, much easier if you have the daily behavior data of 10m people so you can start extracting concepts like "typical morning routine", testing recognizers for that, and having them not do anything in low-confidence situations.
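The "do nothing in low-confidence situations" part, at least, is simple to picture: a recognizer only fires when its score clears a threshold and otherwise abstains. The scores and signals below are invented stand-ins for a trained model, purely to illustrate the abstention pattern.

```python
# Minimal sketch of a confidence-thresholded recognizer: act only when
# the estimate is high, abstain otherwise. The scoring is a hand-made
# stand-in for whatever model a real system would train.
def recognize_morning_routine(signals, threshold=0.9):
    """Return an action name, or None to abstain when uncertain."""
    score = 0.0
    if signals.get("hour") in range(7, 10):
        score += 0.5                      # morning hours
    if signals.get("site") == "news.ycombinator.com":
        score += 0.45                     # habitual morning site
    if score >= threshold:
        return "open_media_player"
    return None  # low confidence: better to do nothing than to annoy

print(recognize_morning_routine({"hour": 8, "site": "news.ycombinator.com"}))
print(recognize_morning_routine({"hour": 8}))  # abstains -> None
```

The open question in the thread is whether you need 10m people's data to pick good thresholds and features, or whether one user's own history is enough.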
It's not just about advertising. By looking at customer data in aggregate you can learn more about behaviour patterns and support the things your customers might be interested in doing. Buying products is one of the things you might be interested in doing.
That said, this whole thing gives me the creeps and I'm glad I'm no longer a Microsoftie.
"By looking at customer data in aggregate you can learn more about behaviour patterns and support the things your customers might be interested in doing."
That could be done locally, without sharing the private data. The local computing agent could then look into the public pool (content people have deliberately published for all to see) for information that might interest the user. That would be an ethical solution that pleases everyone. What we see happening now is a nightmare!
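Concretely, the proposal amounts to something like this sketch: the interest profile is built from local history and never uploaded; it is only used on-device to rank items fetched from a public feed. Every name here is illustrative.

```python
# Hedged sketch of the "local agent" idea: profile stays on-device,
# and is only used to rank publicly published items.
from collections import Counter

def build_local_profile(browsing_history):
    """Count topic occurrences from local history; never uploaded."""
    return Counter(topic for _, topic in browsing_history)

def rank_public_items(public_items, profile):
    """Order public items by overlap with the local interest profile."""
    return sorted(public_items,
                  key=lambda item: profile[item["topic"]],
                  reverse=True)

history = [("hn", "programming"), ("hn", "privacy"), ("blog", "privacy")]
profile = build_local_profile(history)       # {'privacy': 2, 'programming': 1}
feed = [{"title": "Cooking tips", "topic": "food"},
        {"title": "New ML paper", "topic": "programming"},
        {"title": "E2E encryption explained", "topic": "privacy"}]
ranked = rank_public_items(feed, profile)
print(ranked[0]["title"])  # -> E2E encryption explained
```

The server only ever sees a request for the public feed, not the profile that ranked it.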
It could be done locally, but in order not to share any data with the server you'd need to run the analysis (with all of the associated data) on the local machine, which, unless I'm missing something, would add some non-trivial constraints, e.g.:
- Getting research-grade analysis code up to local-install quality levels, keeping that code updated
- Bandwidth and HDD space for large datasets
- The additional load on the CPU, memory, battery, and messaging that to the customer
- The legal and privacy implications of all that opt-in data being transferred to and processed on thousands of opt-out customers' machines
- The need to have an entirely duplicated system, because some people would rather opt in than have to run all this stuff on their already-creaking-under-the-weight-of-windows-and-outlook-and-word-and-antivirus laptop
Maybe I'm misunderstanding something, but from this view I can understand why they didn't want to do it this way.
Frankly, I don't buy it at Google, FB, or MS scale. Their incentive is to maximize profits, not the user's happiness, wants, or needs. Sometimes the latter might help optimize the former, but it's not the objective behind all the scraping.
Those improvements could be done much less intrusively anyway (whether for its own sake or because johnny hacker is going to leak that data someday).