> But like I can see your side of the argument, you have to be able to see that some other people want personalization and learning and all that. Pandora and Apple Music are both heavily tailored that way. Google Now on your phone knows everything you do. Netflix can find videos for you to watch based on what you've watched before. Amazon will recommend purchases to you based on what you like. Hell, half the people on this site build these systems. You know how many machine learning articles there are on the front page every week?
But that's the thing, right? People want their computers to be more intelligent, reactive, and adapted to their needs. They don't want Google, MS, or Apple to know everything about them. How did the first come to automatically imply the second?
Apple, Google, MS and others could deliver the same products (software that learns user behaviour and adapts accordingly) without sacrificing privacy, invading personal space, and storing private documents in the cloud in order to parse them and deliver relevant ads.
Machine learning should keep trying to be machine learning, not merely data scraping for marketing tuning and exploitation.
What does it bring me that MS or Google knows my search terms of the day? I want my quad-core CPU to know that when I browse HN it should automatically split the screen in half and open my media player to play radio music, because that's what I do most mornings. Why do I have to do that by hand? Can't it know or guess my routine by now?
Or is all this tech just a glorified lexical parser for fine-tuning ads to increase their efficiency?
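For what it's worth, that kind of routine detection doesn't obviously need a server at all. Here's a toy sketch of purely local habit learning: count (hour, activity) co-occurrences in an on-device log and suggest the habitual companion activity. All the names and thresholds here are made up for illustration, not any real API.

```python
# Toy sketch of purely local routine learning: tally (hour, activity)
# pairs from a local log, then suggest the most habitual other activity
# for the current hour. Nothing leaves the machine.
from collections import Counter

class RoutineLearner:
    def __init__(self, min_support=3):
        self.counts = Counter()         # (hour, activity) -> occurrences
        self.min_support = min_support  # ignore habits seen fewer times

    def observe(self, hour, activity):
        self.counts[(hour, activity)] += 1

    def suggest(self, hour, current_activity):
        # Most frequent other activity at this hour, if habitual enough.
        candidates = [(n, act) for (h, act), n in self.counts.items()
                      if h == hour and act != current_activity
                      and n >= self.min_support]
        return max(candidates)[0:2][1] if candidates else None

learner = RoutineLearner()
for _ in range(5):                      # five mornings of the same routine
    learner.observe(9, "browse_hn")
    learner.observe(9, "play_radio")

print(learner.suggest(9, "browse_hn"))  # -> play_radio
```

A real system would need noise tolerance and decay for stale habits, but none of that requires uploading the log anywhere.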
> Apple, Google, MS and others could deliver the same products [...] without sacrificing privacy [...]
Could they? My amateur understanding is that a lot of today's success in machine learning is due mainly to having enormous amounts of data to work with.
When I look at Google Now, for example, I can't imagine a way to build it without collecting an ocean of detailed personal data. Or your example of finding common behaviors and having computers do the right thing: that gets much, much easier if you have the daily behavior data of 10m people so you can start extracting concepts like "typical morning routine", testing recognizers for that, and having them not do anything in low-confidence situations.
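The "do nothing in low-confidence situations" part, at least, is simple to picture: a recognizer only fires when its score clears a threshold and otherwise abstains. The scores and signals below are invented stand-ins for a trained model, purely to illustrate the abstention pattern.

```python
# Minimal sketch of a confidence-thresholded recognizer: act only when
# the estimate is high, abstain otherwise. The scoring is a hand-made
# stand-in for whatever model a real system would train.
def recognize_morning_routine(signals, threshold=0.9):
    """Return an action name, or None to abstain when uncertain."""
    score = 0.0
    if signals.get("hour") in range(7, 10):
        score += 0.5                      # morning hours
    if signals.get("site") == "news.ycombinator.com":
        score += 0.45                     # habitual morning site
    if score >= threshold:
        return "open_media_player"
    return None  # low confidence: better to do nothing than to annoy

print(recognize_morning_routine({"hour": 8, "site": "news.ycombinator.com"}))
print(recognize_morning_routine({"hour": 8}))  # abstains -> None
```

The open question in the thread is whether you need 10m people's data to pick good thresholds and features, or whether one user's own history is enough.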
It's not just about advertising. By looking at customer data in aggregate you can learn more about behaviour patterns and support the things your customers might be interested in doing. Buying products is one of the things you might be interested in doing.
That said, this whole thing gives me the creeps and I'm glad I'm no longer a Microsoftie.
"By looking at customer data in aggregate you can learn more about behaviour patterns and support the things your customers might be interested in doing."
That could be done locally, without sharing the private data. The local computing agent could then look into the public pool (content people have deliberately published for all to see) for information that might interest the user. That would be an ethical solution that pleases everyone. What we see happening now is a nightmare!
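Concretely, the proposal amounts to something like this sketch: the interest profile is built from local history and never uploaded; it is only used on-device to rank items fetched from a public feed. Every name here is illustrative.

```python
# Hedged sketch of the "local agent" idea: profile stays on-device,
# and is only used to rank publicly published items.
from collections import Counter

def build_local_profile(browsing_history):
    """Count topic occurrences from local history; never uploaded."""
    return Counter(topic for _, topic in browsing_history)

def rank_public_items(public_items, profile):
    """Order public items by overlap with the local interest profile."""
    return sorted(public_items,
                  key=lambda item: profile[item["topic"]],
                  reverse=True)

history = [("hn", "programming"), ("hn", "privacy"), ("blog", "privacy")]
profile = build_local_profile(history)       # {'privacy': 2, 'programming': 1}
feed = [{"title": "Cooking tips", "topic": "food"},
        {"title": "New ML paper", "topic": "programming"},
        {"title": "E2E encryption explained", "topic": "privacy"}]
ranked = rank_public_items(feed, profile)
print(ranked[0]["title"])  # -> E2E encryption explained
```

The server only ever sees a request for the public feed, not the profile that ranked it.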
It could be done locally, but in order not to share any data with the server you'd need to run the analysis (with all of the associated data) on the local machine, which, unless I'm missing something, would add some non-trivial constraints, e.g.:
- Getting research-grade analysis code up to local-install quality levels, keeping that code updated
- Bandwidth and HDD space for large datasets
- The additional load on the CPU, memory, battery, and messaging that to the customer
- The legal and privacy implications of all that opt-in data being transferred to and processed on thousands of opt-out customers' machines
- The need to have an entirely duplicated system, because some people would rather opt in than have to run all this stuff on their already-creaking-under-the-weight-of-windows-and-outlook-and-word-and-antivirus laptop
Maybe I'm misunderstanding something, but from this view I can understand why they didn't want to do it this way.
Frankly, I don't buy it at Google, FB, or MS scale. Their incentive is to maximize profits, not the user's happiness, wants, or needs. Sometimes the latter might help optimize the former, but it's not the objective behind all the scraping.
Those improvements could be done much less intrusively anyway (whether for its own sake or because johnny hacker is going to leak that data someday).