This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize (wired.com)
70 points by elq on Feb 28, 2008 | hide | past | favorite | 24 comments


I've always known that science journalists don't know anything about science. I'm only now realizing that they don't know anything about journalism either.

They only know how to tell one story: the story of the underdog scientist who is upsetting (or, more commonly, "might be about to upset") the stuffy old establishment. Everything that goes on in the scientific community has to be shoehorned into the storyline of "Dodgeball" or "The Mighty Ducks", or else they have no idea how to talk about it.

I was originally going to post this comment on the thread about the "surfer dude with theory of everything" article, but it fits here just as well.


Sure, there's plenty of fluff in the article, but it isn't because the author (http://www.math.wisc.edu/~ellenber/) is clueless about science. The article is probably at the right level of technical depth. Seems ambitious to even touch SVD in Wired.


Oops. I stand corrected. The author of this particular article obviously does know the subject very well. My apologies.


I'm willing to bet every dollar to my name that the author can eat you for breakfast when it comes to "knowing about science".


Umm...the author is a math professor at the University of Wisconsin, and he's a published author (in fiction).

Jordan Ellenberg (ellenbergwired@gmail.com) is a math professor at the University of Wisconsin and author of the novel The Grasshopper King.


...yes...that's what I was pointing out...


So true. It's like they got their ethos and values from Disney movies.


Whoops! OK, so just to clarify: I didn't mean for my off-hand generalization about "science journalists not knowing anything about science" to extend to the author of this article in particular, who, as several people have now pointed out, is actually a professor of mathematics at Madison.


This article needed a lot more info about the psychologist's approach to make it interesting; it was 90% background with the barest hint at the end about what he is actually doing. For all we know he may have just made minor tweaks to some existing algorithm.


The article mentions tweaking the algorithm to take the timing of ratings into account. It gives the example that a person might rate two movies as 3/5, and a third they watch right after as a 4/5, because it was better than the previous two, but under normal(ized) circumstances that user would've given the 3rd movie a 3/5 too.

So... it's an interesting notion that there can be time-segment-based normalization of the data set. Team BellKor/KorBell credit a big part of their gains to using the ordering of rankings, rather than the rankings themselves, to test for similarity, so this guy's actually got a novel approach for normalizing the dataset. I really don't know how he can detect whether he's dealing with the kind of person who is meticulous enough to make sure their new ratings take all their previous ratings into account, or the kind of person who gives everything a rating from 4/5 to 5/5. I wouldn't be surprised if he hits a wall because of this.
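To make the idea concrete, here's a minimal sketch of the kind of session-based normalization being described: group a user's ratings by day and express each rating as an offset from that day's mean, so that same-session anchoring (rating a movie 4/5 just because it was better than the two you watched right before it) cancels out. All the movie names and numbers here are made up, and this is just one plausible reading of the approach, not the contestant's actual method.

```python
from datetime import date

# Toy ratings for one user: (movie_id, rating, date). Entirely made up.
ratings = [
    ("A", 3, date(2007, 1, 5)),
    ("B", 3, date(2007, 1, 5)),
    ("C", 4, date(2007, 1, 5)),   # possibly inflated relative to A and B
    ("D", 3, date(2007, 3, 20)),  # a different, single-movie session
]

def normalize_by_session(ratings):
    # Group ratings by day ("session"), then replace each raw rating
    # with its offset from that session's mean rating.
    sessions = {}
    for movie, r, d in ratings:
        sessions.setdefault(d, []).append((movie, r))
    offsets = {}
    for d, items in sessions.items():
        mean = sum(r for _, r in items) / len(items)
        for movie, r in items:
            offsets[movie] = r - mean
    return offsets

offsets = normalize_by_session(ratings)
# "C" ends up with a positive offset: rated above its session's mean.
```

The offsets, rather than the raw stars, would then feed the similarity computation.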

Really, I think Netflix would make much better strides toward their goals by improving their data-collection technique. They could probably make great enhancements toward normalization just by saying, "You gave this previous movie 3/5 stars. How do you rate this latest movie?" That way the data would be much more normalized. There are lots of possibilities for this sort of enhancement, so I'm not sure why they're only letting competitors look at the already-collected dataset.


He mentions that he is inspired by Daniel Kahneman, who has done work in behavioral economics. Kahneman won the Nobel Prize in Economics in 2002; his prize lecture can be seen here: http://nobelprize.org/nobel_prizes/economics/laureates/2002/...

The lecture explains quite well what he does, and also gives a hint at what is being done in the netflix prize contest.


Exactly. I mean, who cares about what Netflix' headquarters look like? Or what university his daughter goes to? Sometimes it feels like these journalists are just filling in a template.


Why does he or the author consider him a psychologist? I think it's a bit misleading when simply reading the title of the article. He has an undergraduate degree in psychology, but his master's is in operations research (and correct me if I'm wrong, but OR is usually the application of statistics to business problems, right?). Wouldn't the article more accurately be titled something like "Operations research guy uses statistical technique motivated by psychology..."?


It was a great article, but he hasn't really been anonymous in the Netflix Prize world:

http://www.netflixprize.com/community/viewtopic.php?pid=6090...

Anyway, congrats Gavin...


"Just a guy in a garage" is now at #8. "When Gravity and Dinosaurs Unite" tops the leaderboard ( http://www.netflixprize.com/leaderboard ), above BellKor (the progress prize winner of 2007).


Yet another psychological problem mathematicians too eagerly claimed as their own. Why not try to find patterns about the movies which appeal to a certain individual? Is it against the rules to use outside data (actors, directors, etc)?


From the Netflix prize FAQ:

Why not provide other data about the movies, like genres, directors, or actors?

We know others do. Again, Cinematch doesn’t currently use any of this data. Use it if you want.

That seems like an easy target - if I've 5-starred every movie with Kevin Spacey, I probably will like anything with Kevin Spacey. Why not mine blogs and reviews, trying to find themes which are appealing to individuals?


It's probably just not that hot of a solution, believe it or not. The recommendation engine should be able to make much better associations between movies, without even being able to describe what those associations are. Looking at features like genre, director, actor, etc., is like a spam filter looking for specific spammy words. As soon as the spammers start saying "p3n1s" instead, or Eddie Murphy starts making family movies, it falls apart. Check out http://karmatics.com/docs/evolution-and-wisdom-of-crowds.htm... (scroll down to "3. Recommendation Systems" for the relevant part) for an explanation.
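The contrast here is between hand-picked features and associations mined purely from co-rating patterns. A tiny sketch of the latter, with made-up users and movies (this is the generic item-item similarity idea, not any particular team's algorithm): two movies count as similar simply because the same people rate them similarly, with no genre or actor data anywhere.

```python
import math

# Toy user-by-movie ratings; users and movies are entirely made up.
ratings = {
    "alice": {"M1": 5, "M2": 4, "M3": 1},
    "bob":   {"M1": 4, "M2": 5},
    "carol": {"M1": 1, "M3": 5},
}

def cosine(movie_a, movie_b):
    # Cosine similarity over users who rated both movies.
    # No features: the association emerges from rating patterns alone.
    common = [u for u in ratings if movie_a in ratings[u] and movie_b in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][movie_a] * ratings[u][movie_b] for u in common)
    norm_a = math.sqrt(sum(ratings[u][movie_a] ** 2 for u in common))
    norm_b = math.sqrt(sum(ratings[u][movie_b] ** 2 for u in common))
    return dot / (norm_a * norm_b)
```

Here M1 and M2 come out as strongly associated (everyone who rated both liked both), while M1 and M3 don't, and the engine never needs to "know" why.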


Great link. Thanks for posting that.


I think the limiting factor is the amount of computation required. When you take the number of variables (actors, directors, themes, box office gross, etc) x the number of values each of them have (dozens to thousands) x the number of users, that's a lot of crunching to do. Perhaps do some pre-crunching to find actors that have a correlation like that and then just check those. Apply the "Kevin Spacey index", if you will.
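A "Kevin Spacey index" as floated above might look something like this: per user, pre-compute how much higher that user rates movies featuring a given actor compared to their overall average, and only keep actors where the gap is large. All the ratings and the movie-to-actor mapping below are invented for illustration.

```python
# Toy data for a single user, entirely made up.
user_ratings = {"M1": 5, "M2": 5, "M3": 2, "M4": 3}
movie_actors = {
    "M1": {"Kevin Spacey"},
    "M2": {"Kevin Spacey"},
    "M3": {"Other Actor"},
    "M4": {"Other Actor"},
}

def actor_index(actor):
    # Mean rating of this user's movies featuring the actor,
    # minus the user's overall mean rating. Positive = "likes the actor".
    with_actor = [r for m, r in user_ratings.items() if actor in movie_actors[m]]
    if not with_actor:
        return 0.0
    overall = sum(user_ratings.values()) / len(user_ratings)
    return sum(with_actor) / len(with_actor) - overall
```

Pre-computing this per (user, actor) pair and thresholding it would cut the crunching down to the handful of actors that actually correlate with a user's tastes.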


Kills me how fast WIRED postings make it in. We start discussing articles before I even get to read them in the real magazine :(


Wired and Reddit are part of the same media conglomerate, so that explains how the wired articles get on Reddit. I think a lot of n.yc readers read reddit, so that is how they get here so fast I bet.


Now there just needs to be an economist to drop in and sneak an entry with a 9% improvement.

Statistics and math are excellent tools, but they're quite blind.


Well... there are probably already a few teams with econometricians on board. Based on my prior experience with that ilk, I'm sure their progress got held up by the insistence that all returns (err, ratings) follow a Gaussian distribution.

Also, all of that training in math and stats is pretty useless if you just use it to "prove" something you already "know" :)



