
This paper has a very concise, easy-to-understand definition of Google's MapReduce:

> To a first approximation, MR runs a single query:

> SELECT map() FROM crawl_table GROUP BY reduce()

Or you could read the entire Google MapReduce paper.



Isn't the GROUP BY run before the SELECT though, e.g. "SELECT MAX(t) FROM foo GROUP BY t"? I think to do it the way they suggest you'd probably need to create a temp table like

WITH mapped AS (SELECT map() FROM crawl_table) SELECT * FROM mapped GROUP BY reduce()
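For what it's worth, here is a rough sketch of that temp-table shape using Python's built-in sqlite3 module. Everything concrete in it is made up for illustration: the crawl_table columns (url, body), the sample rows, and the map_domain function standing in for map(). SUM plays the role of the reduce step, which is only one of many possible reducers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows; the thread never says what crawl_table holds.
conn.execute("CREATE TABLE crawl_table (url TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO crawl_table VALUES (?, ?)",
    [
        ("http://a.example/1", "hello"),
        ("http://a.example/2", "hi"),
        ("http://b.example/1", "hey"),
    ],
)


def map_domain(url: str) -> str:
    """Stand-in for map(): derive the grouping key (the host) from a URL."""
    return url.split("/")[2]


# Register the Python function so SQL can call it as a scalar function.
conn.create_function("map_domain", 1, map_domain)

# The CTE materialises the mapped rows first; the outer query then groups
# them and aggregates per key, which is the reduce-like step.
rows = conn.execute(
    """
    WITH mapped AS (
        SELECT map_domain(url) AS domain, length(body) AS body_len
        FROM crawl_table
    )
    SELECT domain, SUM(body_len) AS total_len
    FROM mapped
    GROUP BY domain
    ORDER BY domain
    """
).fetchall()

print(rows)
```

The point of the CTE is just to make the ordering explicit: the map output exists as rows before any grouping happens.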


Yes. MapReduce's model is basically:

  1. Map (key, value) -> (new_key, tmp_value)
  2. Group by new_key
  3. Reduce (new_key, all tmp_values for that key) -> (new_key, new_values)

In that respect, it's not that far from SQL with custom aggregates. I guess the most precise SQL representation would be

  SELECT REDUCE(MAP(t)) FROM foo GROUP BY KEY(MAP(t))

(I've both been on the MapReduce team and worked on an SQL database. I honestly don't think they're that comparable.)
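To make the three steps above concrete, here is a minimal single-process sketch in Python. It is not how the real system works (no distribution, no shuffle, no fault tolerance), and the names map_reduce, word_count_mapper, and word_count_reducer are invented for this example. One assumption beyond the list above: the mapper is allowed to emit zero or more (new_key, tmp_value) pairs per input record.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Run map, group-by-key, and reduce in memory on one machine."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        # Step 1: map (key, value) -> (new_key, tmp_value) pairs.
        for new_key, tmp_value in mapper(key, value):
            # Step 2: group by new_key.
            groups[new_key].append(tmp_value)
    # Step 3: reduce (new_key, all tmp_values for that key) -> result.
    return {
        new_key: reducer(new_key, tmp_values)
        for new_key, tmp_values in groups.items()
    }


def word_count_mapper(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in the document."""
    for word in text.split():
        yield word.lower(), 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    """Sum the per-occurrence counts for one word."""
    return sum(counts)


docs = [
    ("d1", "the quick brown fox"),
    ("d2", "the lazy dog"),
    ("d3", "The end"),
]
counts = map_reduce(docs, word_count_mapper, word_count_reducer)
print(counts["the"])
```

In SQL terms, word_count_mapper is roughly MAP plus KEY, and word_count_reducer is the custom aggregate applied per group.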



