This paper has a very concise and easier-to-understand definition of Google's Ma...

krackers · on July 1, 2024

Isn't the GROUP BY run before the SELECT though, e.g. "SELECT MAX(t) FROM foo GROUP BY t"? I think to do it the way they suggest you'd probably need to create a temp table like

WITH mapped AS (SELECT map() FROM crawl_table) SELECT * FROM mapped GROUP BY reduce()

Sesse__ · on July 1, 2024

Yes. MapReduce's model is basically:

  1. Map (key, value) -> (new_key, tmp_value)
  2. Group by new_key
  3. Reduce (new_key, all tmp_values for that key) -> (new_key, new_values)
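To make the three steps concrete, here is a minimal single-process sketch in Python. It is only an illustration of the model, not how MapReduce is actually implemented: `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical names, and word counting is an assumed example workload.

```python
from collections import defaultdict


def map_fn(key, value):
    # Hypothetical mapper: (doc_id, text) -> zero or more (word, 1) pairs.
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(new_key, tmp_values):
    # Hypothetical reducer: (word, [1, 1, ...]) -> (word, total).
    return new_key, sum(tmp_values)


def map_reduce(records, map_fn, reduce_fn):
    # Steps 1 and 2: run the mapper and group tmp_values by new_key.
    groups = defaultdict(list)
    for key, value in records:
        for new_key, tmp_value in map_fn(key, value):
            groups[new_key].append(tmp_value)
    # Step 3: reduce each group independently.
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


records = [("doc1", "the quick fox"), ("doc2", "The lazy dog the end")]
counts = map_reduce(records, map_fn, reduce_fn)
```

Here `counts["the"]` comes out as 3. The grouping dictionary stands in for step 2, and each key is reduced on its own, independently of the others.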

In that respect, it's not that far from SQL with custom aggregates. I guess the most precise SQL representation would be

  SELECT REDUCE(MAP(t)) FROM foo GROUP BY KEY(MAP(t))
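For anyone who wants to try that query shape, here is a rough sketch using Python's built-in `sqlite3` module, which allows user-defined scalar functions and aggregates. Everything specific is an assumption for illustration: the `MR_` prefix (used to stay clear of SQL keywords such as KEY), the "name:quantity" row format, the JSON encoding of the mapped pair, and summing as the reduce step. Unlike a real mapper, this `MR_MAP` emits exactly one pair per row.

```python
import json
import sqlite3


def mr_map(t):
    # Hypothetical MAP: "apple:3" -> JSON-encoded [new_key, tmp_value].
    name, qty = t.split(":")
    return json.dumps([name, int(qty)])


def mr_key(mapped):
    # Hypothetical KEY: pull new_key out of the mapped pair.
    return json.loads(mapped)[0]


class MrReduce:
    # Hypothetical REDUCE: sum the tmp_values within one group.
    def __init__(self):
        self.total = 0

    def step(self, mapped):
        self.total += json.loads(mapped)[1]

    def finalize(self):
        return self.total


conn = sqlite3.connect(":memory:")
conn.create_function("MR_MAP", 1, mr_map)
conn.create_function("MR_KEY", 1, mr_key)
conn.create_aggregate("MR_REDUCE", 1, MrReduce)

conn.execute("CREATE TABLE foo (t TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?)",
    [("apple:3",), ("pear:1",), ("apple:4",)],
)

rows = conn.execute(
    "SELECT MR_KEY(MR_MAP(t)), MR_REDUCE(MR_MAP(t)) "
    "FROM foo GROUP BY MR_KEY(MR_MAP(t)) ORDER BY 1"
).fetchall()
```

With the three sample rows, `rows` is `[("apple", 7), ("pear", 1)]`. The key is also selected here so each reduced value can be matched to its group.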

(I've both been on the MapReduce team and worked on an SQL database. I honestly don't think they're that comparable.)