I used to scrape web for a daily deals search engine i wrote for a client in 2010. But we scraped in realtime as the number of sites were really low (in 10s).
pre-crawled copies with distributed processing platform could be cool. you could come up with a better search engine with programmable rules that are edited collaboratively (like wikipedia)
pre-crawled copies with distributed processing platform could be cool. you could come up with a better search engine with programmable rules that are edited collaboratively (like wikipedia)