Hacker News

Did you really crawl Google? That must have been a long time ago. But speaking of searching on Google as a user:

Google's Advanced Search used to be a great tool, until around 2007/08. For some reason it never received an upgrade, and several things are broken, no longer work, or were removed (e.g. '+', which is now a keyword for Google+; '"' doesn't mean the same thing any more; some filetypes are blocked; some show only a few results).



Google never had an advanced search that was as useful as AltaVista's, where I regularly (daily!) searched for things like:

("term1" and "term2") not ("term3" or "term4")

and whatever tiny "power user" features Google had, like "allinsite:term term term" or '+', don't seem to work at all now.
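The boolean semantics of that AltaVista query can be sketched as a plain predicate over a document's text (a toy illustration of what the query means, not how either engine was implemented):

```python
def matches(doc: str) -> bool:
    """Filter equivalent to the query ("term1" and "term2") not ("term3" or "term4")."""
    text = doc.lower()
    has_required = "term1" in text and "term2" in text
    has_excluded = "term3" in text or "term4" in text
    return has_required and not has_excluded

print(matches("term1 near term2"))        # True
print(matches("term1 term2 plus term3"))  # False
```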

Google is not optimized for finding things. Google is optimized for ad views and clicks.


So you mean like "term1" "term2" -"term3" -"term4"? Or, if I wanted to do this without returning results from Hacker News, "term1" ... -"term4" -site:news.ycombinator.com?

I don't see how AltaVista is superior here.


The problem is "whatever tiny 'power user' features that google had... don't seem to work at all now."

I think I know what they were talking about. A lot of the time it appears that adding advanced terms to a query will change the estimated number of results, yet all the top hits will be exactly the same. Also, punctuation seems to be largely ignored, e.g. searching "etc apt sources list" and "/etc/apt/sources.list" both give me the exact same results. Putting the filename in quotes also gives the same results as before.

Searching for specific error messages with more than a few key words or a filename is usually a nightmare.


This is all true. I do wish there were a flag you could set like searchp:"/etc/apt/sources.list"


The original Google has gone forever.

But then, so has the www that the original Google worked so well for.

I suspect original Google would be horrible on today's web.

I miss 1998 and I mourn for what could have been.


Is there any truth to my suspicion that the web of hyperlinks (on which the famed algorithm relied) is significantly weaker and reaches fewer corners these days?

Certainly feels like content is migrating to the walled gardens and there are fewer and fewer personal websites injecting edges into the open graph.


Last November I speculated why Google would let HTTP/2 get standardized without specifying the use of SRV records:

This is going to bite them big time in the end, because Google got large by indexing the Geocities-style web, where everybody did have their own web page on a very distributed set of web hosts. What Google is doing is only contributing to the centralization of the Web, the conversion of the Web into Facebook, which will, in turn, kill Google, since they then will have nothing to index.

They sort of saw this coming, but their idea of a fix was Google+ – trying to make sure that they were the ones on top. I think they are still hoping for this, which is why they won’t allow a decentralized web by using SRV records in HTTP/2.

https://news.ycombinator.com/item?id=8550133
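For illustration, the kind of decentralized setup the quoted comment imagines might have used DNS zone entries like these (the `_http2._tcp` service label is hypothetical, since HTTP/2 never defined one):

```
; Hypothetical SRV records pointing a small site at shared hosting.
; Clients would look up the service name, then connect to the listed
; host and port, so a site's name need not be tied to one web host.
; Fields after SRV are: priority, weight, port, target.
_http2._tcp.example.com.  3600 IN SRV 10 60 8443 node1.sharedhost.example.
_http2._tcp.example.com.  3600 IN SRV 20 40 8443 node2.sharedhost.example.
```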


It feels the same way to me. To a large percentage of users the internet is Facebook rather than the largest compendium of human knowledge in existence, but luckily for those of us who use it for the latter, the value of such a thing will always be evident.


Come to think of it, we've moved offices 3 times since then - it must've been 8-10 years ago. I don't think I had to do any special trickery; I spent only an afternoon or so writing and testing the code. I didn't realize such a thing would be impossible now - what a shame. I downloaded several gigabytes IIRC - a large amount at the time.


Though nowadays you could use Common Crawl to get the dataset and use existing tools to extract such files, right? (I've no idea whether that's a practical thing to do or not.)
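In rough outline, finding candidate pages in Common Crawl goes through its public URL index (the CDX API at index.commoncrawl.org). A minimal sketch, assuming that API; the crawl ID and URL pattern below are placeholders, not values from this thread:

```python
# Hypothetical sketch: look up pages in Common Crawl's CDX URL index.
# Each JSON line in the index response names a WARC file plus an offset
# and length, so the archived record can then be fetched with an HTTP
# Range request against https://data.commoncrawl.org/<warc-filename>.
from urllib.parse import urlencode

def cdx_query_url(crawl_id: str, url_pattern: str) -> str:
    """Build a query URL against Common Crawl's CDX index."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"https://index.commoncrawl.org/{crawl_id}-index?{params}"

print(cdx_query_url("CC-MAIN-2024-10", "example.com/*"))
```

Fetching and parsing the WARC records themselves is the heavy part; libraries exist for that, but whether the whole exercise is practical at hobby scale is exactly the open question above.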


I guess so, if they "look" at the web the same way Google does (respecting robots.txt, nofollow, etc. - which Wikipedia says they do). But the interesting things are found in nooks and crannies where nobody else has thought of looking before - so relying on someone else to do the heavy lifting is probably the wrong way to go about it...


Common Crawl gives you the data, not the results for the keywords you're interested in.



