> I guess blogs that are linked-to in non-killed HN comments should probably be crawled a bit
They are, but there are relatively few of them because my only page content source is the Common Crawl. The hit rate against the total set of URLs I'm interested in is not great. I expect to fix this soon.
I'm also not indexing entire sites, only specific upvoted urls. This will change as well.
> Have you considered using social user karma (this could be a 1-10 score uniquely calculated for users of each of HN, Twitter, Reddit as long as it's built in a modular way) as a weight in a PageRank style schema?
Definitely. I've already started calculating a rank coefficient for submitters, but it's not yet clear how best to use it.
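One possible way such a coefficient could feed into a PageRank-style scheme, purely as a sketch and not what is implemented here: use the submitter score as the teleport (personalization) weight, so pages submitted by higher-karma users get a larger baseline share of rank. The function name, the `links` and `karma` inputs, and the default score of 1.0 are all assumptions made for illustration.

```python
def karma_weighted_pagerank(links, karma, damping=0.85, iterations=50):
    """Personalized PageRank with a karma-weighted teleport vector.

    links: dict mapping a URL to the list of URLs it links to (hypothetical input).
    karma: dict mapping a URL to its submitter's score, e.g. 1-10 (hypothetical input).
    """
    # Collect every URL that appears as a source or a target.
    nodes = sorted(set(links) | {dst for outs in links.values() for dst in outs})

    # Normalize karma into a probability distribution; unknown URLs get 1.0.
    total = sum(karma.get(n, 1.0) for n in nodes)
    teleport = {n: karma.get(n, 1.0) / total for n in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        # Every page starts each round with its karma-weighted teleport share.
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            outs = links.get(src, [])
            if outs:
                # Split the damped rank evenly across outbound links.
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new_rank[dst] += share
            else:
                # Dangling page: spread its rank according to the teleport vector.
                for n in nodes:
                    new_rank[n] += damping * rank[src] * teleport[n]
        rank = new_rank
    return rank


ranks = karma_weighted_pagerank(
    {"a": ["b"], "b": ["a"]},
    {"a": 10, "b": 1},
)
```

With two pages that link only to each other, the one submitted by the higher-karma user ends up with the larger score. Whether karma belongs in the teleport vector, on the edge weights, or as a post-hoc multiplier is an open design choice.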
> Here's how I am going to evaluate your search engine
Feel free to dump more of these. Some solid test cases would be very helpful.