Nice, what are you using to crawl the web?

marginalia_nu · on Sept 16, 2021

It's pretty much all bespoke.

I use external libraries for parsing HTML (JSoup) and robots.txt, but that's about it.

soheil · on Sept 17, 2021

What was the starting site you fed the crawler, from which it followed links to build the index?

marginalia_nu · on Sept 17, 2021

Just my (Swedish) personal website. The first iteration of the search engine was probably seeded mainly by these links:

https://www.marginalia.nu/00-l%C3%A4nkar/

But I've since expanded my websites, so I think these play a decent role in later iterations, although virtually all of them are pages I've found by eating my own dogfood:

https://memex.marginalia.nu/links/fragments-old-web.gmi

https://memex.marginalia.nu/links/bookmarks.gmi
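For readers wondering how a handful of seed pages grows into an index: the thread doesn't describe the crawler's actual algorithm, so the following is only a hypothetical sketch. It assumes a simple breadth-first crawl in which each seed URL goes into a frontier queue and newly discovered links are enqueued once each. The in-memory `linkGraph` stands in for fetching a page and extracting its links (which is where an HTML parser like JSoup would come in); `SeedCrawl`, `crawlOrder`, `maxPages`, and the example URLs are all made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: breadth-first crawl order starting from seed URLs. */
public class SeedCrawl {

    /**
     * Visits pages breadth-first from the seeds, stopping after maxPages.
     * Assumption: linkGraph replaces real fetching and link extraction.
     */
    static List<String> crawlOrder(List<String> seeds,
                                   Map<String, List<String>> linkGraph,
                                   int maxPages) {
        // Deduplicate the seeds first so each URL is queued at most once.
        Set<String> seen = new LinkedHashSet<>(seeds);
        Deque<String> frontier = new ArrayDeque<>(seen);
        List<String> visited = new ArrayList<>();
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            visited.add(url);
            for (String link : linkGraph.getOrDefault(url, List.of())) {
                if (seen.add(link)) { // true only the first time a URL is discovered
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
            "https://seed.example/", List.of("https://a.example/", "https://b.example/"),
            "https://a.example/", List.of("https://c.example/", "https://seed.example/"),
            "https://b.example/", List.of("https://c.example/"));
        // Prints the seed first, then pages in order of link distance from it.
        System.out.println(crawlOrder(List.of("https://seed.example/"), graph, 10));
    }
}
```

The point of the sketch is that everything reachable gets discovered through links from the seeds, so the choice of seed pages shapes what ends up in the index. A real crawler would also need to respect robots.txt, rate-limit per host, and persist its frontier.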