Custom Search Engine Built on Searx (jpreston.xyz)
79 points by wcerfgba on Sept 13, 2021 | 17 comments


I'm interested in something similar. What I've tried is setting up https://yacy.net/ to index only the websites I care about. I ran it on my home NAS and configured it to crawl Python, sklearn, and some blogs; the crawl took around an hour.

Searching through the results was quite fast, but I found the results a bit lacking. Maybe with more tuning I could have obtained better answers, but while searching for things in the Python standard library, I would get lots of noise from other places where that function is used.


Yes, I was originally considering crawling and indexing on my local machine as well, but once I realised I could build this with Searx and leave all of that work to Bing/Google/DDG/..., it removed a lot of complexity from the project. It also means I don't have to worry about optimising the quality of the results. That said, there is scope to do more work around plugging all the result sets together: Searx's approach is quite naive and doesn't work as well in this use case as it does in its original/intended use case.
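
To make "plugging the result sets together" concrete, here is a minimal sketch of one generic way to merge ranked lists from several engines, reciprocal rank fusion. This is not Searx's actual merging logic; the function name, the `k=60` constant, and the example URLs are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked URL lists into one list, best first.

    Each appearance of a URL contributes 1 / (k + rank) to its score,
    so URLs ranked highly by several engines rise to the top.
    k is a smoothing constant (60 is a commonly used default).
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties are broken by URL so the output is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result lists from two upstream engines.
engine_a = ["https://a.example/1", "https://b.example/2", "https://c.example/3"]
engine_b = ["https://b.example/2", "https://d.example/4", "https://a.example/1"]

merged = reciprocal_rank_fusion([engine_a, engine_b])
```

A URL returned by both engines outranks one returned by only a single engine, which is usually the behaviour you want when every engine is answering the same query.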


YaCy does full text search and indexes all pages.

edit: But it works best for just a single set of websites. I wouldn't know how to build multiple indexes.


I have been looking for a real custom search engine for months now. I tried Google Programmable Search Engine and Bing Custom Search, and was even building scripts for DuckDuckGo, but nothing really worked for me.

For example, I have a list of 7500 company websites and I want to search them regularly for keywords, such as product names, company names mentioned on them, or references to certain industries.
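
As a rough sketch of what such a recurring keyword check could look like without any search engine in between, the snippet below fetches each page and reports which keywords appear. Everything in it is an assumption for illustration: the helper names (`find_keywords`, `fetch`, `scan`), the whole-word matching rule, and the choice to scan raw HTML rather than extracted visible text. It only looks at the URLs you give it and does not crawl a site's subpages.

```python
import re
import urllib.request


def find_keywords(text, keywords):
    """Return the keywords that occur in text as whole words, ignoring case.

    Note: the \\b word boundary assumes keywords start and end with a
    letter, digit, or underscore.
    """
    found = []
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(kw)
    return found


def fetch(url, timeout=10):
    """Download a page and decode it leniently as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scan(urls, keywords):
    """Map each reachable URL to the keywords found in its raw HTML."""
    hits = {}
    for url in urls:
        try:
            page = fetch(url)
        except OSError:
            continue  # skip sites that are unreachable or time out
        matched = find_keywords(page, keywords)
        if matched:
            hits[url] = matched
    return hits


# Offline example on a hypothetical snippet of page text.
sample = "Our Widget Pro ships to the automotive industry."
matches = find_keywords(sample, ["widget pro", "aerospace", "Automotive"])
```

For 7500 sites you would likely want to add rate limiting, parallel fetching, and HTML-to-text extraction on top of this.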

This could be a solution.


Not exactly the same, but Syften (https://syften.com/) can do this kind of 'listening' for social platforms like HN, Reddit, etc. Possibly you could reach out to Michal about extending their ingest with arbitrary/custom websites?


I will do so, thanks for the hint.


I'm happy to work with you on this. My email is michal@syften.com.


FWIW, there is a fork of searx which has seen a lot of activity recently.

https://github.com/searxng/searxng


I have made an attempt at this, but it is far from complete. I manually indexed the search engines of different websites behind a common natural language interface: when you write "date parser MIT license stars > 1000", it automatically suggests GitHub as a website that can support this query:

https://quantleaf.com/?q=&t=query&run=false

You can see more examples on the website.
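
To illustrate the kind of translation involved, here is a minimal sketch that turns the already-parsed pieces of "date parser MIT license stars > 1000" into a GitHub search URL using GitHub's `license:` and `stars:` search qualifiers. This is not how Quantleaf is implemented; the function name and parameters are hypothetical, and the natural language parsing step is left out entirely.

```python
from urllib.parse import urlencode


def github_search_url(terms, license_key=None, min_stars=None):
    """Build a GitHub repository search URL from free-text terms and filters."""
    parts = [terms]
    if license_key:
        parts.append(f"license:{license_key}")  # e.g. license:mit
    if min_stars is not None:
        parts.append(f"stars:>{min_stars}")  # e.g. stars:>1000
    query = " ".join(parts)
    # urlencode percent-escapes the qualifiers and turns spaces into '+'.
    return "https://github.com/search?" + urlencode(
        {"q": query, "type": "repositories"}
    )


url = github_search_url("date parser", license_key="mit", min_stars=1000)
```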


On mobile your site shows me that GitHub can support that query, but I’d like to be able to then tap on it and be taken to the GitHub search in question. At least on iOS with Safari, I could not find a way to be taken to the GitHub results by tapping on anything.


When the results appear, you have to press the small play button at the right side, then you will be navigated to GitHub to see the expected search results.

I just tried it on iOS with Safari, using "date parser mit license". It could be more intuitive, though.


Wow, this is exactly the way we have been thinking about Neera. Check out https://hargup.substack.com/p/so-convenient-that-it-changes-...


It would be fantastic if this sort of functionality was upstreamed. I think it would really appeal to the audience that uses/hosts Searx.


I did open an issue on GitHub to gauge interest in upstreaming it :) If you like it, leave a thumbs up :) https://github.com/searx/searx/issues/2952


I've seen this blog on here a couple of times now. Does anyone know what it's built on (or if it's using a specific theme)?


I'm using GitHub Pages and Jekyll. I wrote the theme from almost-scratch: I think I started with a basic theme, ripped out everything I didn't like, and then started building up again.

Due to limitations on available Jekyll plugins on GH Pages, I wrote some particularly spicy Liquid templating to handle my automatic backlinks and site tree construction: https://github.com/wcerfgba/jpreston.xyz/blob/master/_includ...


Thanks so much for the insight!



