I'm interested in something similar. What I've tried is to set up https://yacy.net/ to index only those websites I care about. I set it up on my home NAS and I configured it to crawl Python, sklearn, and some blogs and it took around 1 hour.
Searching through the results was quite fast, but I found the results a bit lacking. Maybe with more tuning I could have obtained better answers, but while searching for things in the Python standard library, I would get lots of noise from other places where that function is used.
Yes, I was originally considering crawling and indexing on my local machine as well, but once I realised I could build this with Searx and leave all of that work to Bing/Google/DDG/..., it removed a lot of complexity from the project. It also means I don't have to worry about optimising the quality of the results. That said, there is scope to do more work on combining the result sets: Searx's approach is quite naive and doesn't work as well in this use case as it does in its original/intended use case.
I have been looking for a real custom search engine for months now. I tried Google Programmable Search Engine and Bing Custom Search, and was even building scripts for DuckDuckGo, but nothing really worked for me.
For example, I have a list of 7500 company websites and I want to search them regularly for keywords, such as product names, company names mentioned on them, or references to certain industries.
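To make the use case concrete, here is a minimal Python sketch of that kind of recurring keyword scan. It is only an illustration under several assumptions: the helper names (`fetch_html`, `find_keywords`, `scan_sites`) are made up for this example, it only looks at the one URL given per site rather than crawling it, and it matches against the raw HTML without stripping tags.

```python
import re
import urllib.request
from typing import Callable, Dict, List, Sequence


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download one page and return it as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def find_keywords(text: str, keywords: Sequence[str]) -> List[str]:
    """Return the keywords that occur in text as whole words, ignoring case."""
    found = []
    for kw in keywords:
        # Lookarounds keep "widget" from matching inside "widgets".
        pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(kw)
    return found


def scan_sites(
    urls: Sequence[str],
    keywords: Sequence[str],
    fetch: Callable[[str], str] = fetch_html,
) -> Dict[str, List[str]]:
    """Map each URL to the keywords found on it; sites with no hits are omitted."""
    hits: Dict[str, List[str]] = {}
    for url in urls:
        try:
            page = fetch(url)
        except OSError:
            # Unreachable site (urllib's URLError is an OSError): skip it.
            continue
        matched = find_keywords(page, keywords)
        if matched:
            hits[url] = matched
    return hits
```

Run from a scheduled job, `scan_sites(company_urls, ["product name", "automotive"])` would give a per-site list of matches. The `fetch` parameter is injectable so the fetching step can be swapped for a cache or a proper crawler.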
Not exactly the same, but Syften (https://syften.com/) can do this kind of 'listening' for social platforms like HN, Reddit, etc. Possibly you could reach out to Michal about extending their ingest with arbitrary/custom websites?
I have made an attempt at this, but it is far from complete.
I manually indexed the search engines of different websites in a common natural language interface:
When writing
"date parser MIT license stars > 1000"
It automatically suggests GitHub as a website that can support this query:
On mobile your site shows me that GitHub can support that query, but I’d like to be able to then tap on it and be taken to the GitHub search in question. At least on iOS with Safari, I could not find a way to be taken to the GitHub results by tapping on anything.
When the results appear, you have to press the small play button at the right side, then you will be navigated to GitHub to see the expected search results.
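For anyone curious what that navigation boils down to, here is a rough Python sketch of translating the example query into a GitHub search URL. This is not how the site is actually implemented; the function name and the two parsing rules are assumptions made up for illustration. The `stars:>1000` and `license:mit` qualifiers are GitHub's own search syntax.

```python
import re
from urllib.parse import urlencode


def to_github_search_url(query: str) -> str:
    """Turn a loose natural-language query into a GitHub repository search URL.

    Hypothetical rules, for illustration only:
      "stars > 1000"  -> qualifier "stars:>1000"
      "MIT license"   -> qualifier "license:mit"
    Everything else is passed through as free-text search terms.
    """
    qualifiers = []

    def take_stars(match):
        qualifiers.append(f"stars:{match.group(1)}{match.group(2)}")
        return " "

    def take_license(match):
        qualifiers.append(f"license:{match.group(1).lower()}")
        return " "

    # Pull out the star-count comparison first, then the "<name> license" phrase.
    rest = re.sub(r"\bstars\s*(>=|<=|>|<)\s*(\d+)", take_stars, query, flags=re.IGNORECASE)
    rest = re.sub(r"\b([\w.-]+)\s+license\b", take_license, rest, flags=re.IGNORECASE)

    q = " ".join(rest.split() + qualifiers)
    return "https://github.com/search?" + urlencode({"q": q, "type": "repositories"})


print(to_github_search_url("date parser MIT license stars > 1000"))
```

The license rule is deliberately naive: it treats whatever word precedes "license" as the license key, so a real version would need a list of known licenses.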
I tried it just now on iOS with Safari, using "date parser mit license". It could be more intuitive, though.
I'm using GitHub Pages and Jekyll. I wrote the theme almost from scratch: I think I started with a basic theme, ripped out everything I didn't like, and then started building up again.