Self-hosted and scalable search engine that indexes every character (not just wo...

Jemaclus · on Jan 14, 2021

What's the upside of indexing every character?

emidoots · on Jan 14, 2021

1. I represent the structured documents in the index as just a text file with a special format (think JSON, but designed specifically to be queried with only regex). This lets me execute those complex regex queries directly against a trigram text index, and often substantial portions of the query (the parts reducible to trigrams) can be served from the index.

2. A side effect of the above: you get to query every character, which can be useful in some contexts, e.g. searching over config files or code, or looking for grammatical errors in text documents.
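
To make the trigram idea concrete, here is a minimal Python sketch of the general technique, not the actual implementation discussed in this thread. The `TrigramIndex` class, its methods, and the sample file names are all hypothetical. It assumes the caller passes in a literal substring that every regex match must contain; a real engine would derive the trigrams from the regex itself.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy in-memory trigram index (hypothetical, for illustration only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> ids of docs containing it
        self.docs = {}                    # doc id -> full text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def candidates(self, literal):
        """Docs containing every trigram of `literal` (may include false positives)."""
        grams = trigrams(literal)
        if not grams:
            # Literal shorter than 3 characters: the index cannot narrow anything.
            return set(self.docs)
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search(self, pattern, literal):
        """Narrow with the index, then confirm each candidate with the real regex."""
        rx = re.compile(pattern)
        return sorted(d for d in self.candidates(literal) if rx.search(self.docs[d]))


index = TrigramIndex()
index.add("a.conf", "timeout = 30;\nretries=5")
index.add("b.conf", "timeout: 30\nretries: 5")
index.add("c.txt", "no settings here")

# Only docs whose trigrams cover "timeout" are scanned with the regex.
print(index.search(r"timeout\s*=\s*\d+;", "timeout"))
```

Because whitespace and punctuation are indexed like any other character, a literal such as `" = "` is itself a trigram and can narrow the search, which a word-based index could not do.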