A self-hosted, scalable search engine that indexes every character (not just words) and offers a rich query language (regex, document metadata filtering, etc.).
The core idea is that you can chuck unstructured documents (JSON, Protobuf messages, etc.) into it and perform rudimentary querying, filtering, and correlation on arbitrary fields and values in those documents. You don't have to write a proper DB schema or answer "what will my queries look like?" ahead of time (as you might with Postgres jsonb support, for example), and substantial portions of your query are still served by the index.
Example use case: chuck Hacker News comments, Stack Overflow comments, and GitHub issues into it. Then you can search over those and correlate arbitrary metadata across them, e.g. search for some regex across GitHub issues, but only those issues that are referenced elsewhere on HN/SO.
Primarily interesting for exploring and querying across structured documents in complex ways that you can't anticipate beforehand.
1. I represent the structured documents in the index as just a text file with a special format (think JSON, but designed specifically to be queried with only regex). This lets me execute those complex regex queries directly against a trigram text index and often have substantial portions of the query (the parts reducible to trigrams) be indexed.
1. A side effect of the above: you get to query every character, which can be useful in some contexts, e.g. searching over config files or code, or looking for grammatical errors in text documents.
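
To make the two points above concrete, here is a minimal Python sketch of the general technique, not the project's actual implementation. It assumes a made-up `path=value` line format for the flattened documents, and it asks the caller to pass the literal part of the query (`required_literal`) by hand, where a real engine would presumably derive the required trigrams from the regex itself. The names `flatten`, `trigrams`, and `TrigramIndex`, and the sample documents, are all hypothetical.

```python
import re
from collections import defaultdict


def flatten(doc, prefix=""):
    """Flatten nested dicts/lists into 'path=value' lines (assumed format)."""
    lines = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            lines.extend(flatten(value, f"{prefix}{key}."))
    elif isinstance(doc, list):
        for i, value in enumerate(doc):
            lines.extend(flatten(value, f"{prefix}{i}."))
    else:
        # prefix ends with a trailing '.', drop it before writing the line
        lines.append(f"{prefix[:-1]}={doc}")
    return lines


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.texts = {}                    # doc id -> flattened text
        self.postings = defaultdict(set)   # trigram -> ids of docs containing it

    def add(self, doc_id, doc):
        text = "\n".join(flatten(doc))
        self.texts[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, pattern, required_literal=""):
        """Narrow candidates via trigrams, then confirm with the full regex.

        required_literal is a substring every match must contain; supplying
        it by hand is a simplification made for this sketch.
        """
        candidates = set(self.texts)
        for gram in trigrams(required_literal):
            candidates &= self.postings.get(gram, set())
        regex = re.compile(pattern, re.MULTILINE)
        return sorted(d for d in candidates if regex.search(self.texts[d]))


index = TrigramIndex()
# Hypothetical sample documents, for illustration only.
index.add("hn-1", {"site": "hn", "text": "see issue 42 on GitHub"})
index.add("gh-42", {"site": "github",
                    "issue": {"number": 42, "title": "panic on empty input"}})
index.add("gh-7", {"site": "github",
                   "issue": {"number": 7, "title": "typo in README"}})

hits = index.search(r"^issue\.title=.*panic", required_literal="issue.title=")
print(hits)
```

The trigram intersection only narrows the candidate set (here to the two GitHub documents); the full regex still runs over each candidate to confirm the match. That is the sense in which only part of a query is indexed.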