I would suggest trying to get this associated with a community-driven open source foundation like Apache. I think you will struggle to convince developers or enterprises to use a data storage + analytics platform developed and maintained by GCHQ.
Does it meaningfully deal with binary data? That is, can it extract, encode and decode it, handle parts with corrupted data, do analysis on it, etc.? What about images and other large 2D data?
It supports arbitrary data formats, including binary data. It sends corrupted data, or data that fails to parse, to an 'error' stream where you can perform further processing on it. I don't think Stroom is right for you if you're processing data like images or other large 2D data sets, though. It is primarily made for data that can be transformed to XML.
Are you actually affiliated or is this just a cute coincidence? Hard to tell from your message history, but I assume even if you are you would not pick a username like that, lol.
I feel the best comparison might be to Elasticsearch? It takes in mostly log data, parses it (à la Ingest Nodes), and then makes it searchable / viewable as dashboards.
GCHQ (straight-faced): Just download and run this shell script we wrote. No funny business, we promise.
NSA: snickers
I think there's space for a person, a company or a tool to certify all the scripts we run by passing the results of a curl directly into a shell. Wonder if there's any money in it.
What's significant about the shell script and curl? The entire repository is software that they wrote, which you can choose to run or not. Seems pretty straight up and clear to me. Running it through curl and bash versus downloading it first doesn't make any material difference.
Was not being snide. The significance is that these installation scripts tend to be managed separately from the application code, and there are more avenues for attack via these scripts -- it is not usually apparent where the scripts are coming from.
Running them directly post-curl, without even verifying a checksum of some sort, leaves me uncomfortable.
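For what it's worth, the "verify a sum" step is only a few lines of shell. A minimal sketch (the script contents and digest here are stand-ins, not real Stroom artifacts; the real digest would have to be published out-of-band, e.g. in signed release notes, or the check proves nothing):

```shell
#!/bin/sh
set -eu

# Stand-in for: curl -fsSL https://example.com/install.sh -o install.sh
printf 'echo installed\n' > install.sh

# Stand-in for the digest the publisher advertises out-of-band.
sha256sum install.sh > install.sh.sha256

# Only execute the script if the digest matches.
if sha256sum -c install.sh.sha256 >/dev/null 2>&1; then
    sh install.sh
else
    echo 'checksum mismatch; refusing to run' >&2
    exit 1
fi
```

Of course, if the checksum is served from the same host as the script, an attacker who controls the host can rewrite both, which is exactly the point being argued below.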
Besides, it's a fun exercise to consider how you'd solve the problem of securing installation scripts, which are much more homogeneous in behavior than generic applications.
It's just whatever the github.com servers choose to serve you. Isn't that the point? If you trust what they serve you then it's safe to run, and if you don't then it isn't. Which is exactly the same situation as the software itself in the main repo isn't it?
How is curling and running a script any different to cloning it and running it?
Are you thinking that the fact that the repo has a commit hash saves you? What are you verifying the commit hash against? What you see on the website? The website also served by github.com? And how do you know the commit hash isn't perfectly accurate, just a hash of code that does indeed contain the attack?
I'm not sure any of it makes any difference. github.com can serve you code containing attacks via either the repo or the installation script, and through both vectors you're just as vulnerable.