Stroom – a scalable data storage, processing and analysis platform

solidasparagus · on Dec 29, 2019

I would suggest trying to get this associated with a community-driven open source foundation like Apache. I think you will struggle to convince developers or enterprises to use a data storage + analytics platform developed and maintained by GCHQ.

sjaak · on Dec 29, 2019

"Stroom" is a Dutch word meaning either (electrical) "power" or, more likely in this case, "flow".

rollulus · on Dec 29, 2019

Stroom doesn’t mean electrical power; that’s “vermogen”. It means current.

adamcharnock · on Dec 29, 2019

It means chicken manure where I am (Portugal). Gave me a bit of a double take.

ukz · on Dec 29, 2019

`Setrum` in Indonesian (Dutch loanword).

hestefisk · on Dec 29, 2019

In Danish, similarly, the word ‘strøm’ means electrical current, or a stream or flow (of something, e.g. water).

billfruit · on Dec 29, 2019

Does it meaningfully deal with binary data? That is, can it extract, encode and decode it, handle parts with corrupted data, run analysis on it, etc.? What about images and other large 2D data?

gchq-7703 · on Dec 29, 2019

It supports arbitrary data formats, including binary data. Corrupted data, or data that fails to parse, is sent to an 'error' stream where you can process it further. I don't think Stroom is right for you if you're processing images or other large 2D data sets; it is primarily made for data that can be transformed to XML.
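The 'error stream' idea can be illustrated outside Stroom. The sketch below is not Stroom's API or pipeline configuration; it is a hypothetical shell/awk example that assumes a made-up record format of exactly three tab-separated fields (timestamp, level, message). Records that "parse" go to one file; everything else goes to a separate error file for later processing.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of routing unparseable records to an error stream.
# Not Stroom code: the record format (three tab-separated fields) is an assumption.
set -euo pipefail

# split_records INPUT OK_FILE ERR_FILE
#   Lines with exactly three tab-separated fields are written to OK_FILE;
#   every other line (including empty ones) is written to ERR_FILE.
split_records() {
  local input="$1" ok_file="$2" err_file="$3"
  awk -F '\t' -v ok="$ok_file" -v err="$err_file" '
    NF == 3 { print > ok; next }   # "parsed": keep in the main stream
            { print > err }        # failed to parse: send to the error stream
  ' "$input"
}
```

The point of the split is that bad input is kept rather than dropped, so it can be inspected or reprocessed once the parser is fixed.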

616c · on Dec 29, 2019

Are you actually affiliated, or is this just a cute coincidence? It's hard to tell from your message history, but I assume that even if you were, you wouldn't pick a username like that, lol.

Jordanpomeroy · on Dec 29, 2019

Looks like GCHQ didn’t want to pay for Splunk licenses.

hestefisk · on Dec 29, 2019

Is this really equivalent to Splunk? It seems more like a mix of Apache NiFi (developed by the NSA) and Spark.

gchq-7703 · on Dec 29, 2019

Maybe the best comparison is Elasticsearch? It takes in mostly log data, parses it (à la Ingest Nodes) and then makes it searchable and viewable in dashboards.

Dowwie · on Dec 29, 2019

The marketing isn't very clear. Does this compete with Prometheus? Nagios?

amelius · on Dec 29, 2019

What ecosystem does this work with best?

unixhero · on Dec 29, 2019

Well this looks promising.

zomglings · on Dec 29, 2019

    bash <(curl -s https://gchq.github.io/stroom-resources/get_stroom.sh)

GCHQ (straight-faced): Just download and run this shell script we wrote. No funny business, we promise.

NSA: snickers

I think there's space for a person, a company or a tool to certify all the scripts we run by passing the results of a curl directly into a shell. Wonder if there's any money in it.

chrisseaton · on Dec 29, 2019

What's significant about the shell script and curl? The entire repository is software that they wrote, which you can choose to run or not. Seems pretty straight up and clear to me. Running it through curl and bash or downloading it first doesn't make any material difference.

Not sure what the need for the snide comment is.

zomglings · on Dec 29, 2019

I wasn't being snide. The significance is that these installation scripts tend to be managed separately from the application code, and there are more avenues for attack via these scripts; it is not usually apparent where they are coming from.

Running them straight from curl without even verifying a checksum of some sort leaves me uncomfortable.

Besides, it's a fun exercise to consider how you'd secure installation scripts, which are much more homogeneous in behavior than generic applications.
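One way to get the checksum zomglings mentions is to download the script to disk, compare its SHA-256 digest with a value obtained out of band, and only then run it. A minimal sketch, assuming GNU coreutils `sha256sum` and curl are available (macOS ships `shasum -a 256` instead) and that you have a trusted expected digest from somewhere; the thread does not say whether one is published for `get_stroom.sh`, and the function names and `EXPECTED_DIGEST` placeholder are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify an install script's SHA-256 before running it.
# Function names are hypothetical; requires curl and GNU sha256sum.
set -euo pipefail

# verify_sha256 FILE EXPECTED_HEX
#   Succeeds only if FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
  [ "$actual" = "$expected" ]
}

# fetch_verify_run URL EXPECTED_HEX
#   Downloads URL to a temporary file and runs it with bash only if the
#   digest matches; otherwise prints an error and returns 1.
fetch_verify_run() {
  local url="$1" expected="$2" tmp status=0
  tmp="$(mktemp)"
  if ! curl -fsSL "$url" -o "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  if ! verify_sha256 "$tmp" "$expected"; then
    echo "checksum mismatch for $url; refusing to run it" >&2
    rm -f "$tmp"
    return 1
  fi
  bash "$tmp" || status=$?   # run the verified copy, keep its exit status
  rm -f "$tmp"
  return "$status"
}

# Usage (EXPECTED_DIGEST is a placeholder you would have to obtain separately):
#   fetch_verify_run https://gchq.github.io/stroom-resources/get_stroom.sh "$EXPECTED_DIGEST"
```

This guards against the script changing between review and execution, but it only helps if the expected digest comes from a source you trust more than the server delivering the script.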

chrisseaton · on Dec 29, 2019

> it is not usually apparent where the scripts are coming from

GitHub. The same place as the software. If you don't trust github.com's servers then you don't trust either the software or the installation script.

zomglings · on Dec 29, 2019

Where? I couldn't find the file quickly (and from my phone) on either the stroom or the stroom-resources repo.

And when you find it, you still have to perform independent verification that the file on GitHub is the same one you are downloading through curl.

You are treating their installation instructions as equivalent to "clone this repo and run this script inside the repo" when they actually are not.

chrisseaton · on Dec 29, 2019

> Where?

It's just whatever the github.com servers choose to serve you. Isn't that the point? If you trust what they serve you then it's safe to run, and if you don't then it isn't. Which is exactly the same situation as the software itself in the main repo isn't it?

How is curling and running a script any different to cloning it and running it?

Are you thinking that the fact that the repo has a commit hash saves you? What are you verifying the commit hash against? What you see on the website? The website also served by github.com? And how do you know the commit hash isn't perfectly accurate, just a hash of code that does indeed contain attacking code?

I'm not sure any of it makes any difference. github.com can serve you code containing attacks from either the repo or the installation script and in both vectors you're just as vulnerable.

rzzzt · on Dec 29, 2019

<org-name>.github.io/<project-name> content usually goes to a separate branch in the repository, named "gh-pages" (although this is configurable): https://github.com/gchq/stroom-resources/blob/gh-pages/get_s...

zomglings · on Dec 29, 2019

Ah nice, didn't realize it was on a separate branch. Thanks.
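Given the gh-pages location rzzzt points to, the independent check zomglings asked for (is the file served at gchq.github.io the same as the one in the repository?) can be sketched as follows. Assumptions: the script sits at the root of the gh-pages branch of https://github.com/gchq/stroom-resources under the name `get_stroom.sh` (taken from the curl URL earlier in the thread), and git, curl and cmp are installed. The function name `same_as_served` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare a file from a repository branch with the copy served over HTTP.
# Function name is hypothetical; needs git, curl and cmp.
set -euo pipefail

# same_as_served REPO_URL BRANCH PATH_IN_REPO SERVED_URL
#   Prints "identical" and succeeds if the two copies match byte for byte,
#   otherwise prints "different" and returns 1.
same_as_served() {
  local repo="$1" branch="$2" file_path="$3" url="$4" work status=0
  work="$(mktemp -d)"
  git clone --quiet --depth 1 --branch "$branch" "$repo" "$work/repo"
  curl -fsSL "$url" -o "$work/served"
  if cmp -s "$work/repo/$file_path" "$work/served"; then
    echo identical
  else
    echo different
    status=1
  fi
  rm -rf "$work"
  return "$status"
}

# Usage (file location on the branch is an assumption):
#   same_as_served https://github.com/gchq/stroom-resources.git gh-pages \
#     get_stroom.sh https://gchq.github.io/stroom-resources/get_stroom.sh
```

Both copies still come from GitHub, so, in line with chrisseaton's point, a match only shows that the two GitHub-served copies agree; it says nothing about whether either is trustworthy.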

sansnomme · on Dec 29, 2019

There are a bunch of static analysis companies working on crypto contracts. Infosec could be a good pivot if the bubble pops.

zomglings · on Dec 29, 2019

Definitely the same niche. I was thinking along the lines of running tcpdump on an isolated Docker network as a hacky first draft.

sheeshkebab · on Dec 29, 2019

[flagged]

ysleepy · on Dec 29, 2019

Compared to what?

What was your expectation that is not met?

rzzzt · on Dec 29, 2019

"./gradlew clean build", according to its developer documentation.