Hacker News
Stroom – a scalable data storage, processing and analysis platform (github.com/gchq)
96 points by adulau on Dec 29, 2019 | 27 comments


I would suggest trying to get this associated with a community-driven open source foundation like Apache. I think you will struggle to convince developers or enterprises to use a data storage + analytics platform developed and maintained by GCHQ.


"Stroom" is a Dutch word meaning either (electrical) "power" or, more likely in this case, "flow".


Stroom doesn’t mean electrical power, that’s “vermogen”. It means current.


It means chicken manure where I am (Portugal). Gave me a bit of a double take.


`Setrum` in Indonesian (Dutch loanword).


In Danish, similarly, the word ‘strøm’ means electrical current or stream / flow (of something, eg water).


Does it meaningfully deal with binary data, that is, can it extract, encode and decode them, handle parts with corrupted data, do analysis on them, etc.? What about images and other large 2D data?


It supports arbitrary data formats, including binary data. It sends corrupted data or data that fails to parse to an 'error' stream where you can complete further processing on it. I don't think Stroom is right for you if you're processing data like images and other large 2d data sets. It is primarily made for data that can be transformed to XML.


Are you actually affiliated or is this just a cute coincidence? Hard to tell from your message history, but I assume even if you are you would not pick a username like that, lol.


Looks like GCHQ didn’t want to pay for Splunk licenses


Is this really equivalent to Splunk? Seems more like a mix of Apache NiFi (developed by NSA) and Spark.


I feel maybe the best comparison might be to Elasticsearch? It takes in mostly log data, parses it (à la Ingest Nodes) and then makes it searchable / shown as dashboards.


The marketing isn't very clear. Does this compete with Prometheus? Nagios?


What ecosystem does this work with best?


Well this looks promising.


    bash <(curl -s https://gchq.github.io/stroom-resources/get_stroom.sh)
GCHQ (straight-faced): Just download and run this shell script we wrote. No funny business, we promise.

NSA: snickers

I think there's space for a person, a company or a tool to certify all the scripts we run by passing the results of a curl directly into a shell. Wonder if there's any money in it.


What's significant about the shell script and curl? The entire repository is software that they wrote that you can choose to run or not. Seems pretty straight up and clear to me. Running it through curl and bash or downloading it doesn't make any material difference.

Not sure what the need for the snide comment is.


Was not being snide. Significance is that these installation scripts tend to be managed separately from the application code and that there are more avenues for attack via these scripts -- it is not usually apparent where the scripts are coming from.

Running them directly post-curl without even verifying a sum of some sort leaves me uncomfortable.

Besides, it's a fun exercise to consider how you'd solve the problem of securing installation scripts, which are much more homogeneous in behavior than generic applications.
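
For what "verifying a sum" before running could look like, here is a minimal sketch in Python. It assumes you have obtained a SHA-256 digest of the script through some separate, trusted channel; nothing in this thread says such a digest is published for get_stroom.sh. The function names (`sha256_hex`, `verify_script`, `fetch_verify_run`) are hypothetical, made up for illustration.

```python
import hashlib
import hmac
import subprocess
import tempfile
import urllib.request


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_script(data: bytes, expected_sha256: str) -> bool:
    """Check the downloaded bytes against a digest obtained out of band."""
    expected = expected_sha256.strip().lower()
    # compare_digest avoids leaking where the two strings first differ
    return hmac.compare_digest(sha256_hex(data), expected)


def fetch_verify_run(url: str, expected_sha256: str) -> int:
    """Download a script, refuse to run it unless the digest matches."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    if not verify_script(data, expected_sha256):
        raise ValueError("checksum mismatch; refusing to run script")
    # Write the exact bytes that were verified to disk, then run that
    # file, so what executes is what was hashed.
    with tempfile.NamedTemporaryFile(suffix=".sh") as f:
        f.write(data)
        f.flush()
        return subprocess.run(["bash", f.name], check=False).returncode
```

The point of the design is that the bytes are fetched once, hashed, and only then executed, instead of being streamed straight from curl into bash. It does not answer the question of where a trustworthy digest comes from, which is the harder part of the problem.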


> it is not usually apparent where the scripts are coming from

GitHub. The same place as the software. If you don't trust github.com's servers then you don't trust either the software or the installation script.


Where? I couldn't find the file quickly (and from my phone) on either the stroom or the stroom-resources repo.

And when you find it, you still have to perform independent verification that the file on GitHub is the same one you are downloading through curl.

You are treating their installation instructions as equivalent to "clone this repo and run this script inside the repo" when they actually are not.


> Where?

It's just whatever the github.com servers choose to serve you. Isn't that the point? If you trust what they serve you then it's safe to run, and if you don't then it isn't. Which is exactly the same situation as the software itself in the main repo isn't it?

How is curling and running a script any different to cloning it and running it?

Are you thinking that the fact that the repo has a commit hash saves you? What are you verifying the commit hash against? What you see on the website? The website also served by github.com? And how do you know the commit hash isn't accurate and simply a hash of code that does indeed contain attacking code?

I'm not sure any of it makes any difference. github.com can serve you code containing attacks from either the repo or the installation script and in both vectors you're just as vulnerable.


<org-name>.github.io/<project-name> content usually goes to a separate branch in the repository, named "gh-pages" (although this is configurable): https://github.com/gchq/stroom-resources/blob/gh-pages/get_s...


Ah nice, didn't realize it was on a separate branch. Thanks.


There are a bunch of static analysis companies in crypto-contracts. Infosec could be a good pivot if the bubble pops.


Definitely the same niche. Was thinking along the lines of a tcpdump on an isolated docker network for a hacky first draft.


[flagged]


Compared to what?

What was your expectation that is not met?


"./gradlew clean build", according to its developer documentation.



