
A workflow tool like Snakemake can help here by letting the filenames themselves describe how each output was created, so data outputs can be treated as effectively immutable.

What I mean is that if you run the same analysis with a different set of parameters, it should generate a new file with a different name rather than a new version of the old file. This also makes it easier to compare outputs produced with different parameters.
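To make that concrete, here is a rough sketch in plain Python (not Snakemake itself) of the naming idea. The function names, the `key-value` tag format and the `.tsv` extension are all made up for illustration; a real workflow would encode parameters however suits its wildcards.

```python
from pathlib import Path


def output_path(outdir, sample, **params):
    """Build a filename that encodes every parameter (sorted so the name is stable)."""
    tag = "_".join(f"{key}-{value}" for key, value in sorted(params.items()))
    return Path(outdir) / f"{sample}.{tag}.tsv"


def run_analysis(outdir, sample, **params):
    """Hypothetical analysis step: one output file per distinct parameter set."""
    out = output_path(outdir, sample, **params)
    if out.exists():
        # Existing outputs are treated as immutable: never overwritten.
        return out
    out.parent.mkdir(parents=True, exist_ok=True)
    # Placeholder "result": just record the parameters that produced the file.
    lines = [f"sample\t{sample}"] + [f"{k}\t{v}" for k, v in sorted(params.items())]
    out.write_text("\n".join(lines) + "\n")
    return out
```

With this, `run_analysis("results", "s1", alpha=0.1, k=5)` and `run_analysis("results", "s1", alpha=0.5, k=5)` end up as two separate files side by side, and rerunning the first call just returns the existing path instead of replacing it.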

That said, there are workflow platforms that support data versioning, such as Pachyderm (https://pachyderm.com), but it is a bit more heavyweight since it runs on top of Kubernetes.



The reliance on filenames to define (parametric) dependencies was among the reasons I later adopted Nextflow. Its model fit the kind of computational dependencies in my case better. In the meantime, Snakemake has grown, and many DAGs that were hard to describe back then can now be expressed directly with Snakemake primitives.



