Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The issue is that "workflow orchestration" is a broad problem space. Companies need to address a lot of disparate issues and so any solution ends up being a giant product w/ a lot of associated functionality and heavily opinionated as it grows into a big monolith. This is why almost universally folks are never happy.

In reality there are five main concerns: 1. Resource scheduling-- "I have a job or collection of jobs to run... allocate them to the machines I have" 2. Dependency solving-- If my jobs have dependencies on each other, perform the topological sort so I can dispatch things to my resource scheduler 3. API/DSL for creating jobs and workflows. I want to define a DAG... sometimes static, sometimes on the fly. 4. Cron-like functionality. I want to be able to run things on a schedule or ad-hoc. 5. Domain awareness-- If doing ETL I want my DAGs to be data aware... if doing ML/AI workflows then I want to be able to surface info about what I'm actually doing with them

No one solution does all these things cleanly. So companies end up building or hacking around off the shelf stuff to deal with the downsides of existing solutions. Hence it's a perpetual cycle of everyone being unhappy.

I don't think that you can just spin up a startup to deliver this as a "solution". This needs to be solved with an open source ecosystem of good pluggable modular components.



The issue indeed is that "workflow orchestration" is a broad problem space. I would argue that the solution is not this:

> I don't think that you can just spin up a startup to deliver this as a "solution". This needs to be solved with an open source ecosystem of good pluggable modular components.

But rather more specialized tools that solve specific issues.

What you describe just sounds like a better implemented version of Airflow or the over 100 other systems that are actively trying to be this today (Flyte, Dagster, Prefect, Argo Workflows, Kubeflow, Nifi, Oozie, Conductor, Cadence, Temporal, Step Functions, Logic Apps, your CI system of choice has their own, need I continue, that is not even scratching the surface). Most of those have some sort of "plugin" ecosystem for custom code, in varying degrees of robustness.

For what it is worth, everyone and their mom thinks they can make and wants to be this orchestrator. It's a problem that is just so generic and such a wide net that you end up with annoying-to-use building blocks because everyone wants to architecture astronaut themselves into being the generic workflow orchestration engine. The ultimate system design trap: Something so fundamentally easy to grok and conceptualize that you can PoC one in hours or days, but near infinite possibilities of what you can do with it, resulting in near infinite edge cases.

Instead, I'd rather companies just focus on the problem space that it lends itself to. Instead of Dagster saying "Automate any workflow" and try to capture that space, just make building blocks for data engineering workflows and get really good at that. Instead of Github Actions being a generic "workflow engine" just have it really good at making CI workflow building blocks.

But we can't have it that way. Because then some architecture astronaut will come around and design a generic workflow engine for orchestrating your domain specific workflow engines and say that you no longer need those.

Actually I think I just convinced myself that what you are suggesting actually IS the right way. If companies just said "we will provide an Airflow plugin" instead of building their own damn Airflow this would be easy. But we won't ever have that either. What we really need is some standards around that. Like if CNCF got together and got tired of this and said "This is THE canonical and supported engine for Kube workflows, bring your plugins here if you want us to pump you up". That might work. They've usually had better luck with putting people in lockstep in the Kube ecosystem at least than Apache has historically for more general FOSS stuff. Probably because the problem space there is more limited.


great insight, appreciate this. would also point out logging/event sourcing for "free" observability




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: