Let's say you have three numbers, a, b, and c, and you want to add them together to get the total. Then later, "c" changes, and you'd like to re-compute the total. One option would be to re-run the full sum, a + b + c, which would be fine. However, that repeats the "a + b" calculation.
Would it be possible to improve the efficiency of the total calculation by re-using the pre-computed "a + b" if only "c" changes?
Differential dataflow is one way to do that, but only really applies if you have lots of data with complex calculations. For analytics, maybe the "a + b" calculation would cover your last 5 years of operations, and then when a new day's worth of data comes in, you just compute the changes to the totals, rather than re-computing the analytics for all those years, all without manually having to write distinct "total" and "update" code.
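Concretely, the toy version of that idea fits in a few lines (plain Python, no library, names made up for illustration):

```python
# Keep the running total cached and apply only the change ("delta") when
# an input moves, instead of re-summing everything from scratch.
a, b, c = 10, 20, 30
total = a + b + c  # initial full computation

# Later, c changes from 30 to 35. Rather than recompute a + b + c,
# fold the delta into the cached total, reusing the implicit "a + b" work.
new_c = 35
total += new_c - c
c = new_c

print(total)  # 65, same answer as recomputing a + b + c
```

For three numbers the savings are obviously nothing; the point is that the same delta-folding shape scales to aggregates over years of data.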
Sounds like basic memoization and a topological sort get you all the way there? If that's really all this is about, I'm sure there are lots of ad hoc implementations of it in many codebases. It doesn't necessarily seem like something you'd need to bring in a Rust framework to do.
Edit: lots of downvotes, yet no replies? Can someone explain why my comment is apparently so terrible...?
You're asking this in good faith, so you don't deserve to get downvoted. I think people on hn are a bit sensitive to comments they perceive as "reductionist" that may oversimplify complex problems, even if they're honest questions.
But you're right: at its core, this kind of problem will use techniques like memoization, and most projects that need something like this will have ad hoc approaches to solve it. The advantage of differential dataflow is that it's a generalized approach to this problem. The business logic behind these workflows, between tracking dependencies and propagating updates, can get pretty damn complicated and difficult to maintain. Having a generalized approach would make building these dataflows much simpler.
I think I may be a bit environmentally damaged from mainly using Clojure. Algorithms similar to this are fairly common in the Clojure ecosystem. Memoization is part of the standard library too.
The claim (in the article) that no one cares about differential dataflow seems to be true only of this specific library. The general concept surely translates to some combination of simple concepts like memoization, topological sorting, partial application, etc., so it's obvious to me that many ad hoc implementations would exist, tailored to more specific needs, in different programming languages with different feature sets. Sometimes buying into a framework is a lot more work than rolling your own, especially if it means having to switch to a different programming language.
Importantly, this doesn't just use memoization (it actually avoids having to spend memory on that), but rather uses operators (nodes in the dataflow graph) that directly work with `(time, data, delta)` tuples. The `time` is a general lattice, so fairly flexible (e.g. for expressing loop nesting/recursive computations, but also for handling multiple input sources with their own timestamps), and the `delta` type is between a (potentially commutative) semigroup (don't be confused, they use addition as the group operation) and an abelian group. E.g. collections that are iteratively refined in loops often need an abelian `delta` type, while monoids (semigroup + explicit zero element) allow for efficient append-only computations [0].
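A heavily simplified sketch of what an operator consuming such tuples might look like, in Python for readability. This is my own illustration, not the library's API: deltas are plain signed integers (the abelian group Z under addition), retractions are negative deltas, and times are totally ordered rather than a general lattice:

```python
from collections import defaultdict

def count(updates):
    """Toy 'count' operator over (time, key, delta) tuples.

    Consumes signed multiset updates and emits (time, (key, count), delta)
    changes to the output collection: retract the stale count, assert the
    new one. Real differential dataflow handles partially ordered times;
    here we just sort, which only works for totally ordered timestamps.
    """
    counts = defaultdict(int)
    out = []
    for time, key, delta in sorted(updates):
        old = counts[key]
        counts[key] += delta
        new = counts[key]
        if old != 0:
            out.append((time, (key, old), -1))  # retract the old count
        if new != 0:
            out.append((time, (key, new), +1))  # assert the new count
    return out

# Insert "a" twice, then retract one occurrence; each step produces
# only the *changes* to the counted output, not a full recomputation.
print(count([(0, "a", 1), (1, "a", 1), (2, "a", -1)]))
```

The point of the tuple representation is that downstream operators can consume these change streams directly, so no memoized snapshots of intermediate collections need to be kept around.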
> Sounds like basic memoization and topological sort gets you all the way there?
I don't really want to pull rank here, but for the benefit of other readers: 100% nope.
I personally find the "make toxic comments to draw folks out" rhetorical style frustrating, so I'll just leave you with a video from Clojure/conj about how nice it would be to be able to use DD from Clojure, to get a proper reactive Datomic experience.
My comment was basically (paraphrasing here) "given that my understanding of the problem is that it can be pulled off using simple constructs X and Y, it seems like most people wouldn't need to pull in framework Z".
It's puzzling to me why you _wouldn't_ want to "pull rank", as you say. I did not pretend to be an expert in this domain. I'm really just exposing my knowledge and speculating about why people apparently aren't using this framework, which is what the damn submission is about. Did you even read it?
It seems like I managed to piss off a bunch of users of the framework, who - rather than simply explain in clear terms why I'm supposedly wrong - instead just downvote away and make passive-aggressive comments that assume I'm some sort of troll.
Remind me to never engage with the Rust community again. Jfc.
Edit: Oh, so you're the creator of the framework? If you go straight to calling people toxic when they have questions about it, I think I understand why no one wants to use it.
I completely understand where you're coming from and I've been downvoted for expressing non-popular views here, and I relate to your frustration.
That being said, rest assured that your experience says absolutely nothing about the wider Rust community. It's one of the most helpful ones I've engaged with.
So please don't judge it by one strangely toxic framework creator.
> That being said, rest assured that your experience says absolutely nothing about the wider Rust community. It's one of the most helpful ones I've engaged with.
It's very common to see people with toxic attitudes in and around the Rust community, even in their internal communication about how to use Rust (`actix-web`, anyone?). I don't think it's helpful to lie to yourself about the Rust community like this.
The only thing Rust users who don't want to keep having these conversations can do is openly recognize and talk about the extreme fanaticism Rust users commonly display, and the toxic patterns of communication that sometimes come bundled with it, when it comes to priorities in software development.
From what I see, it's the dismissive way it was posed, with little curiosity about the real challenges. Similar to the "oh, I could build that in a weekend" style comments that are pretty exhausting for creators to have to deal with.
This submission is literally about why people aren't using some Rust framework. I add my two cents as to why that might be and then that gets called toxic and dismissive.
Seems like many people here aren't actually willing to engage in a discussion. I guess this submission is basically just native advertising for the framework in question.
It's madness to combine application logic with update management in your codebase. Update management is very hard to get right, and there are a lot of corner cases that only show up under extreme conditions with delayed or duplicate delivery of updates. When the update logic is incorrect, you'll occasionally get plausible-but-somewhat-wrong answers that are hard to reproduce. The very worst kind of bug.
It's much better to have the update logic handled in a thoroughly tested library, and build your application logic on top.
It's tautological that anything that can be implemented can be implemented. Libraries and frameworks give you the implementation without your having to take the time to build it yourself, so you can focus on your core competency.
That wasn't my point at all. My point was that this seems like a basic application of simple concepts from computer science, so it's not that odd if people aren't thinking about using this library to do it.
> My point was that this seems like basic application of simple concepts from Computer Science
I'm not sure that differential dataflow is that simple, as I haven't checked the paper or the repository—according to other replies, it isn't—but if it were, that's all the more reason to use a library/tool instead of reimplementing something for the thousandth time, I think.
Topological sorting (along with basic graph theory) is something any beginners course on discrete mathematics should already have taught you. You would reach for that in most cases where you need to deal with a graph of dependencies.
The other part of the puzzle is about storing calculations, i.e. memoization. This is trivial in my language of choice, but really it's not a hard problem to solve in any language. You map function inputs to outputs somewhere in memory and retrieve the results when needed.
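A minimal sketch of that in Python, for illustration (my language of choice differs, but the idea is identical):

```python
from functools import lru_cache

# Hand-rolled memoization: map function inputs to outputs in a dict.
cache = {}

def add(x, y):
    if (x, y) not in cache:
        cache[(x, y)] = x + y  # computed once, reused thereafter
    return cache[(x, y)]

add(1, 2)
add(1, 2)          # second call hits the cache
print(len(cache))  # only one entry was ever computed

# Or let the standard library do the bookkeeping:
@lru_cache(maxsize=None)
def add2(x, y):
    return x + y
```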
These techniques are broadly applicable in many domains. To many people, the intuition would be to reach for them directly when they have a task that looks like a graph search, rather than go look for a framework or library that they would then need to read the documentation for and spend time integrating with their code. Sometimes less is more.
My point is really just that a lot more people would think about this as a graph or memoization problem than would ever think to go look for a Rust framework. Maybe if their own solution at some point doesn't work out, they will start searching for frameworks or libraries.
(Someone correct me if I'm wrong!) I think about differential dataflow as the solution to "I can't batch data operations, because I don't know when my various inputs will land."
If everything exists at 7am, and/or you don't need the freshest computed values, this is not the solution you need.
If data A is ready between 2-4am, data B at noon, and data C sometime between 8am-6pm, this allows you to abstract that uncertainty into code, then let the system solve it on a daily basis.
This is not a problem everyone has. But it is a problem most people working with inventory or events have! And it's usually a problem people feeding things to ML have.
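The shape of that scenario can be sketched in a few lines (a toy illustration of the arrival-order problem, with invented names, not any library's API):

```python
# Inputs land whenever they land, in any order. Instead of waiting for a
# nightly batch that assumes everything exists by some cutoff, fold each
# arrival into the running aggregate the moment it shows up.
totals = {}

def on_arrival(source, day, value):
    """Fold one source's value into the total for `day` as soon as it lands."""
    totals[day] = totals.get(day, 0) + value
    return totals[day]

# A lands at 3am, C surprises us at 9am, B at noon; order doesn't matter.
on_arrival("A", "2024-06-01", 10)
on_arrival("C", "2024-06-01", 5)
print(on_arrival("B", "2024-06-01", 7))  # 22
```

The hard parts that a real system handles for you are knowing when a day's total is final (no more arrivals coming) and dealing with late or corrected data, which is where the timestamp machinery earns its keep.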