Shameless plug: I've written difftastic[1], a tool that builds ASTs and then does a structural diff of them. You can use it with git too.
It's an incredibly hard problem though, both from a computational complexity point of view, and trying to build a comprehensible UI once you've done the structural AST diff.
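To make "structural diff" concrete, here is a deliberately naive sketch using Python's standard `ast` module. It is not how difftastic works (difftastic's approach is far more involved). It just walks two syntax trees in lockstep and reports the paths where they differ. Because it pairs children by position, inserting a single statement makes every later sibling look changed, which hints at why good structural diffing is computationally hard. The function names are made up for illustration.

```python
import ast


def _show(node):
    """Readable form of a node or plain value for the report."""
    return ast.dump(node) if isinstance(node, ast.AST) else repr(node)


def structural_diff(old, new, path="module"):
    """Yield (path, old, new) for every place where two ASTs differ.

    Children are paired by position. Line/column attributes are not in a
    node's _fields, so they are never compared and whitespace-only edits
    produce no output.
    """
    if type(old) is not type(new):
        yield path, _show(old), _show(new)
    elif isinstance(old, ast.AST):
        for name in old._fields:
            yield from structural_diff(
                getattr(old, name, None), getattr(new, name, None), f"{path}.{name}"
            )
    elif isinstance(old, list):
        for i in range(max(len(old), len(new))):
            a = old[i] if i < len(old) else None
            b = new[i] if i < len(new) else None
            yield from structural_diff(a, b, f"{path}[{i}]")
    elif old != new:
        yield path, repr(old), repr(new)


old_tree = ast.parse("def area(w, h):\n    return w * h\n")
new_tree = ast.parse("def area(w, h):\n\n    return w + h\n")
for change in structural_diff(old_tree, new_tree):
    print(change)
```

The added blank line is invisible to this comparison; only the changed operator is reported.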
I think part of the problem is that everyone seems to be trying to make a version control tool that is agnostic to all languages, both computationally and UI-wise. But C++ users expect to see different things than JavaScript users, and so forth.
I’m surprised at the lack of hyper-specific, single-language version control tools. I thought about making one for Julia as a side project a while back, but I'm not quite sure how it would look. Some random thoughts:
- show info on type, name, and constant changes
- let me check out older revisions of individual functions / objects / whatever
- report on unit test result changes for functions that have unit tests
- detect when changes are simply a refactor and are functionally the same
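As a rough idea of what the first few bullets might look like (in Python rather than Julia, with made-up function names), the sketch below fingerprints each top-level function by its AST and reports which ones were added, removed or changed. It only recognizes formatting-only edits as "no change"; a real refactor detector would need much more. A renamed function, for instance, shows up here as one removal plus one addition.

```python
import ast


def function_fingerprints(source):
    """Map each top-level function name to a dump of its AST.

    ast.dump leaves out line/column attributes by default, and comments never
    reach the AST, so formatting-only edits keep the same fingerprint.
    """
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def summarize_changes(old_source, new_source):
    """Report which top-level functions were added, removed or changed."""
    old = function_fingerprints(old_source)
    new = function_fingerprints(new_source)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


OLD = '''
def total(xs):
    return sum(xs)

def mean(xs):
    return sum(xs) / len(xs)
'''

NEW = '''
def total(xs):
    # reformatted only: a comment and a blank line were added

    return sum(xs)

def mean(xs):
    return sum(xs) / max(len(xs), 1)

def median(xs):
    return sorted(xs)[len(xs) // 2]
'''

print(summarize_changes(OLD, NEW))
```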
Most repositories I work on don't have only one language. They have at the very least two: the main language and maybe Markdown for README files, then configuration like .ini or .toml, JSON, YAML, XML, etc. And then you might have bash scripts, Dockerfiles, other build tool languages, etc. And those are only the text files. You will probably also have images, maybe zipped archives, office documents and more, none of it "core" repository content, but stored nearby and versioned alongside.
Building a hyper-focused tool won't be very useful; you should expect to support other file types at least rudimentarily.
This doesn’t really detract from my point: the “best” tool would use knowledge of Python for Python files, JSON for JSON files, and so forth. I think you’re just saying you’d want several of these rolled into a single tool as opposed to standalone ones, which is fair. I think any tool would have to be compatible with git or layer on top of it, so that git is available as a fallback.
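A minimal sketch of that "one tool, many languages, text fallback" idea, assuming a hypothetical registry that maps file extensions to structural equality checks: JSON files are compared as parsed data, Python files by AST, and anything else (or anything that fails to parse) falls back to an ordinary line diff. Only the Python standard library is used; the registry and function names are invented for illustration.

```python
import ast
import difflib
import json
from pathlib import PurePath


def same_json(old, new):
    return json.loads(old) == json.loads(new)


def same_python(old, new):
    return ast.dump(ast.parse(old)) == ast.dump(ast.parse(new))


# Hypothetical registry: file extension -> "are these structurally equal?"
STRUCTURAL_CHECKS = {".json": same_json, ".py": same_python}


def describe_change(path, old, new):
    """Use a language-aware check when one exists, else a plain line diff."""
    check = STRUCTURAL_CHECKS.get(PurePath(path).suffix)
    if check is not None:
        try:
            if check(old, new):
                return "no structural change"
        except (ValueError, SyntaxError):
            pass  # unparseable input: fall back to the text diff below
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


print(describe_change("config.json", '{"a": 1, "b": 2}', '{\n  "b": 2,\n  "a": 1\n}'))
print(describe_change("README.md", "hello\n", "hello world\n"))
```

Reordering JSON keys is reported as no structural change, while the Markdown file gets a normal unified diff.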
Every change is different in the same way every program is unique: changing a couple of characters can alter the meaning. I think you have to try writing a diff UI to understand why it is hard.
Difftastic, Meld, diff -u, Word and other tools are amazing because they are useful in many scenarios. Getting the UI right has been a long process, and being able to grok the changes is still hard even with the best tooling. It is also a question of tool adoption: it takes a long time to understand how a tool works.
Ah, yes, I knew I was forgetting one project. difftastic is very cool, thanks for writing it!
How well do existing VCSs integrate with it? Did you feel restricted at any point by writing a diffing tool, instead of basing a new VCS around this concept? Do you think a deeper integration would allow supporting other functionality beyond diffing, like automatic merging, conflict resolution, etc.?
I agree that it's a very difficult problem. But as an industry, we have more than enough smart people and resources to work on it, which if solved would greatly improve our collective QoL. I can't imagine the amount of time and effort we've wasted fighting with version control tools over the years, and a tool that solved these issues in a smarter way would make our lives much easier.
Git supports external diffing tools really well with GIT_EXTERNAL_DIFF, which you can use with difftastic[1]. Other VCSs are less flexible. For example, I haven't found a nice way of getting a pager when using difftastic with Mercurial.
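For anyone curious what the GIT_EXTERNAL_DIFF hook involves: git runs the program named in that variable once per changed file, passing the path plus the old and new versions as files (path old-file old-hex old-mode new-file new-hex new-mode), or just the path for an unmerged file. The toy script below (hypothetical name `toydiff.py`) follows that contract but only prints an ordinary unified diff via `difflib`; difftastic plugs into the same hook and does the structural work instead.

```python
#!/usr/bin/env python3
"""Toy GIT_EXTERNAL_DIFF program (hypothetical file name: toydiff.py)."""
import difflib
import sys


def render_diff(path, old_text, new_text):
    """Plain unified diff of two file contents, labelled with the repo path."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


def main(argv):
    if len(argv) == 1:
        # git passes only the path for an unmerged file
        print(f"* Unmerged path {argv[0]}")
        return 0
    if len(argv) < 7:
        print("usage: toydiff.py path old-file old-hex old-mode "
              "new-file new-hex new-mode", file=sys.stderr)
        return 2
    # Extra trailing arguments (git may add some, e.g. for renames) are ignored.
    path, old_file, _old_hex, _old_mode, new_file, _new_hex, _new_mode = argv[:7]
    texts = []
    for name in (old_file, new_file):
        # For an added or deleted file, the missing side is /dev/null.
        with open(name, encoding="utf-8", errors="replace") as f:
            texts.append(f.read())
    sys.stdout.write(render_diff(path, *texts))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Once made executable, it could be tried with `GIT_EXTERNAL_DIFF=./toydiff.py git diff`; difftastic's instructions put its `difft` binary in the same position.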
> Did you feel restricted at any point by writing a diffing tool, instead of basing a new VCS around this concept?
Oh, that's an interesting question! Difftastic has been a really big project[2] despite its limited scope and I'm less interested in VCS implementation.
I think text works well as the backing store for a VCS. There are a few systems that have structured backends (e.g. Monticello for Smalltalk), but they're more constrained. You can only store structured content (e.g. Monticello requires Smalltalk code), and it must be well-formed (your VCS must understand any future syntax you use).
Unison[3] is a really interesting project in this space: it stores code by hash in a SQLite backend. This makes some code changes, such as renames, trivial.
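Why hashing makes renames trivial can be shown with a toy content-addressed store. This is only loosely inspired by Unison; Unison's actual hashing scheme differs and is more involved. The hypothetical `CodeStore` keys each Python function by a hash of its AST with the function's own name blanked out, and keeps human-readable names in a separate table, so a rename touches only that table.

```python
import ast
import copy
import hashlib


class CodeStore:
    """Toy content-addressed code store (hypothetical, not Unison's design).

    Definitions are stored under a hash of their syntax tree with the
    function's own name blanked out; names live in a separate table.
    """

    def __init__(self):
        self.definitions = {}  # hash -> AST with the name blanked out
        self.names = {}        # human-readable name -> hash

    def add(self, name, source):
        tree = ast.parse(source)
        if len(tree.body) != 1 or not isinstance(tree.body[0], ast.FunctionDef):
            raise ValueError("expected exactly one function definition")
        tree.body[0].name = "_"
        digest = hashlib.sha256(ast.dump(tree).encode()).hexdigest()[:12]
        self.definitions.setdefault(digest, tree)
        self.names[name] = digest
        return digest

    def rename(self, old, new):
        # Only the name table changes; the definition and its hash stay put.
        self.names[new] = self.names.pop(old)

    def render(self, name):
        tree = copy.deepcopy(self.definitions[self.names[name]])
        tree.body[0].name = name
        return ast.unparse(tree)  # ast.unparse needs Python 3.9+


store = CodeStore()
digest = store.add("area", "def area(w, h):\n    return w * h\n")
store.rename("area", "rect_area")
print(store.render("rect_area"))
```

A side effect of this design is that two structurally identical definitions added under different names share one hash and one stored tree.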
From the perspective of a text diff, an AST diff is lossy. If you add an extra blank line between two unchanged functions, difftastic ignores it. That's great for understanding changes, but not for storage.
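The lossiness is easy to see with Python's standard `ast` module (Python 3.9+ for `ast.unparse`): parsing and unparsing keeps the structure but throws away comments and blank lines, so the original text cannot be recovered from the tree alone. This illustrates the general point with Python's parser, not difftastic's internals.

```python
import ast

source = '''def greet(name):
    # say hello


    return "hi " + name
'''

tree = ast.parse(source)
round_tripped = ast.unparse(tree)
print(round_tripped)  # the comment and blank lines are gone

# The structure survives the round trip...
assert ast.dump(ast.parse(round_tripped)) == ast.dump(tree)
# ...but the exact text does not.
assert round_tripped != source
```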
I already use delta[1] as a diff viewer, but I suppose GIT_EXTERNAL_DIFF is a deeper integration than just a pager. I've been aware of your project for some time now, but haven't played around with it since I wasn't sure if it would help with automatic conflict resolution, and other issues Git often struggles with. But I'll give it a try soon, thanks again.
I wasn't familiar with Unison. It looks interesting. We definitely need more novel approaches to programming, especially since our field will radically change in a few years as AI becomes more capable.
For languages that have strong IDE refactoring support, and userbases that use it, a (future) solution would be for the IDE to autocommit along the way with metadata explaining what happened: "removed unused function based on suggestion", "extracted duplicate", "renamed public method taxed to isTaxed and updated usages across files x, y and z, developer comment: every other one of these methods follows the pattern isSomething".
The last example also adds a new feature: an option for a developer to add a comment to an automated refactor.
Ordinary commits could exist on top of this as milestones.
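The proposal is mostly about IDEs, but the metadata itself could be simple. A sketch, assuming a hypothetical `RefactorEvent` record that renders into an ordinary commit message with git-style "Key: value" trailers (which tools such as `git interpret-trailers` can parse), so the history stays readable in any git client. The trailer names are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class RefactorEvent:
    """Hypothetical metadata an IDE could record for one automated refactor."""
    kind: str
    summary: str
    files: list[str] = field(default_factory=list)
    developer_comment: str = ""

    def commit_message(self) -> str:
        lines = [self.summary, ""]
        if self.developer_comment:
            lines += [self.developer_comment, ""]
        # Git-style trailers in the final paragraph; the names are made up.
        lines.append(f"Refactor-Kind: {self.kind}")
        lines += [f"Refactor-File: {path}" for path in self.files]
        return "\n".join(lines)


event = RefactorEvent(
    kind="rename",
    summary="Rename public method taxed to isTaxed and update usages",
    files=["x", "y", "z"],
    developer_comment="Every other one of these methods follows the pattern isSomething.",
)
print(event.commit_message())
```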
I wouldn't be totally surprised if JetBrains does this sooner or later. I feel they are creating their own, often better, versions of everything, and version control could be an obvious next step.
As someone who often prefers other solutions to theirs, I'd prefer that someone else does it first, so I end up with something I can use across NetBeans, VS Code, Eclipse, etc., and not something like Kotlin, which forces me to use IntelliJ. (Don't get me wrong, IntelliJ is great; NetBeans is just my personal favorite.)
[1]: https://github.com/wilfred/difftastic