> If an operation results in conflicts, information about those conflicts will be recorded in the commit(s). The operation will succeed. You can then resolve the conflicts later.
I’m really glad people are trying this out. I’ve spent the last decade or so playing with collaborative editing algorithms. Ideally I’d like tools like git to eventually be replaced by CRDT based approaches. CRDTs would let us use the same tools to do pair programming. CRDTs also handle complex merges better (no self-conflicts like you can get with git). And they’re generally a more powerful model.
One problem with all modern text CRDTs (that I know of) is that they do automatic conflict-free resolution of concurrent edits. But when we collaborate offline on code, we usually want conflicts to show up and be resolved by hand. CRDTs should be able to handle that without trouble, since they have more information about the edit history than git does. But doing this properly will (I think) require that we put the conflicts themselves into the data model for what a text file is. And I’m not sure how that should all work with modern text editors!
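To make "conflicts in the data model" a bit more concrete, here is a rough sketch of one way it could look. This is not how jj or any existing CRDT stores things; `Text`, `Conflict`, `merge_region` and `resolve` are names made up for illustration. The idea is just that a file becomes a list of segments, and a segment is either plain text or an unresolved conflict that keeps every side.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    """A region of the file that everyone agrees on."""
    content: str


@dataclass
class Conflict:
    """A region with concurrent edits that nobody has resolved yet."""
    base: str          # what the region looked like before the edits
    sides: List[str]   # one entry per concurrent replacement


Segment = Union[Text, Conflict]


def merge_region(base: str, ours: str, theirs: str) -> Segment:
    """Three-way merge of one region. Never picks a winner silently."""
    if ours == theirs:
        return Text(ours)      # both sides made the same change
    if ours == base:
        return Text(theirs)    # only they changed it
    if theirs == base:
        return Text(ours)      # only we changed it
    return Conflict(base, [ours, theirs])


def has_conflicts(doc: List[Segment]) -> bool:
    return any(isinstance(seg, Conflict) for seg in doc)


def resolve(doc: List[Segment], index: int, chosen: str) -> List[Segment]:
    """Return a new document with the conflict at `index` replaced by text."""
    if not isinstance(doc[index], Conflict):
        raise ValueError("segment is not a conflict")
    return doc[:index] + [Text(chosen)] + doc[index + 1:]
```

With something like this, a merge always succeeds and the conflict travels with the file until someone calls `resolve`. How an editor should display and edit a `Conflict` segment is exactly the open question.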
Anyway, it sounds like jj has figured out the same trick. I’m excited to see how well it works in practice. With this we’re one step closer to my dream of having a crdt based code repository!
You should check out Pijul, as it essentially implements everything you mentioned here. Pijul works on patches which are CRDTs, it makes conflicts a first-class concept, etc.
Pijul is interesting, but my understanding is that it still operates on lines, and does diffs when you push the commit button. And that makes it not work well for real-time collaborative editing. Ideally I’d like a tool that can span both the real-time collaborative editing and offline collaboration use cases. But it’s a very interesting tool, and I’d like to have another read through the docs at some point. I remember being very impressed with it when I took a look a few years ago.
Last time I looked at it they had experimental support for binary diffs (not just lines). That was quite a while ago though, they have probably gotten further now.
I will be surprised if it's possible at all.
And I think right now such a back-end would not be high on Pijul's priority list.
But if I am wrong, I would be very interested to learn about it!
Have you not ever found any value in `git bisect`?
If you have a bug which is reproducible, but whose cause is complex, do you not think it's useful to be able to find the commit that introduced the bug in order to see which change caused it? If only to get a good first idea of what might need to be fixed?
Currently, `git bisect` works best if every commit is buildable and runnable, so that any commit can be automatically tested for the presence of the bug and the problem commit can be narrowed down as quickly as possible. If some commits don't build or run because they contain conflict markers, this makes `git bisect` need a lot more manual intervention.
Can you think of a way in which an equivalent of `git bisect` might be adapted to work in this scenario?
Note that just scanning for conflict markers might not be appropriate, in case a file legitimately contains text equivalent to conflict markers - e.g. in documentation talking about conflict markers, or something like `=======` being usable as an underline in some markup languages.
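For what it's worth, git already has a way to step around untestable commits: `git bisect skip`, and `git bisect run` treats exit code 125 from the test script as "skip this one". Below is a minimal sketch of the same idea on a linear history, assuming the tool records "this commit has conflicts" as metadata instead of guessing from marker text. `Commit`, `conflicted` and `bisect` are hypothetical names, not part of git or jj.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Commit:
    id: str
    # Assumed to be recorded by the tool itself, not found by scanning
    # file contents for "=======" and friends.
    conflicted: bool = False


def bisect(history: List[Commit], is_bad: Callable[[Commit], bool]) -> List[Commit]:
    """history[0] is known good and history[-1] is known bad.

    Returns the commits that could have introduced the bug. That is a
    single commit when every commit is testable, and a short range when
    conflicted commits sit next to the culprit.
    """
    lo, hi = 0, len(history) - 1
    while True:
        # Testable commits strictly between the last good and first bad.
        testable = [i for i in range(lo + 1, hi) if not history[i].conflicted]
        if not testable:
            return history[lo + 1: hi + 1]
        mid = (lo + hi) // 2
        # Test the testable commit closest to the midpoint.
        pick = min(testable, key=lambda i: abs(i - mid))
        if is_bad(history[pick]):
            hi = pick
        else:
            lo = pick
```

If the bug landed inside a run of conflicted commits, the result is the whole run plus the first testable bad commit after it, which is roughly what git reports after skips as well.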
Yeah git bisect is great. And CRDTs on their own probably won’t preserve enough information to allow bisecting to happen - since we don’t know which points in time have working builds.
One approach to preserve the ability to bisect would be to allow users to periodically mark points in time with “commits” if they want. The commits wouldn’t be used for synchronisation or merging (since we have the crdt information for that). Instead, they could act semantically much more like anonymous git tags. But they would still be useful as landmarks to mark working builds and milestones. And for git bisect. We could give them associated commit logs. (“Got feature X working”, “Release 1.0.5”, etc).
Commits might also be a good way to manage pruning. If users type then delete something, many CRDTs will keep a copy of the deleted characters indefinitely in a log. But we could design it so it only durably persists inserted characters which still exist in at least one commit. Yjs already does something like this.
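A toy version of that pruning rule, as a sketch only: it assumes every inserted character has a unique id and that a commit is a snapshot of the ids visible at that moment. It ignores the fact that real sequence CRDTs often need deleted items as position anchors, and it isn't meant to describe how Yjs does it. `Char`, `visible_ids` and `prune` are invented names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Char:
    id: int
    value: str
    deleted: bool = False


def visible_ids(log: List[Char]) -> FrozenSet[int]:
    """Ids of the characters currently in the document."""
    return frozenset(c.id for c in log if not c.deleted)


def prune(log: List[Char], commits: List[FrozenSet[int]]) -> List[Char]:
    """Keep a character only if it is visible now or in some commit."""
    keep = visible_ids(log).union(*commits)
    return [c for c in log if c.id in keep]
```

So a typo that was typed and deleted between two commits disappears from storage, while a character that was deleted after being committed stays around, because that commit still needs it.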
It seems CRDTs in the limit approach a commit for every keypress. Meanwhile a commit in git is an atomic set of keypresses, to transition from one working state to another. So, in a CRDT world with a commit for every keystroke we would need to annotate sets of commits. Like “all of these commits change the title ‘hello’ to ‘world’”. Would be interesting.
> Ideally I’d like tools like git to eventually be replaced by CRDT based approaches. CRDTs would let us use the same tools to do pair programming. CRDTs also handle complex merges better (no self-conflicts like you can get with git). And they’re generally a more powerful model.
I'd be interested to see how this plays out in practice.
It seems to be in conflict with the idea that scm history is a meaningful deliverable that should be arranged as a series of incremental atomic changes before a patch series leaves your development machine.
However, most developers I interact with already treat git history as an infinite editor undo history, and this approach seems like it would crystallize that.
How do you envision the (long-term) history working? Do you think it would provide more/less utility?
I’ve thought a lot about that. Personally, I think it’s silly that people try to use git’s history for two different purposes:
1. An audit log of what actually happened, and when
2. A curated story of features added and changed
I don’t think git can be both of those things at the same time. And we see this tension play out when people argue about squashing commits and rebasing before merging.
Personally I think both pieces of information have value. And they should be managed separately - as different features. Eg Git could (and perhaps should) have a second semantic layer which marks a set of commits (atomic history of changes) as semantically associated with a particular feature or issue. This is how I imagine a crdt based scm working: the history of all changes is stored immutably, and synchronised. And change sets can be grouped and marked as “this set of changes is associated with feature XXX”, to make it easier to understand the intent behind code, and roll back specific change sets.
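Here is a small sketch of what keeping the two layers separate might look like, under the assumption that changes are append-only and labels are just a mutable index on top. `Change`, `History`, `record`, `label` and `changes_for` are all made-up names, not an existing tool's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Change:
    """One entry in the audit log. Never edited after it is recorded."""
    id: int
    description: str


@dataclass
class History:
    changes: Tuple[Change, ...] = ()                           # layer 1: what happened
    labels: Dict[str, Set[int]] = field(default_factory=dict)  # layer 2: the story

    def record(self, description: str) -> Change:
        change = Change(len(self.changes), description)
        self.changes = self.changes + (change,)
        return change

    def label(self, feature: str, change: Change) -> None:
        """Associate a change with a feature without touching the log."""
        self.labels.setdefault(feature, set()).add(change.id)

    def changes_for(self, feature: str) -> List[Change]:
        """All changes tagged with `feature`, in the order they happened."""
        ids = self.labels.get(feature, set())
        return [c for c in self.changes if c.id in ids]
```

Relabelling or regrouping only touches `labels`, so the curated view can be rearranged as often as people like while the audit log stays exactly as it was recorded.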
> Eg Git could (and perhaps should) have a second semantic layer which marks a set of commits (atomic history of changes) as semantically associated with a particular feature or issue.
It does (sort of)! They're called branches and tags.