This is similar but not quite the same as persistent data structures. In particular:
- We can avoid quite a few allocations in loops by mutating lists/dicts in place if we hold an exclusive reference (and after the first mutation, we always will). Updates to persistent data structures are relatively cheap, but they're a lot more expensive than an in-place update.
- Herd has syntax sugar for directly modifying nested values inside lists/dicts. E.g. `set foo.bar.[0].baz = 1;`.
In practice, is this faster than a different implementation of the same semantics using persistent data structures and a tracing GC? That will depend on your program.
> How to pass a mutable reference into a function, so that it can change the underlying value and the caller can observe these changes?
Just modify the value inside the function and return it, then assign back. This is what the |= syntax is designed for. It's a bit more verbose than passing mutable references to functions but it's actually functionally equivalent.
Herd has some optimisations so that in many cases this won't even require any copies.
> What about concurrent mutable containers?
I've considered adding these, but right now they don't exist in Herd.
Personally I think local mutability is quite a useful property, which was part of the inspiration for making this instead of just another pure functional language:
- All functions are still referentially transparent, which means we get all the local reasoning benefits of pure functions.
- We can mutate local variables inside loops (instead of just shadowing bindings), which makes certain things a lot easier to write (especially for beginners).
- Mutating nested fields is super easy: `set foo.bar[0].baz = 1;` (compare this to the equivalent Haskell).
Nope, it's value semantics (unlike Python), the references are just an internal optimization. Implicit copies happen when a list/dict with more than one reference is mutated.
This applies everywhere, and it fundamentally wouldn't be possible for just function calls.
> needless cost
Are you comparing to a language with mutable references or a our functional language? A language with mutable references will of course be faster, but this is more intended as an alternative to pure functional languages (since functions are referentially transparent).
In this case, the cost of the indirection is approximately zero (relative to the cost of just doing reference counting), since passing a reference just requires a bump to the refcount. And most of the time the refcount increments are skipped by "moving" instead of copying the reference.
I'm comparing to the same language implemented without the supposed internal optimization, according to my understanding of what you're doing.
But I think at this point I'd have to read and analyze the code to have a proper understanding.
... Although granting that you already have paid the cost of reference-counting GC (and, I assume, per-object allocation), it probably is indeed insignificant. And special-casing where the reference count == 1 is also kinda neat. (E.g. CPython doesn't "move references" per se if I'm thinking clearly; but it does detect cases where it can safely mutate the underlying implementation of a type that's "immutable" from the perspective of language syntax.)
Nothing "real", just the synthetic benchmarks in the ./benchmarks dir.
Unnecessary copies are definitely a risk, and there's certain code patterns that are much harder for my interpreter to detect and remove. E.g. the nbodies has lots of modifications to dicts stored in arrays, and is also the only benchmark that loses to Python.
The other big performance limitation with my implementation is just the cost of atomic reference counting, and that's the main tradeoff versus deep cloning to pass between threads. There would definitely be room to improve this with better reference counting optimizations though.
There is some prior work on mitigating the performance cost of immutability that you might be interested in. For example, Clojure's persistent vectors allow fast modifications without destroying the original vector, because internally they're wide trees rather than just linear arrays of memory. This allows for assignments to be implemented without a copy of the full vector. https://hypirion.com/musings/understanding-persistent-vector...
> Use immutable pass by reference. Make a copy only if mutability is requested in the thread.
This is essentially what Herd does. It's only semantically a pass by value, but the same reference counting optimizations still apply.
In fact, Herd's approach is a bit more powerful than this because (in theory) it can remove the copy entirely if the caller doesn't use the old value any more after creating the thread. In practice, my optimizations aren't perfect and the language won't always detect this.
The big downside is that we have to use atomic reference counts for _everything_. From memory this was about a 5-15% performance hit versus non-atomic counters, though the number might be higher if other bottlenecks were removed.
Yes, if you modify a nested dict/list entry, all nodes above it will be cloned. Here's an example:
x = [1, [2]];
var y = x;
set y.[0] = 3; // clones the outer array, keeps a reference to inner array
set y.[1].[0] = 4; // clones the inner array here. Outer array is now exclusive so it doesn't need another clone.
var z = x;
set z.[1].[0] = 4; // clones both arrays at once
Thanks, I updated the post title based on this and another comment. Thanks for the pointer to Euphoria too, looks like an interesting language with a lot of similar ideas.
> can you take the address of a variable in some way?
I intentionally didn't add this, mostly because I wanted to explore how far you can get without it (and keep the language simple). Having a "real" pointer as a first class type wouldn't work though, since it would break a lot of the assumptions I use for optimizations.
I did think about two different versions of this but didn't end up adding either:
- Something like `inout` parameters in Swift, which aren't first class pointers. This is really just an alternate syntax for returning multiple values.
- A "ref" type, which is essentially a mutable container for an arbitrary value. Passing the container around would share a reference to the same mutable value. This still wouldn't allow modifying values "outside" of the container though.
you're right, once indirection appears pandoras box opens up. keep it as pass by value only, it makes the language unique. although the desire for pointers will never go away, people have used them for so long now.
I think your statement about familiarity is spot on. One of the ‘promises’ of the early functional language efforts (late 70s through mid 90s) was that the invariants made it conceptually simple to have a magical, super compiler take the code and produce binary executables that would surpass imperative languages, due to the aforementioned Pandora’s box effect of the non-functional constructs. Universal value semantics are similar, especially without leaky abstractions allowing “please just let me crack the box, I’ll be good and won’t abuse the power”. Commitment to the invariants, in this case purely (logically, at least) value semantics across the entire language could produce executables that approach low-level, less restrictive code results with a much, much lower burden on the programmer.
A more fitting example would be to support:
IIRC these both currently require an explicit block in my parser.reply