Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There are certain styles of programming and data structure implementations that end up requiring you to fight Rust at almost every step. Things like intrusive data structures, pointer manipulation and so on. Famously there is an entire book online on how to write a performant linked list in idiomatic Rust - something that is considered straightforward in C.

For these cases you could always use Zig instead of C





Given Zig's approach to safety, you can get the same in C with static and runtime analysis tools, that many devs keep ignoring.

Already setting the proper defaults on a Makefile would get many people half way there, without changing to language yet to be 1.0, and no use-after-free story.


> many devs keep ignoring

And thats why Zig don’t offer much. Devs will just ignore it.


it is not straightforward in rust because the linked list is inherently tricky to implement correctly. Rust makes that very apparent (and, yeah, a bit too apparent).

I know, a linked list is not exactly super complex and rust makes that a bit tough. But the realisation one must have is this: building a linked list will break some key assumptions about memory safety, so trying to force that into rust is just not gonna make it.

Problem is I guess that for several of us, we have forgotten about memory safety and it's a bit painful to have that remembered to us by a compiler :-)


Can you elaborate, what key assumptions about memory safety linked lists break? Sure, double linked lists may have non-trivial ownership, but that doesn't compromise safety.

Rust wants all memory to be modeled as an ownership tree: the same bit of memory can't be owned by more than one data structure. A doubly linked list breaks that requirement so it can't be modeled in safe Rust directly. The options are using unsafe, or using one of the pointer wrapper types that have runtime checks that ensure correct behavior and own the underlying memory as far Rust is concerned.

Right. So it is not that double-linked lists are inherently unsafe, it is (just) Rust ownership model cannot represent them (any other cyclic structures).

It's not that it cannot, it just doesn't want to :-) (but you're right). I guess that in this very case of DLL, it's a bit hard to swallow. To be honest, it's because the rest of rust really helps me in other areas of my projects that I have accepted that. Learning the ownership model of rust is really painful, it really forces you to code in its way and it was not pleasant to me.

I've been trying to convert to Rust an in-memory database and failed. It is strictly single-threaded and full of intrusive lists. I tried hard to please borrow-checker, but one have little choice when internal structures are full of cycles. The result was ugly mess of Rc all over the place. I guess it is just an example of a problem that doesn't fit Rust well.

This makes me wonder: what performance cost Rust code pay due to inability represent cyclic structures efficiently? It seems people tend to design their data in way to please Rust and not in way that would be otherwise more performance efficient.


Using Rc doesn't sound like an intrusive list to me. Personally I find tons of Rcs to be messy, so I'd agree with you.

> what performance cost Rust code pay due to inability represent cyclic structures efficiently?

You can still write the code you'd write in C with unsafe. There's no inherent loss left on the table.

Furthermore, a lot of C folks reach for intrusive lists because it's easy in C, but that doesn't mean that it's always the most performant. See https://bcantrill.dtrace.org/2018/09/28/the-relative-perform... as an example of this phenomenon.


You can do it by combining ghostcell/qcell along with some bespoke "static/compile-time reference counting" for the double links part. But ghostcell/qcell is quite difficult to use with Rust's current feature set (it has to use lifetime hacks to place a safe "brand" on type instantiations, a kind of quasi-capability construct), so it hasn't become a part of standard rust so far.

Sure. Now import a useful performant B+tree in C from a reusable library, while enforcing type safety for your keys and values.

Lots of things C chose to use intrusive pointers and custom data structures for, you would program very differently in a different language.

I'm an old C neckbeard and I find Rust a great experience. Some of the arguments against it sound like people are complaining about how hard it is to run pushing a bicycle.


Or just build a tested unsafe implementation as a library. For example the Linked List in the standard library.

https://doc.rust-lang.org/src/alloc/collections/linked_list....


Yeah, if you need a linked list (you probably don't) use that. If however you are one of the very small number of people who need fine-grained control over a tailored data-structure with internal cross-references or whatnot then you may find yourself in a world where Rust really does not believe that you know what you are doing and fights you every step of the way. If you actually do know what you are doing, then Zig is probably the best modern choice. The TigerBeetle people chose Zig for these reasons, various resources on the net explain their motivations.

The point with the linked list is that it is perfectly valid to use unsafe to design said ”tailored data structure with internal cross-reference or what not” library and then expose a safe interface.

If you’re having trouble designing a safe interface for your collection then that should be a signal that maybe what you are doing will result in UB when looked at the wrong way.

That is how all standard library collections in Rust works. They’ve just gone to the length of formally verifying parts of the code to ensure performance and safety.


> If you’re having trouble designing a safe interface for your collection then that should be a signal that maybe what you are doing will result in UB when looked at the wrong way.

Rust is great, but there are some things that are safe (and you could prove them safe in the abstract), but that you can't easily express in Rust's type system.

More specifically, there are some some things and usage pattern of these things that are safe when taken together. But the library can't force the safe usage pattern on the client, with the tools that Rust provides.


If you can't create a safe interface and must have the function then create an unsafe function and clearly document the invariants and then rely on the user to uphold them?

Take a look at the unsafe functions for the standard library Vec type to see examples of this:

https://doc.rust-lang.org/std/vec/struct.Vec.html#method.fro...


> If you can't create a safe interface and must have the function then create an unsafe function and clearly document the invariants and then rely on the user to uphold them?

Yes, that's what you do in practice. But it's no different--in principle--from the approach C programmers have to use.


>>That is how all standard library collections in Rust works

Yeah and that's what not going to work for high performance data structures because you need to embed hooks into the objects - not just put objects into a bigger collection object. Once you think in terms of a collection that contains things you have already lost that specific battle.

Another thing that doesn't work very well in Rust (from my understanding, I tried it very briefly) is using multiple memory allocators which is also needed in high performance code. Zig takes care to make it easy and explicit.


I think that misses the point though. C trusts you to design your own linked list.

It also trusts your neighbor, your kid, your LLM, you, your dog, another linked list...


what is an intrusive data structure?

A container class that needs cooperation from the contained items, usually with special data fields. For example, a doubly linked list where the forward and back pointers are regular member variables of the contained items. Intrusive containers can avoid memory allocations (which can be a correctness issue in a kernel) and go well with C's lack of built-in container classes. They are somewhat common in C and very rare in C++ and Rust.

At least for a double linked list you can probably get pretty far in terms of performance in the non-intrusive case, if your compiler unboxes the contained item into your nodes? Or are there benefits left in intrusive data structures that this doesn't capture?

Storing the data in nodes doesn't work if the given structure may need to be in multiple linked lists, which iirc was a concern for the kernel?

And generally I'd imagine it's quite a weird form for data structures for which being in a linked list isn't a core aspect (no clue what specifically the kernel uses, but I could imagine situations where where objects aren't in any linked list for 99% of time, but must be able to be chained in even if there are 0 bytes of free RAM ("Error: cannot free memory because memory is full" is probably not a thing you'd ever want to see)).


> Storing the data in nodes doesn't work if the given structure may need to be in multiple linked lists, which iirc was a concern for the kernel?

That's a great point! A language almost like C plus a smart enough compiler could do the unboxing, but this technique doesn't work for multiple structures.

> And generally I'd imagine it's quite a weird form for data structures for which being in a linked list isn't a core aspect (no clue what specifically the kernel uses, but I could imagine situations where where objects aren't in any linked list for 99% of time, but must be able to be chained in even if there are 0 bytes of free RAM ("Error: cannot free memory because memory is full" is probably not a thing you'd ever want to see)).

I think you are right in practice, though in principle you could pre-allocate the necessary memory when you create the items? When you have the intrusive links, you pay for their allocation already anyway, too. In terms of total storage space, you wouldn't pay for more.

(And you don't have to formally alloc them via a call to kmalloc or so. In the sense that you don't need to find a space for them: you just need to make sure that the system keeps a big enough buffer of contiguous-enough space somewhere. Similar to how many filesystems allow you to reserve space for root, but that doesn't mean any particular block is reserved for root up-front.)

But as I thought, that's about in-principle memory usage. A language like C makes the alternative of intrusive data structures much simpler.


> you just need to make sure that the system keeps a big enough buffer of contiguous-enough space somewhere. Similar to how many filesystems allow you to reserve space for root

Said file system reserving can fill up (I've experienced that :) ). Could perhaps engineer some super-equation for a global reserved memory size that can actually guarantee being sufficient; but, unless you mark allocations as needing reserved space vs not and compute based on that (which is back to overhead), that'll necessarily waste a good amount of memory.

And such a global equation of course can result in global failure if just one subsystem miscalculates its required reserved amount, or the calculations improperly account for worst-case non-contiguousness or interleaved different-size-linked-list-alloc-requests or whatever. (never mind that you now need to alloc when adding elements to linked lists, an action that would otherwise be 1-4 inlined stores)

Probably (?) possible to handle, but way way way more easy to break or get wrong (with very-difficult-to-debug consequences) than just some pointer fields.

Oh and also of course, if you store list data in a separate allocation from the main object, you lose the ability to do O(1) (and extremely-cheap at that) "remove self from linked list just given my own pointer", which is probably quite common (and you can't even O(1) with an arraylist instead of a linkedlist).


> Storing the data in nodes doesn't work if the given structure may need to be in multiple linked lists

That is why kernel mostly (always?) uses intrusive linked lists. They have no such problem.


The main thing is that he object can be a member of various structures. It can be in big general queue and in priority queue for example. Once you find it and deal with it you can remove it from both without needing to search for it.

Same story for games where it can be in the list of all objects and in the list of objects that might be attacked. Once it's killed you can remove it from all lists without searching for it in every single one.


Thanks! That makes a lot of sense.

> if your compiler unboxes the contained item into your nodes

Is there known compilers that can do that?


Haskell's GHC partially does it. LLVM can do it in principle, if your frontend gives enough information. Some JVMs can partially do some of it.

The above is about the optimiser figuring out whether to box or unbox by itself.

If you are willing to give the compiler a hand: Rust can do it just fine and it's the default when you define data structures. If you need boxing, you need to explicitly ask for it, eg via https://doc.rust-lang.org/std/boxed/struct.Box.html


A data structure that requires you to change the data to use it.

Like a linked list that forces you to add a next pointer to the record you want to store in it.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: