tl;dr: conventional design bad, me smart, capability-based pointers (base+offset with provenance) can replace virtual memory, CHERI good (a real modern implementation of capability-based pointers).
The first two points are similar to other Poul-Henning Kamp articles [1]. The last two are more interesting.
I'm inclined to agree with "CHERI good". Memory safety is a huge problem. I'm a fan of improving it by software means (e.g. Rust) but CHERI seems attractive at least for the huge corpus of existing C/C++ software. The cost is doubling the size of pointers, but I think it's worth it in many cases.
I would have liked to see more explanation of how capability-based pointers replacing virtual memory would actually work on a modern system.
* Would we give up fork() and other copy-on-write (COW) tricks? Personally I'd be fine with that, but it's worth mentioning.
* What about paging/swap/mmap (to compressed memory contents, SSD/disk, the recently-discussed "transparent memory offload" [2], etc)? That seems more problematic. Or would we do a more intermediate thing like The Mill [3] where there's still a virtual address space but only one rather than per-process mappings?
* What bookkeeping is needed, and how does it compare with the status quo? My understanding with CHERI is that the hardware verifies provenance [4]. The OS would still need to handle assigning memory to processes. My best guess is the OS would maintain analogous data structures to track that assignment (or maybe an extent-based system rather than pages), but maybe the hardware wouldn't need them?
* How would performance compare? I'm not sure. On the one hand, double pointer size => more memory, worse cache usage. On the other hand, I've seen large systems spend >15% of their time waiting on the TLB. Huge pages have taken a chunk out of that already, so maybe the benefit isn't as much as it seemed a few years ago. Still, if this nearly eliminates that time, that may be significant, and it's something you can measure with e.g. "perf"/"pmu-tools"/"toplev" on Linux.
[4] I haven't dug into how this works when fetching pointers from RAM rather than in pure register operations, but for the moment I'll just assume it works, unless it's probabilistic?
As things stand now, CHERI doesn't replace virtual memory. The MMU is still there; CHERI is a layer placed on top (so it's CHERI capabilities -> linear local addresses -> hardware addresses). Which is why things generally work as usual, even though the entire FreeBSD userspace and (sometimes) the kernel are compiled as purecap binaries, using capabilities instead of pointers.
fork(2) isn't a problem when running like this, but it does become one if you want to colocate processes in a single address space. It's not as much of a problem as I'd previously expected: there's vfork(2) and posix_spawn(2); fork is only a problem until the subsequent execve(2); and because many systems don't support fork(2) anyway, userspace has had to adapt.
> As things stand now, CHERI doesn't replace virtual memory.
Yeah, he's proposing... something else. It's not clear to me exactly what, except that it's sort of like this obscure historic machine he vaguely described. See e.g. this paragraph:
> The linear address space as a concept is unsafe at any speed, and it badly needs mandatory CHERI seat belts. But even better would be to get rid of linear address spaces entirely and go back to the future, as successfully implemented in the Rational R1000 computer 30-plus years ago.
> I'm inclined to agree with "CHERI good". Memory safety is a huge problem. I'm a fan of improving it by software means (e.g. Rust) but CHERI seems attractive at least for the huge corpus of existing C/C++ software
A lot of C/C++ code assumes that pointers are integers are pointers, so I dunno how big that corpus would actually be. People cast between them, but that's not the end of it: they also make unions, and they memcpy from one to the other. It wouldn't surprise me if there is a lot of code that even assumes pointers are exactly 64 bits wide.
Note that it's not like CHERI stops you from casting a pointer to an int or the like. Sure you can; that's one of the main accomplishments: demonstrating that hardware capabilities can work with real-world source code, like PostgreSQL.
So, it's not like you can't typecast; rather, there are some specific things the hardware will prevent you from doing, e.g. '(void *)42': if you force clang to accept it, dereferencing the result will crash at runtime due to the missing tag.
Yeah, some source changes would be needed, including removing some clever optimizations. Still, that's much easier than changing languages entirely. I rewrote a small C++ application in Rust; it was only a few thousand lines, IIRC, and I was the sole author of both versions. Even that was a significant effort.
[1] eyeroll at https://queue.acm.org/detail.cfm?id=1814327
[2] https://news.ycombinator.com/item?id=31814804
[3] http://millcomputing.com/wiki/Memory#Address_Translation