
One of the comments on the article is really interesting. The recent Meltdown mitigations have really blown up the cost of privilege transitions, because people now expect non-architectural data not to leak across privilege boundaries; system calls are about twice as slow as they used to be. Meanwhile, I/O is faster than ever, with PCIe and NVMe. Io_uring offers the opportunity to avoid privilege transitions through asynchronous calls based on writing to a memory buffer shared between user space and the kernel. That has the potential to fundamentally change the basic system call interface. As the article hints, the trick is designing the API so you can construct as large a block of work to be done asynchronously as possible. At the limit, you could push the core of your I/O loop down into the kernel, hence the suggestions that BPF programs could be submitted through the ring to chain operations entirely within the kernel.

(Incidentally, this is a good illustration of the flexibility of UNIX’s model of describing everything with a file descriptor. The same interface meant for asynchronous file I/O was easily extended to network I/O.)



> hence the suggestions that BPF programs could be submitted through the ring to chain operations all within the kernel

That sounds unbelievably annoying.

No, the limit has been found, and it is not in the kernel: you push the I/O loop up into userspace, making it more specific, not more general. They call it SPDK [1] (or DPDK for networking), and as far as I can tell, the principle is essentially a dummy driver in the kernel that maps the entire PCIe peripheral memory space into your chosen process, and everything flows from there.

At the I/O limit, interrupt-driven asynchronous I/O isn't feasible, because interrupts introduce latency and waste cycles not doing work. All userspace I/O frameworks work only through polling.

1: https://spdk.io/


> At the I/O limit, asynchronous isn't feasible because interrupts introduce latency and waste cycles not doing work.

io_uring supports polled i/o: https://lore.kernel.org/linux-block/20190116175003.17880-8-a...


The problem with mechanisms like DPDK is that they bypass all the infrastructure in the kernel and make it hard to play well with others using the same hardware or services. DPDK, for example, bypasses the TCP/IP stack. SPDK bypasses the VFS. You can write your own TCP/IP stack or filesystem on top of those things, but then you can't play well with other processes using those services. While some GPUs can directly multiplex command streams from different processes, most hardware cannot.


That's the point of DPDK: to get the kernel out of the way of packet processing.

Userland packet processing (in a network context) is much more flexible and less brittle than forcing certain functionality to exist solely in the kernel layer. However, things do exist that allow you to (mostly) transparently re-jigger a standard app's TCP/IP calls. One such example is using LD_PRELOAD to "hijack" the syscalls for certain things and snake them over to your (super high performance) userspace app!

There's a lot of exciting stuff happening in the open source networking world (DPDK, VPP/FDio, Network Service Mesh, etc). I really recommend digging into it!


SPDK is a pain to use. io_uring supports polling, and that gets it within a few percent of SPDK performance.


Interestingly, the advent of these asynchronous, context-switch-less interfaces might see microkernels make a comeback: they were originally decried for performance reasons, since their message-passing traditionally needs many expensive context switches, and interfaces like this remove much of that cost.


I also wonder whether, in this batched model, there's some opportunity to further leverage the 'race to idle' approach you see in mobile devices. If I have 16 cores and half of them are frequently spun down, the thermal budget for the others improves.



