it outsources buffer management and user thread i/o scheduling to the kernel. for some use cases it's a great way to simplify implementation or boost performance. for others it may not perform as well.
the blog post points (to my mind) at a more general piece of programming advice: don't mix and match paradigms unless you really know what you're doing. if you want to do user space async io, cool. if using kernel features tickles your fancy, also cool.
mixing both without a deep understanding of what's going on under the hood will probably give you trouble.
Making it work asynchronously would require the compiler to split the memory access into two parts, a non-blocking IO dispatch and a blocking access to the mapped address. The OS would need to support that, however, and the language would need to keep track of what is a materialised array and what’s not.
as i understand it, mmap is only efficient because it can leverage hardware support for trapping into the kernel when a page needs to be loaded to satisfy an access attempt.
i think adding software indirection to every access in the mapped region would be really slow.
i think a better answer would be to impose more structure on the planned memory access, then maybe given some constraints (like say, "this loop is embarrassingly parallel") the system could be smarter about working on the stuff in ram first while the rest is loaded in.
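a minimal sketch of what i mean, assuming linux and using madvise(2) with MADV_WILLNEED as the prefetch hint (file name and chunk size are made up for illustration; MADV_WILLNEED is only a readahead hint, it doesn't block or notify):

```c
/* Sketch: walk a large mmap'd file chunk by chunk, asking the kernel to
 * start paging in the next chunk while we work on the current one. */
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK (4u << 20)   /* 4 MiB per chunk, arbitrary */

int main(void)
{
    int fd = open("big.dat", O_RDONLY);          /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    uint8_t *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    uint64_t sum = 0;
    for (off_t off = 0; off < st.st_size; off += CHUNK) {
        size_t len = (size_t)((st.st_size - off < (off_t)CHUNK)
                              ? st.st_size - off : (off_t)CHUNK);

        /* Hint the kernel to start reading the *next* chunk now. */
        off_t nxt = off + (off_t)len;
        if (nxt < st.st_size) {
            size_t nlen = (size_t)((st.st_size - nxt < (off_t)CHUNK)
                                   ? st.st_size - nxt : (off_t)CHUNK);
            madvise(base + nxt, nlen, MADV_WILLNEED);
        }

        /* "Embarrassingly parallel" work on the current, hopefully
         * resident, chunk. */
        for (size_t i = 0; i < len; i++)
            sum += base[off + i];
    }

    printf("sum = %llu\n", (unsigned long long)sum);
    munmap(base, st.st_size);
    close(fd);
    return 0;
}
```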
> every access in the mapped region would be really slow.
Would certainly be slower. The compiler would need to be aware that we want this behaviour and split the access into two parts: one to trigger the page read and yield to the app’s async loop, and another to resolve the read once the page has loaded. This would only need to happen for explicitly marked asynchronous memory reads (doing it for all memory reads without hardware support would be painful).
i can't think of any other syscall that makes use of the mmu's page fault machinery to enter the kernel as needed in response to ordinary user space memory accesses.
I think you could make do with some kind of async memory-touch system call, i.e. "page in this range of memory, notify me when finished". The application would have to call this on blocks of the mapped region before actually reading them.
This of course means you lose some of the benefits of mmap (few system calls, automatic paging), but would maybe still be beneficial from a performance perspective.
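You can approximate this with existing Linux primitives: madvise(MADV_WILLNEED) kicks off the paging without blocking, and mincore(2) lets the event loop poll residency before a task dereferences the range. A minimal sketch under those assumptions (the helper names are made up, and polling residency is weaker than a real completion notification):

```c
/* Sketch: approximate an "async memory-touch" with madvise + mincore. */
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Kick off paging for [addr, addr+len); a cheap, non-blocking hint. */
static int touch_async(void *addr, size_t len)
{
    return madvise(addr, len, MADV_WILLNEED);
}

/* Return true once every page in [addr, addr+len) is resident, so the
 * event loop can schedule the reader without risking a major fault. */
static bool range_resident(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~((uintptr_t)page - 1);
    size_t span = ((uintptr_t)addr + len - start + page - 1) / page * page;
    size_t npages = span / page;

    unsigned char *vec = malloc(npages);
    if (!vec || mincore((void *)start, span, vec) != 0) {
        free(vec);
        return false;
    }
    bool all = true;
    for (size_t i = 0; i < npages; i++)
        if (!(vec[i] & 1)) { all = false; break; }   /* bit 0 = resident */
    free(vec);
    return all;
}
```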
It would allow a memory read to yield to the async loop, but overall performance of the read itself would always be lower.
It’s the kind of thing that would be better implemented as a special “async buffer”: reads are guarded by a page fault handler that returns as soon as the backing read has been scheduled, and the read itself yields while the page load is unresolved.
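Linux's userfaultfd(2) is close to that shape: page faults on a registered range are delivered to user space, where a handler can schedule the real I/O and resolve the fault later with UFFDIO_COPY. A rough sketch under those assumptions, using an anonymous mapping and dummy page contents in place of a real async read (depending on kernel configuration, userfaultfd may require extra privileges):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static long page;

/* Fault handler: in a real async runtime this would schedule a
 * non-blocking read and resolve the fault from its completion
 * callback; here we just copy in a dummy page immediately. */
static void *handler(void *arg)
{
    int uffd = *(int *)arg;
    char *fill = aligned_alloc(page, page);
    memset(fill, 0x2a, page);

    for (;;) {
        struct pollfd pfd = { .fd = uffd, .events = POLLIN };
        if (poll(&pfd, 1, -1) <= 0) continue;

        struct uffd_msg msg;
        if (read(uffd, &msg, sizeof msg) != (ssize_t)sizeof msg) continue;
        if (msg.event != UFFD_EVENT_PAGEFAULT) continue;

        struct uffdio_copy copy = {
            .dst = msg.arg.pagefault.address & ~((unsigned long)page - 1),
            .src = (unsigned long)fill,
            .len = page,
        };
        ioctl(uffd, UFFDIO_COPY, &copy);   /* wakes the faulting thread */
    }
    return NULL;
}

int main(void)
{
    page = sysconf(_SC_PAGESIZE);
    size_t len = 16 * page;

    int uffd = syscall(SYS_userfaultfd, O_CLOEXEC);
    if (uffd < 0) { perror("userfaultfd"); return 1; }

    struct uffdio_api api = { .api = UFFD_API };
    if (ioctl(uffd, UFFDIO_API, &api) < 0) { perror("UFFDIO_API"); return 1; }

    /* The "async buffer": an anonymous mapping whose missing-page
     * faults are delivered to the handler thread instead of being
     * resolved synchronously by the kernel. */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    struct uffdio_register reg = {
        .range = { .start = (unsigned long)buf, .len = len },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0) { perror("UFFDIO_REGISTER"); return 1; }

    pthread_t tid;
    pthread_create(&tid, NULL, handler, &uffd);

    /* Touching the buffer now triggers a user-space-resolved fault. */
    printf("first byte of page 3: 0x%02x\n", (unsigned char)buf[3 * page]);
    return 0;
}
```

Note the faulting thread still blocks until the fault is resolved, so this gives you the user-space handler half of the idea; the read that yields to its own async loop would still need the kind of compiler/runtime support discussed above.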