it outsources buffer management and user thread i/o scheduling to the kernel. for some use cases it's a great way to simplify implementation or boost performance. for others it may not perform as well.
the blog post points (to my mind) at a more general piece of programming advice: don't mix and match paradigms unless you really know what you're doing. if you want to do user space async io, cool. if using kernel features tickles your fancy, also cool.
mixing both without a deep understanding of what's going on under the hood will probably give you trouble.
Making it work asynchronously would require the compiler to split the memory access into two parts, a non-blocking IO dispatch and a blocking access to the mapped address. The OS would need to support that, however, and the language would need to keep track of what is a materialised array and what’s not.
as i understand it, mmap is only efficient because it can leverage hardware support for trapping into the kernel when a page needs to be loaded to satisfy an access attempt.
i think adding software indirection to every access in the mapped region would be really slow.
i think a better answer would be to impose more structure on the planned memory access, then maybe given some constraints (like say, "this loop is embarrassingly parallel") the system could be smarter about working on the stuff in ram first while the rest is loaded in.
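a minimal sketch of what i mean, assuming linux and using madvise(2) with MADV_WILLNEED as the prefetch hint (file name and chunk size are made up for illustration; MADV_WILLNEED is only a readahead hint, it doesn't block or notify):

```c
/* Sketch: walk a large mmap'd file chunk by chunk, asking the kernel to
 * start paging in the next chunk while we work on the current one. */
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK (4u << 20)   /* 4 MiB per chunk, arbitrary */

int main(void)
{
    int fd = open("big.dat", O_RDONLY);          /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    uint8_t *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    uint64_t sum = 0;
    for (off_t off = 0; off < st.st_size; off += CHUNK) {
        size_t len = (size_t)((st.st_size - off < (off_t)CHUNK)
                              ? st.st_size - off : (off_t)CHUNK);

        /* Hint the kernel to start reading the *next* chunk now. */
        off_t nxt = off + (off_t)len;
        if (nxt < st.st_size) {
            size_t nlen = (size_t)((st.st_size - nxt < (off_t)CHUNK)
                                   ? st.st_size - nxt : (off_t)CHUNK);
            madvise(base + nxt, nlen, MADV_WILLNEED);
        }

        /* "Embarrassingly parallel" work on the current, hopefully
         * resident, chunk. */
        for (size_t i = 0; i < len; i++)
            sum += base[off + i];
    }

    printf("sum = %llu\n", (unsigned long long)sum);
    munmap(base, st.st_size);
    close(fd);
    return 0;
}
```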
> every access in the mapped region would be really slow.
Would certainly be slower. The compiler would need to be aware that we want this behaviour and split the access into two parts: one to trigger the page read and yield to the app’s async loop, and another to resolve the read once the page has loaded. This would only need to happen for explicitly marked asynchronous memory reads (doing it for all memory reads without hardware support would be painful).
i can't think of any other syscall that makes use of the mmu's page fault machinery to enter the kernel as needed in response to ordinary user space memory accesses.
I think you could make do with some kind of async memory-touch system call, i.e. "page in this range of memory, notify me when finished". The application would have to call this on blocks of the mapped region before actually reading them.
This of course means you lose some of the benefits of mmap (few system calls, automatic paging), but would maybe still be beneficial from a performance perspective.
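You can approximate this with existing Linux primitives: madvise(MADV_WILLNEED) kicks off the paging without blocking, and mincore(2) lets the event loop poll residency before a task dereferences the range. A minimal sketch under those assumptions (the helper names are made up, and polling residency is weaker than a real completion notification):

```c
/* Sketch: approximate an "async memory-touch" with madvise + mincore. */
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Kick off paging for [addr, addr+len); a cheap, non-blocking hint. */
static int touch_async(void *addr, size_t len)
{
    return madvise(addr, len, MADV_WILLNEED);
}

/* Return true once every page in [addr, addr+len) is resident, so the
 * event loop can schedule the reader without risking a major fault. */
static bool range_resident(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~((uintptr_t)page - 1);
    size_t span = ((uintptr_t)addr + len - start + page - 1) / page * page;
    size_t npages = span / page;

    unsigned char *vec = malloc(npages);
    if (!vec || mincore((void *)start, span, vec) != 0) {
        free(vec);
        return false;
    }
    bool all = true;
    for (size_t i = 0; i < npages; i++)
        if (!(vec[i] & 1)) { all = false; break; }   /* bit 0 = resident */
    free(vec);
    return all;
}
```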
It would allow a memory read to yield to the async loop, but overall performance of the read itself would always be lower.
It’s the kind of thing that would be better implemented as a special “async buffer”: reads are guarded by a page fault handler that returns as soon as the backing read has been scheduled, and the read itself yields while the page load is unresolved.
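Linux's userfaultfd(2) is close to that shape: page faults on a registered range are delivered to user space, where a handler can schedule the real I/O and resolve the fault later with UFFDIO_COPY. A rough sketch under those assumptions, using an anonymous mapping and dummy page contents in place of a real async read (depending on kernel configuration, userfaultfd may require extra privileges):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static long page;

/* Fault handler: in a real async runtime this would schedule a
 * non-blocking read and resolve the fault from its completion
 * callback; here we just copy in a dummy page immediately. */
static void *handler(void *arg)
{
    int uffd = *(int *)arg;
    char *fill = aligned_alloc(page, page);
    memset(fill, 0x2a, page);

    for (;;) {
        struct pollfd pfd = { .fd = uffd, .events = POLLIN };
        if (poll(&pfd, 1, -1) <= 0) continue;

        struct uffd_msg msg;
        if (read(uffd, &msg, sizeof msg) != (ssize_t)sizeof msg) continue;
        if (msg.event != UFFD_EVENT_PAGEFAULT) continue;

        struct uffdio_copy copy = {
            .dst = msg.arg.pagefault.address & ~((unsigned long)page - 1),
            .src = (unsigned long)fill,
            .len = page,
        };
        ioctl(uffd, UFFDIO_COPY, &copy);   /* wakes the faulting thread */
    }
    return NULL;
}

int main(void)
{
    page = sysconf(_SC_PAGESIZE);
    size_t len = 16 * page;

    int uffd = syscall(SYS_userfaultfd, O_CLOEXEC);
    if (uffd < 0) { perror("userfaultfd"); return 1; }

    struct uffdio_api api = { .api = UFFD_API };
    if (ioctl(uffd, UFFDIO_API, &api) < 0) { perror("UFFDIO_API"); return 1; }

    /* The "async buffer": an anonymous mapping whose missing-page
     * faults are delivered to the handler thread instead of being
     * resolved synchronously by the kernel. */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    struct uffdio_register reg = {
        .range = { .start = (unsigned long)buf, .len = len },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0) { perror("UFFDIO_REGISTER"); return 1; }

    pthread_t tid;
    pthread_create(&tid, NULL, handler, &uffd);

    /* Touching the buffer now triggers a user-space-resolved fault. */
    printf("first byte of page 3: 0x%02x\n", (unsigned char)buf[3 * page]);
    return 0;
}
```

Note the faulting thread still blocks until the fault is resolved, so this gives you the user-space handler half of the idea; the read that yields to its own async loop would still need the kind of compiler/runtime support discussed above.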