I think you could get by with some kind of async memory-touch system call, i.e. "page in this range of memory, notify me when finished". The application would have to call this on blocks of the mmap before actually reading them.
This of course means you lose some of the benefits of mmap (few system calls, automatic paging), but would maybe still be beneficial from a performance perspective.
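To make the "page in this range, notify me when finished" idea concrete, here is a rough sketch in Python. There is no such system call in the standard library as far as I know, so this swaps in a different technique: it reads the same file range with os.pread in a worker thread, which should leave those pages in the page cache, and awaits the thread. The names prefault, read_block and demo are made up for illustration, and the approach assumes a Unix system and Python 3.9+.

```python
import asyncio
import mmap
import os
import tempfile


async def prefault(fd: int, offset: int, length: int, chunk: int = 1 << 20) -> None:
    """Hypothetical helper: warm the page cache for a file range off the event loop."""

    def _read_range() -> None:
        pos, end = offset, offset + length
        while pos < end:
            # os.pread releases the GIL while it blocks on disk I/O.
            data = os.pread(fd, min(chunk, end - pos), pos)
            if not data:  # reached end of file
                break
            pos += len(data)

    # The await is the "notify me when finished" part.
    await asyncio.to_thread(_read_range)


async def read_block(fd: int, mm: mmap.mmap, offset: int, length: int) -> bytes:
    await prefault(fd, offset, length)   # other tasks can run during the I/O
    return mm[offset:offset + length]    # pages are most likely resident by now


def demo() -> bytes:
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 8192 + b"hello")
        f.flush()  # make sure the mapping sees the full file size
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return asyncio.run(read_block(f.fileno(), mm, 8192, 5))
```

Nothing here guarantees the pages stay resident between the prefault and the read, so the mmap access can still block under memory pressure. It also costs an extra copy and a thread hop per block, which is the overhead the comment is talking about.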
It would allow a memory read to yield to the async loop, but overall performance of the read itself would always be lower.
It’s the kind of thing that would be better implemented as a special “async buffer”, where reads are guarded by a page fault handler that returns as soon as the read is scheduled, and where a read yields on an unresolved page load.
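A user-space approximation of that "async buffer" might look something like the sketch below. It cannot hook real page faults, so it tracks fixed-size blocks itself: touching a missing block schedules a read and returns immediately, and the read coroutine yields until every block it needs has arrived. AsyncBuffer, its block_size parameter and demo are hypothetical names, and it again assumes Unix and Python 3.9+.

```python
import asyncio
import os
import tempfile


class AsyncBuffer:
    """Hypothetical user-space stand-in for a fault-driven async buffer."""

    def __init__(self, fd: int, block_size: int = 4096) -> None:
        self.fd = fd
        self.block_size = block_size
        self._blocks = {}  # block index -> task that resolves to the block's bytes

    def _load(self, index: int):
        # Plays the role of the fault handler: schedule the read, return at once.
        task = self._blocks.get(index)
        if task is None:
            task = asyncio.ensure_future(
                asyncio.to_thread(
                    os.pread, self.fd, self.block_size, index * self.block_size
                )
            )
            self._blocks[index] = task
        return task

    async def read(self, offset: int, length: int) -> bytes:
        if length <= 0:
            return b""
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        # Yield to the event loop until every block in the range is loaded.
        blocks = await asyncio.gather(*(self._load(i) for i in range(first, last + 1)))
        data = b"".join(blocks)
        start = offset - first * self.block_size
        return data[start:start + length]


def demo() -> bytes:
    with tempfile.TemporaryFile() as f:
        f.write(b"a" * 4090 + b"hello world")
        f.flush()
        buf = AsyncBuffer(f.fileno())
        # The requested range straddles the boundary between blocks 0 and 1.
        return asyncio.run(buf.read(4090, 11))
```

This keeps every loaded block in memory forever and copies data on each read, so it gives up the automatic paging and zero-copy access that make mmap attractive in the first place. A real version would need kernel support to get those back.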