
> Per-display DPI settings

fwiw, Xorg already had this, since you can set the DPI for each display through RandR/xrandr. In both X11 and Wayland it's up to the toolkit to actually detect the setting and rasterise accordingly.

Wayland actually went backwards in this respect by using "integer scales" (eg, 1, 2, 3) instead of fine-grained DPIs (eg, 96, 192, 288), so using a scale of 1.5 would result in downscale blur (toolkit sees scale as 2, then the compositor scales it down to 75%), whereas in Xorg you could just set the DPI to 144, and the toolkit could theoretically render at the correct resolution. As far as I know Qt was the only toolkit to actually do this automatically, but that's not X11's fault.

Wayland has at least since fixed this in the form of "fractional scaling" [1], but here's [0] an old thread on HN where I complained about it and provided screenshots of the resulting blur.

[0] https://news.ycombinator.com/item?id=32021261

[1] Doing some quick searching it seems like this is still unsupported in Gtk3/Gtk4, maybe planned for Gtk5? Apparently Firefox has only just added support (December 2025), 3 years after the fractional scaling protocol was released. Seems ridiculous to me that Wayland failed to get this right from the start.


You can have different dpi and refresh rate per monitor in X, but you cannot do it while having a shared desktop across them.


X11 can do it. It's Xinerama that can't.

These days Xinerama is the only mainstream tool for dual head, but there used to be others. Nvidia Twinview was one. I bought my first dual head box in 1996 with two Matrox Millennium cards (although it mainly ran NT4) and those cards later went into my dual Athlon XP machine. That ran SUSE until Ubuntu came out.

Xinerama isn't a sine qua non. It's just easy so it became ubiquitous. Maybe it's time to replace it.


> As far as I know Qt was the only toolkit to actually do this automatically, but that's not X11's fault.

Well, if three independent programs have to coordinate to make it work, then I would say that it does not support it at all.


It's the same on Wayland. The client (usually part of a toolkit like Gtk/Qt) needs to subscribe to notifications [0] from the server so it can decide the raster size of the surface it wants to draw to. Qt does this on X11 by detecting when your window moves to a screen with another DPI and resizing/rescaling.

I guess the "third" program would be something like xrandr, so the Wayland analogue to that would be wlr-randr (for wlroots compositors), or some other DE-specific tool for configuring screen sizes. Again there's no fundamental difference here.

[0] https://wayland.app/protocols/fractional-scale-v1#wp_fractio...


Is that any different from Wayland? I'm not opposed to declaring that Wayland doesn't support mixed DPI, but it is a funny conclusion


> It's true that any modern processor has a "compilation step" into microcode anyway, so in an abstract sense, that might as well be some kind of bytecode.

This.

> What I can imagine is a purpose-built CPU that would make the JIT's job a lot easier and faster than compiling for x86 or ARM. Such a machine wouldn't execute raw Java bytecode, rather, something a tiny bit more low-level.

My prediction is that eventually a lot of software will be written in such a way that it runs in "kernel mode" using a memory-safe VM to avoid context switches, so reading/writing to pipes and accessing pages corresponding to files reduce down to function calls, which can easily happen billions of times per second, as opposed to "system calls" or page faults, which only happen 10 or 20 million times per second due to context switching.

This is basically what eBPF is used for today. I don't know if it will expand to be the VM that I'm predicting, or if kernel WASM [1] or something else will take over.

From there, it seems logical that CPU manufacturers would provide compilers ("CPU drivers"?) that turn bytecode into "microcode" or whatever the CPU circuitry expects to be in the CPU during execution, skipping the ISA. This compilation could be done in the form of JIT, though it could also be done AOT, either during installation (I believe ART in Android already does something similar [0], though it currently emits standard ISA code such as aarch64) or at the start of execution when it finds that there's no compilation cache entry for the bytecode blob (the cache could be in memory or on disk, managed by the OS).

Doing some of the compilation to "microcode" in regular software before execution rather than using special CPU code during execution should allow for more advanced optimisations. If there are patterns where this is not the case (eg, where branch prediction depends on runtime feedback), the compilation output can still emit something analogous to what the ISAs represent today. The other advantage is of course that CPU manufacturers are more free to perform hardware-specific optimisations, because the compiler isn't targeting a common ISA.

Anyway, these are my crazy predictions.

[0] https://source.android.com/docs/core/runtime/jit-compiler

[1] https://github.com/wasmerio/kernel-wasm (outdated)


> Oh, wait, there is:

As far as I can tell this is a transliterator, not a translator. It's just turning latin letters into hieroglyphs as you type them. I don't know how accurate the transliteration is.

It would be like coming up with a sequence of Chinese characters that sounds like an English sentence when pronounced by a Mandarin speaker. Nothing really to do with translation.


Another great example is https://www.blueridgejournal.com/poems/mots01-unpetit.htm

Getting a native French speaker to recite these to native English speakers is hilarious! Especially when the French speaker is trying to work out why what they've said is seemingly so funny.


Sir say "tray bean". Mercy.


> It would be like coming up with a sequence of Chinese characters that sounds like an English sentence when pronounced by a Mandarin speaker. Nothing really to do with translation.

Actually this does happen for some foreign terms/loan words, like the names of other countries:

https://en.wikipedia.org/wiki/Transcription_into_Chinese_cha...


Indeed, my understanding (which is backed up by your link) is that the hieroglyphs aren't just pictograms that try to draw the meaning but they tend to have particular pronunciations, and the selection of glyphs will usually depend on both the sound and the meaning of the word.

I guess Chinese characters work similarly, where eg, each character has a particular sound in Mandarin (with some characters having the same sound), but you spell words using certain characters based on the (sometimes historical) semantic association of components (radicals) within each character.

I'll admit I'm not an expert in either system, so sorry if either description seems like an oversimplification (I'm pretty sure there are exceptions in both cases).

This also leads to one of my favourite tables on Wikipedia [0], showing correspondences between various scripts, including Egyptian hieroglyphs and Arabic/Hebrew. Not all hieroglyphs are included, but you can see that each letter in Arabic/Hebrew ultimately derives from some hieroglyph which would have had a similar sound. The name of the Arabic letter ع sounds the same as the Arabic word for "eye" (ʿayn, عين) and the corresponding hieroglyph also looks like an eye.

[0] https://en.wikipedia.org/wiki/Phoenician_alphabet#Table_of_l...


https://thelanguagenerds.com/2023/literal-chinese-translatio...

You might then enjoy a list of these "accidental semantics" acquired by foreign country names, which are* rough transliterations, usually from the local or English name.

I can't find the nice source I originally had, so here's a stochastic parrot's approximation:

  United States 美国 Měiguó Beautiful Country
  China 中国 Zhōngguó Middle Country
  Japan 日本 Rìběn Origin of the Sun
  Germany 德国 Déguó Virtuous Country
  India 印度 Yìndù India
  United Kingdom 英国 Yīngguó Heroic Country
  France 法国 Fǎguó Law Country
  Italy 意大利 Yìdàlì Italy
  Canada 加拿大 Jiānádà Canada
  South Korea 韩国 Hánguó Han Country
* Obviously CJK etc countries already had names


Most of those are just phonetic approximations using convenient characters. I'm not sure I'd say the names have any semantic content. The names for China, Korea, and Japan are the names the ancient Chinese gave them. China is the "middle" or "center" country because it's the country of the people who named it. Japan is the origin of the sun because it's to the East of China. And of course the Han are what Koreans called themselves. Nothing accidental about any of those names.


Sure, but there are multiple characters for nearly every sound in Mandarin; I’m sure it’s no accident they ended up with mostly flattering ones.


+1, it's no accident. The most obvious case is with corporate names, which sometimes get carefully analyzed in translation. Coca-Cola is famously translated as 可口可乐 (Ke3 kou3 ke3 le4, numbers indicate tones), with individual characters meaning "can taste can happy", and intuitively meaning something like "drinkable deliciousness"


According to Baxter & Sagart (2014), the name Hán 韓 *[g]ˤar is from the Gaya word kara.


Indeed, that is what I wrote


And JavaScript .. And Python (though as sibling posts have mentioned it looks like they're intending to make a breaking change to remove it).

EDIT: actually, the PEP points out that they intend for it to only be a warning in CPython, to avoid the breaking change


Interesting .. from the post above:

> The projects examined contained a total of 120,964,221 lines of Python code, and among them the script found 203 instances of control flow instructions in a finally block. Most were return, a handful were break, and none were continue.

I don't really write a lot of Python, but I do write a lot of Java, and `continue` is the main control flow statement that makes sense to me within a finally block.

I think it makes sense when implementing a generic transaction loop, something along the lines of:

  <T> T executeTransaction(Function<Transaction, T> fn) {
    for (int tries = 0;; tries++) {
      var tx = newTransaction();
      try {
        return fn.apply(tx);
      } finally {
        if (!tx.commit()) {
          // TODO: potentially log number of tries, maybe include a backoff, maybe fail after a certain number
          continue;
        }
      }
    }
  }
In these cases "swallowing" the exception is often intentional, since the exception could be due to some logic failing as a result of inconsistent reads, so the transaction should be retried.

The alternative ways of writing this seem more awkward to me. Either you need to store the result (returned value or thrown exception) in one or two variables, or you need to duplicate the condition and the `continue;` behaviour. Having the retry logic within the `finally` block seems like the best way of denoting the intention to me, since the intention is to swallow the result, whether that was a return or a throw.
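
For comparison, here's a sketch of the "store the result" alternative, using the same hypothetical `Transaction`/`newTransaction` API as above (note it also only handles unchecked exceptions cleanly, which is part of the awkwardness):

  <T> T executeTransactionStoringResult(Function<Transaction, T> fn) {
    for (int tries = 0;; tries++) {
      var tx = newTransaction();
      T result = null;
      RuntimeException thrown = null;
      try {
        result = fn.apply(tx);
      } catch (RuntimeException e) {
        thrown = e;
      }
      if (!tx.commit()) {
        continue; // discard result/thrown and retry the whole transaction
      }
      if (thrown != null) {
        throw thrown;
      }
      return result;
    }
  }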

If there are particular exceptions that should not be retried, these would need to be caught/rethrown and a boolean set to disable the condition in the `finally` block, though to me this still seems easier to reason about than the alternatives.


> Having the retry logic within the `finally` block seems like the best way of denoting the intention to me, since the intention is to swallow the result, whether that was a return or a throw.

Except that is not the documented intent of the `finally` construct:

  The finally block always executes when the try block exits. 
  This ensures that the finally block is executed even if an 
  unexpected exception occurs. But finally is useful for more 
  than just exception handling — it allows the programmer to 
  avoid having cleanup code accidentally bypassed by a 
  return, continue, or break. Putting cleanup code in a 
  finally block is always a good practice, even when no 
  exceptions are anticipated.[0]
Using `finally` for implementing retry logic can be done, as you have illustrated, but that does not mean it is "the best way of denoting the intention." One could argue this is a construct specific to Java (the language) and does not make sense outside of this particular language-specific idiom.

Conceptually, "retries" are not "cleanup code."

0 - https://docs.oracle.com/javase/tutorial/essential/exceptions...


Sounds like the right intent to me. To pinpoint your existing quote from the documentation:

> The finally block always executes when the try block exits. This ensures that the finally block is executed even if an unexpected exception occurs.

The intent of the transaction code is that the consistency is checked (using `tx.commit()`) "even if an unexpected exception occurs".

I'm not sure how else to interpret that to be honest. If you've got a clearer way of expressing this, feel free to explain.


> The intent of the transaction code is that the consistency is checked (using `tx.commit()`) "even if an unexpected exception occurs".

A transaction failing is the opposite of an unexpected event. Transactions failing is a central use case of any transaction. Therefore it should be handled explicitly instead of using exceptions.

Exceptions are for unexpected events such as the node running out of memory, or a process failing to write to disk.


> A transaction failing is the opposite of an unexpected event.

That's why it's denoted by a non-exceptional return value from `tx.commit()` in my sample code. When I've talked about exceptions here, I'm talking about exceptions raised within the transaction. If the transaction succeeds, those exceptions should be propagated to the calling code.

> Exceptions are for unexpected events such as the node running out of memory, or a process failing to write to disk.

Discussing valid uses of exceptions seems orthogonal to this (should OOM lead to a catchable exception [0], or should it crash the process?). In any case, if the process is still alive and the transaction code determines without error that "yes, this transaction was invalid due to other contending transactions", it should retry the transaction. If something threw due to lack of memory or disk space, chances are it will throw again within a successful transaction and the error will be propagated.

[0] As alluded to in my first post, you might want to add some special cases for exceptions/errors that you want to immediately propagate instead of retrying. Eg, you might treat `Error` subtypes differently, which includes `OutOfMemoryError` and other cases that suggest the program is in a potentially unusable state, but this still isn't required according to the intent of the transactional logic.


> The intent of the transaction code is that the consistency is checked (using `tx.commit()`) "even if an unexpected exception occurs".

First, having a `commit` unconditionally attempted when an exception is raised would surprise many developers. Exceptions in transactional logic are often used to represent a "rollback persistent store changes made thus far" scenario.

Second, using a condition within `finally` to indicate a retry due to a `commit` failing could be expressed in a clearer manner by having it within the `try` block as described by IntelliJ here[0].

0 - https://www.jetbrains.com/help/inspectopedia/ContinueOrBreak...


> Exceptions in transactional logic are often used to represent a "rollback persistent store changes made thus far" scenario.

Handling can be added to change the transaction to be read-only if the inner code throws a particular exception, but the consistency should still be checked through a `commit` phase (at least in an OCC setting), so the `continue` in `finally` is still the correct way to do it.
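
Roughly like this, sticking with the hypothetical `Transaction` API from my earlier example plus a hypothetical `setReadOnly()` that discards the writes while still letting `commit()` validate the reads (I'm catching RuntimeException here for simplicity; you'd narrow it to whichever exception types should trigger the rollback):

  <T> T executeTransactionRollingBackOnError(Function<Transaction, T> fn) {
    for (int tries = 0;; tries++) {
      var tx = newTransaction();
      try {
        return fn.apply(tx);
      } catch (RuntimeException e) {
        tx.setReadOnly(); // hypothetical: drop the writes, but commit() still checks for read anomalies
        throw e;
      } finally {
        if (!tx.commit()) {
          continue; // inconsistent reads: retry, discarding the return value or exception
        }
      }
    }
  }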

> could be expressed in a clearer manner by having it within the `try` block as described by IntelliJ here[0].

> 0 - https://www.jetbrains.com/help/inspectopedia/ContinueOrBreak...

Wrong link? The only solution I see there is to add a comment to suppress the warning, which sounds fine to me (eg, analogous to having a `// fallthrough` comment when intentionally omitting `break` statements within `switch`, since I can agree that both of these things are uncommon, but sometimes desirable).


> Handling can be added to change the transaction to be read-only if the inner code throws a particular exception, but the consistency should still be checked through a `commit` phase (at least in an OCC setting), so the `continue` in `finally` is still the correct way to do it.

This approach fails to account for `fn` performing multiple mutations where an exception is raised from statement N, where N > 1.

For example, suppose `fn` successfully updates a record in table `A`, then attempts to insert a record into table `B` which produces a constraint violation exception[0]. Unconditionally performing a `commit` in the `finally` block will result in the mutation in table `A` being persisted, thus resulting in an invalid system persistent state.

If the `try` block performed the `commit` and the `finally` block unconditionally performed a `rollback`, then the behavior I believe sought would be sound.

>> 0 - https://www.jetbrains.com/help/inspectopedia/ContinueOrBreak...

> Wrong link?

No, it's the link I intended. Its purpose was to show the warning that anyone working in a translation unit using the originally proffered technique would see, as well as to serve as a starting point for research.

0 - https://docs.oracle.com/en/java/javase/17/docs/api/java.sql/...


Doesn't that code ignore errors even if it runs out of retries? Don't you want to log every Exception that happens, even if the transaction will be retried?

This code is totally rotten.


A result of an inconsistent transaction should be discarded whether it's a return value or a thrown exception. If it runs out of tries another error should be thrown. This should only happen due to contention (overlapping transactions), not due to a logical exception within the transaction.

You can add extra logging to show results or exceptions within the transaction if you want (for the exception this would simply be a `catch` just before the `finally` that logs and rethrows).

I've omitted these extra things because they're orthogonal to the point that the simplest way to express this logic is by having the `continue` control flow unconditional on whether the code was successful .. which is what you use `finally` for.

If you did this in Rust no one would complain, since the overall result is expressed as a first-class `Result<T, E>` value that can naturally be discarded. This is why Rust doesn't have `finally`.

Rust is also a lot more permissive about use of control flow, since you can write things like `foo(if x { y } else { continue }, bar)`.

Personally, I prefer when the language gives a bit more flexibility here. Of course you can write things that are difficult to understand, but my stance is still that my example code above is the simplest way to write the intended logic, until someone demonstrates otherwise.

I don't think this is a restriction that generally helps with code quality. If anything, I've probably seen more bad code resulting from a failure to find the simplest way to express the control flow of an algorithm.

I'm sure there's some train of thought that says that continue/break/return from a loop is bad (see proponents of `Array.prototype.forEach` in JS), but I disagree with it.


> if (!tx.commit())

https://docs.oracle.com/javase/8/docs/api/java/sql/Connectio...:

  void commit()
     throws SQLException
⇒ this code won’t even compile against the “transaction” class that is part of the Java platform (java.sql.Connection, linked above).

(I think having commit throw on error is fairly common. Examples: C# (https://learn.microsoft.com/en-us/dotnet/api/system.data.sql...), Python (https://docs.python.org/3/library/sqlite3.html#sqlite3.Conne...))


I wasn't thinking of JDBC SQL transactions specifically, but sure, different APIs can denote retriable transaction failures differently. Instead of:

  if (!tx.commit()) { continue; }
you do:

  try { conn.commit(); } catch (SQLTransactionRollbackException e) { continue; }
and the principle still applies. The simplest solution still involves a `continue` within the `finally` block.

Whether it's a good idea to actually do this directly using SQL connections is another question .. SQL databases usually use pessimistic locking, where the transaction failures are actually "deadlocks" that are preferably avoided through careful ordering of operations within the transaction (or more commonly, YOLOing it using an isolation level that allows read anomalies). Without going into all the details, this probably has a large influence over the design of the SQL APIs you're referring to.


This code is wrong. You don't want to commit a transaction if an exception is thrown during the transaction.


You want to at least check that the exception was raised in the absence of read anomalies. The check for read anomalies in OCC happens during the commit phase.

Setting a transaction to read-only on error is possible using the code (using a rethrowing catch within the transaction), but this is not universally desirable.

If you're using transactions to run fairly arbitrary code atomically (assuming no static state outside of the transaction), the expected behaviour would be that modifications leading up to an exception (in a non-anomalous transaction) are still persisted. Eg, imagine the code within the transaction is updating a persisted retry counter before performing a fallible operation. In this case you want the counter to be updated so that the transaction doesn't just fail an infinite number of times, since each time you roll back on error you're just restoring the state that leads to the error.
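
As a rough illustration (with hypothetical `get`/`put` helpers on the same `Transaction`):

  executeTransaction(tx -> {
    int attempts = tx.get("job:42:attempts", 0); // hypothetical read with a default
    tx.put("job:42:attempts", attempts + 1);     // should persist even if the next call throws
    callFlakyOperation();                        // may throw; commit() still validates the reads above
    return null;
  });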

Another case would be where the exception is due to something that was written within the transaction. If the exception were raised but the writes were not persisted, it would at least be confusing seeing the exception, and possibly logically incorrect depending on how the irrelevant exception is handled (since it's due to data that theoretically doesn't exist).


Doesn't look difficult: https://www.fbi.gov/wanted/seeking-info/ballot-box-fires (yes, that's in Oregon)


I’m not sure what’s so special in Oregon’s ballot boxes. But, tampering that is detected (don’t need much special to detect a burning box I guess!) is not a complete failure for a system. If any elections were close enough for a box to matter, they could have rerun them.


Here in NZ when I've been to vote, there are usually a couple of party affiliates at the voting location, doing what one of the parent posts described:

> You can stay there and wait for the count at the end of the day if you want to.

And if you watch the election night news, you'll see footage of multiple people counting the votes from the ballot boxes, again with various people observing to check that nothing dodgy is going on.

Having everyone just put their ballots in a postbox seems like a good way to remove public trust from the electoral system, because no one's standing around waiting for the postie to collect the mail, or looking at what happens in the mail truck, or the rest of the mail distribution process.

I'm sure I've seen reports in the US of people burning postboxes around election time. Things like this give more excuses to treat election results as illegitimate, which I believe has been an issue over there.

(Yes, we do also have advanced voting in NZ, but I think they're considered "special votes" and are counted separately .. the elections are largely determined on the day by in-person votes, with the special votes being confirmed some days later)


> I'm sure I've seen reports in the US of people burning postboxes around election time

Yeah, that happened once in OR and then got re-plastered all over the news dozens of times. I'm sure you can find way more incidents of intimidation, fighting, long lines and other issues for in-person voting. But individual incidents do not mean that there is anything wrong with a system that has worked for decades in multiple states.


Just to clarify, I think the parent posts are talking about non-failing page faults, ie where the kernel just needs to update the mapping in the MMU after finding the existing page already in memory (minor page fault), or possibly reading it from filesystem/swap (major page fault).

SIGSEGV isn't raised during a typical page fault, only ones that are deemed to be due to invalid reads/writes.

When one of the parents talks about "no good programming model/OS api", they basically mean an async option that gives the power of threads; threading allows concurrency of page faults, so the kernel is able to perform concurrent reads against the underlying storage media.

Off the top of my head, a model I can think of for supporting concurrent mmap reads might involve a function:

  bool hint_read(void *data, size_t length);
When the caller is going to read various parts of an mmapped region, it can call `hint_read` multiple times beforehand to add regions into a queue. When the next page fault happens, instead of only reading the currently accessed page from disk, it can drain the `hint_read` queue for other pages concurrently. The `bool` return indicates whether the queue was full, so the caller stops making useless `hint_read` calls.
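
As a rough sketch of how a caller might use it (everything here is hypothetical, since `hint_read` doesn't exist):

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>
  bool hint_read(void *data, size_t length); /* the hypothetical call described above */
  /* Sum one byte from each of n scattered offsets in an mmapped file. */
  uint64_t sum_bytes(char *map, const size_t *offsets, size_t n) {
      /* Pass 1: queue hints for everything we're about to touch. */
      for (size_t i = 0; i < n; i++)
          if (!hint_read(map + offsets[i], 1))
              break; /* queue full; the rest will just fault as usual */
      /* Pass 2: the first major fault drains the queue, letting the kernel
       * issue the remaining disk reads concurrently instead of one per fault. */
      uint64_t sum = 0;
      for (size_t i = 0; i < n; i++)
          sum += (uint8_t)map[offsets[i]];
      return sum;
  }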

I'm not familiar with userfaultfd, so don't know if it relates to this functionality. The mechanism I came up with is still a bit clunky and probably sub-optimal compared to using io_uring or even `readv`, but these are alternatives to mmap.


You’ve actually understood my suggestion - thank you. Unfortunately I think hint_read inherently can’t work because it’s a race condition between the read and how long you access the page. And this race is inherent in any attempted solution that needs to be solved. Signals are also the wrong abstraction mechanism (and are slow and have all sorts of other problems).

You need something more complicated, I think: something like rseq and futex, where you have some shared data structure that both sides understand how to mutate atomically. You could literally use rseq to abort if the page isn’t in memory and then submit an io_uring task to get signaled when it gets paged in again, but rseq is a bit too coarse (it’ll trigger on any preemption).

There’s a race condition starvation danger here (it gets evicted between when you get the signal and the sequence completes) but something like this conceptually could maybe be closer to working.

But yes it’s inherently difficult which is why it doesn’t exist but it is higher performance. And yes, this only makes sense for mmap not all allocations so SIGSEGV is irrelevant if looking at today’s kernels.


If you want accessing a particular page to cause a SIGSEGV so your custom fault handler gets invoked, you can just munmap it, converting that access from a "non-failing page fault" into one "deemed to be invalid". Then the mechanism I described would "allow[] concurrency of page faults, so the [userspace threading library] is able to perform concurrent reads against the underlying storage media". As long as you were aggressive enough about unmapping pages that none of your still-mapped pages got swapped out by the kernel. (Or you could use mlock(), maybe.)

I tried implementing your "hint_read" years ago in userspace in a search engine I wrote, by having a "readahead thread" read from pages before the main thread got to them. It made it slower, and I didn't know enough about the kernel to figure out why. I think I could probably make it work now, and Linux's mmap implementation has improved enormously since then, so maybe it would just work right away.


The point about inducing segmentation faults is interesting and sounds like it could work to implement the `hint_read` mechanism. I guess it would mostly be a question of how performant userfaultfd or SIGSEGV handling is. In any case it will be sub-optimal compared to having it in the kernel's own fault handler, since each userfaultfd read or SIGSEGV callback is already a user-kernel-user switch, and it still needs to perform another system call to do the actual reads, and even more system calls to mmap the bits of memory again.

Presumably having fine-grained mmaps will be another source of overhead. Not to mention that each mmap requires another system call. Instead of a single fault or a single call to `readv`, you're doing many `mmap` calls.

> I tried implementing your "hint_read" years ago in userspace in a search engine I wrote, by having a "readahead thread" read from pages before the main thread got to them.

Yeah, doing it in another thread will also have quite a bit of overhead. You need some sort of synchronisation with the other thread, and ultimately the "readahead" thread will need to induce the disk reads through something other than a page fault to achieve concurrent reads, since within the readahead thread, the page faults are still synchronous, and they don't know what the future page faults will be.

It might help to do `readv` into dummy buffers to force the kernel to load the pages from disk to memory, so the subsequent page faults are minor instead of major. You're still not reducing the number of page faults though, and the total number of mode switches is increased.

Anyway, all of these workarounds are very complicated and will certainly be a lot more overhead than vectored IO, so I would recommend just doing that. The overall point is that using mmap isn't friendly to concurrent reads from disk like io_uring or `readv` is.
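
For comparison, here's a minimal liburing sketch of the alternative I'm recommending: submit all the scattered reads up front so the kernel can service them concurrently (the file name and offsets are made up):

  #include <fcntl.h>
  #include <liburing.h>
  #include <stdio.h>
  int main(void) {
      struct io_uring ring;
      io_uring_queue_init(8, &ring, 0);
      int fd = open("data.bin", O_RDONLY);        /* hypothetical file */
      static char buf[3][4096];
      off_t offsets[3] = { 0, 1 << 20, 8 << 20 }; /* scattered regions */
      for (int i = 0; i < 3; i++) {
          struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
          io_uring_prep_read(sqe, fd, buf[i], sizeof buf[i], offsets[i]);
      }
      io_uring_submit(&ring);                     /* one syscall, three reads in flight */
      for (int i = 0; i < 3; i++) {
          struct io_uring_cqe *cqe;
          io_uring_wait_cqe(&ring, &cqe);
          printf("read returned %d bytes\n", cqe->res);
          io_uring_cqe_seen(&ring, cqe);
      }
      io_uring_queue_exit(&ring);
      return 0;
  }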

Major page faults are basically the same as synchronous read calls, but Golang read calls are asynchronous, so the OS thread can continue doing computation from other Goroutines.

Fundamentally, the benchmarks in this repository are broken because in the mmap case they never read any of the data [0], so there are basically no page faults anyway. With a well-written program, there shouldn't be a reason that mmap would be faster than IO, and vectored IO can obviously be faster in various cases.

[0] Eg, see here where the byte slice is assigned to `_` instead of being used: https://github.com/perbu/mmaps-in-go/blob/7e24f1542f28ef172b...


Inducing segmentation faults is literally how the kernel implements memory mapping, and virtual memory in general, by the way. From the CPU's perspective, that page is unmapped. The kernel gets its equivalent of a SIGSEGV signal (which is a "page fault"=SIGSEGV "interrupt"=signal), checks its own private tables, decides the page is currently on disk, schedules it to be read from disk, does other stuff in the meantime, and when the page has finished being read from disk, it returns from the interrupt.

(It does get even deeper than that: from the CPU's perspective, the interrupt is very brief, just long enough to take note that it happened and avoid switching back to the thread that page-faulted. The rest of the stuff I mentioned, although logically an "interrupt" from the application's perspective, happens with the CPU's "am I handling an interrupt?" flag set to false. This is equivalent to writing a signal handler that sets a flag saying the thread is blocked, edits its own return address so it will return to the scheduler instead of the interrupted code, then calls sigreturn to exit the signal handler.)


There are some differences, including the cross-CPU TLB shootdowns vlovich mentioned.


munmap + signal handling is terrible, not least because you don’t want to be fucking with the page table in that way: an unmap involves a cross-CPU TLB shootdown, which is slooow in a “make the entire machine slow” kind of way.


That is correct, although my laptop is only four cores.


Are you reinventing madvise?


I think the model I described is more precise than madvise. I think madvise would usually be called on large sequences of pages, which is why it has `MADV_RANDOM`, `MADV_SEQUENTIAL` etc. You're not specifying which memory/pages are about to be accessed, but the likely access pattern.

If you're just using mmap to read a file from start to finish, then the `hint_read` mechanism is indeed pointless, since multiple `hint_read` calls would do the same thing as a single `madvise(..., MADV_SEQUENTIAL)` call.

The point of `hint_read`, and indeed io_uring or `readv` is the program knows exactly what parts of the file it wants to read first, so it would be best if those are read concurrently, and preferably using a single system call or page fault (ie, one switch to kernel space).

I would expect the `hint_read` function to push to a queue in thread-local storage, so it shouldn't need a switch to kernel space. User/kernel space switches are slow, in the order of a couple of 10s of millions per second. This is why the vDSO exists, and why the libc buffers writes through `fwrite`/`println`/etc, because function calls within userspace can happen at rates of billions per second.
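
For reference, the closest per-range hint that exists today is `madvise(MADV_WILLNEED)`, which does name an exact region, but each hint is still its own user/kernel switch, which is exactly the cost a queued `hint_read` would avoid:

  #include <sys/mman.h>
  #include <unistd.h>
  /* Ask the kernel to start reading the page(s) covering [map+off, map+off+len). */
  static void prefetch_range(char *map, size_t off, size_t len) {
      size_t page = (size_t)sysconf(_SC_PAGESIZE);
      size_t start = off & ~(page - 1); /* madvise wants a page-aligned address */
      madvise(map + start, len + (off - start), MADV_WILLNEED);
  }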


you can do fine grained madvise via io_uring, which indeed uses a queue. But at that point why use mmap at all, just do async reads via io_uring.


The entire point I was trying to make at the beginning of the thread is that mmap gives you memory pages in the page cache that the OS can drop on memory pressure. io_uring is close on the performance and fine-grained access patterns front, but it’s not so good on the system-wide cooperative-memory front and has a higher cost: either you’re still copying from the page cache into a user buffer (a non-trivial performance impact versus the read itself, plus trashing your CPU caches), or you’re doing direct I/O and having to implement a page cache manually (which risks duplicating page data inefficiently in userspace if the same file is accessed by multiple processes).


Right, so zero-copy IO, but still having the ability to share the page cache across processes and allow the kernel to drop caches under high memory pressure. One issue is that when under pressure, a process might not really be able to successfully read a page and might keep retrying and failing (with an LRU replacement policy it is unlikely and probably self-limiting, but still...).


To take advantage of zero-copy I/O, which I believe has become much more important since the shift from spinning rust to Flash, I think applications often need to adopt a file format that's amenable to zero-copy access. Examples include Arrow (but not compressed Feather), HDF5, FlatBuffers, Avro, and SBE. A lot of file formats developed during the spinning-rust eon require full parsing before the data in them can be used, which is fine for a 1KB file but suboptimal for a 1GB file.


> I can say that the display & overall performance is noticeably faster on the two actual computers I tested on than under qemu on my Linux system.

You'd probably want to use `-enable-kvm` so it's not doing full software emulation. Assuming you're running this on another x86-64 machine.


Thanks; I'll start doing that when running on Linux. (I had missed this, as my main dev box is running OpenBSD which doesn't have kvm)


I'm surprised to see that TCG (the tiny code generator, the original QEMU backend) is still the default accel, even when KVM could be used instead.

