
> …with the right database, no system calls at all.

How does that work? Doesn't the database talk to the filesystem? Aren't there a bunch of syscalls going on there?



Serious databases can use raw partitions with no filesystem for storage. Even when storing data on a filesystem, a database is unlikely to use a separate file for each entry; the database might make one mmap system call when it starts, and none thereafter (simplified example). The point is that the database can do O(1) system calls for n queries, whereas using the filesystem with a separate file for each entry, you're going to need at least O(n) system calls.
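To make the "one mmap, then nothing" idea concrete, here is a rough Python sketch. It is not how any particular database does it; the fixed-size record layout and the names build_store, open_store, lookup, and demo are made up for illustration. Page faults on first touch still enter the kernel, but they are not system calls, and once the pages are resident a lookup is a plain memory read.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record format for this sketch: one little-endian 64-bit integer.
RECORD = struct.Struct("<Q")


def build_store(path, n):
    """Write n fixed-size records into a single file (record i holds i * i)."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(RECORD.pack(i * i))


def open_store(path):
    """Open and map the file once; the system calls for reading happen here."""
    with open(path, "rb") as f:
        # The mapping remains valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def lookup(store, i):
    """Read record i directly from the mapping, without a read() call."""
    return RECORD.unpack_from(store, i * RECORD.size)[0]


def demo(n=1000):
    """Build a temporary store, map it once, and answer n lookups from memory."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        build_store(path, n)
        store = open_store(path)
        try:
            return [lookup(store, i) for i in range(n)]
        finally:
            store.close()
    finally:
        os.remove(path)
```

With one file per entry, each of those n lookups would instead cost at least an open, a read, and a close.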

You could of course avoid this problem by using a single large file, but that has its own problems (aforementioned possibility of corruption). Working around those problems probably amounts to embedding a database in your application.


In the read-only case, pretty much any embedded DB with a large userspace cache configured won't read data back in redundantly.

LMDB takes this further: read transactions are managed entirely in shared memory (no system calls or locks required), and the cache just happens to be the OS page cache.

Per a post a few weeks back, the complete size of the HN dataset is well under 10GB, so it comfortably fits in RAM.



