
That was exactly my thought too: letting the OS do all the memory management and caching is a strategy many great projects use, PostgreSQL and Varnish among them.

However, I do feel there is something "wrong" about the approach MongoDB is taking. They need to allocate new files in huge chunks, and filling those with zeroes completely takes up all I/O while it happens. There is no logical hierarchy in the files, and it just feels a bit weird.
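
To make the zero-filling cost concrete, here is a rough sketch of what that kind of preallocation looks like. This is not MongoDB's actual code; the function names and the 1 MiB chunk size are made up for illustration. The first function writes every byte, the second only sets the file length, which on many filesystems produces a sparse file with almost no I/O.

```python
import os
import tempfile

def preallocate_zero_filled(path, size, chunk_size=1 << 20):
    """Create `path` holding `size` zero bytes, written chunk by chunk."""
    zeros = b"\0" * chunk_size
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(remaining, chunk_size)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the zeroes to disk; this is the I/O-heavy part

def preallocate_sparse(path, size):
    """Set the file length without writing any data (hypothetical alternative)."""
    with open(path, "wb") as f:
        f.truncate(size)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "filled.dat")
        b = os.path.join(d, "sparse.dat")
        preallocate_zero_filled(a, 4 * (1 << 20))
        preallocate_sparse(b, 4 * (1 << 20))
        print(os.path.getsize(a), os.path.getsize(b))
```

Both files report the same length; the difference is how much was actually written to get there.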

Perhaps they should've taken the approach PostgreSQL did, which is to simply use files and read from them instead of using mmap. The whole reason they went for a global lock instead of more granular locks is that the whole mmap'ed area is one big blob, and a single lock was the most "obvious" approach.



Thanks for the insight on the global write lock. I've searched and searched and wasn't able to find anything on why they have the global lock.

Out of curiosity, is there a simple way to explain why someone would mmap instead of just reading files directly (I've never done any programming with mmap, so I'm a bit ignorant of its use cases)?


This SO entry seems to answer your question fairly well: http://stackoverflow.com/questions/258091/when-should-i-use-...
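
For a quick feel of the difference, here is a minimal sketch in Python (the helper names and the demo file are made up for illustration). With `read` you seek and copy bytes out explicitly; with `mmap` the file is mapped into your address space and you index it like a byte string, leaving it to the OS to page data in on demand.

```python
import mmap
import os
import tempfile

def read_with_read(path, offset, length):
    """Explicit I/O: seek, then copy bytes from the kernel into a new buffer."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

def read_with_mmap(path, offset, length):
    """Map the whole file read-only and slice it like a bytes object."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]

def make_demo_file():
    """Create a small throwaway file (hypothetical data, for illustration)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"0123456789" * 1000)
    return path

if __name__ == "__main__":
    path = make_demo_file()
    try:
        print(read_with_read(path, 5, 10))
        print(read_with_mmap(path, 5, 10))
    finally:
        os.remove(path)
```

Both calls return the same bytes. The trade-off is in who manages the caching and when the actual disk reads happen, not in the result.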


mmap also blows up the TLB by taking up so many page mappings.


It is a widespread myth that PostgreSQL lets the OS do all the memory management and caching. I don't understand why it is so prevalent, though, considering how trivial it is to look and see that it is nonsense. PostgreSQL reads all data into its own shared buffer cache. The data is stored in files, so of course the filesystem buffer cache is also used, but the idea that PostgreSQL leaves it entirely up to the OS is totally false. It has its own cache on top of the filesystem cache.



