
This article is wrong, just wrong.

I would love it if there were just one kind of storage, and my code could ignore the distinction between disk and memory. But it can't, for three reasons: 10 ms seek times, RAM that is much smaller than disk, and garbage collection.

10 ms seek times mean that fast random access across large disk files just isn't possible. There is a vast amount of literature and research devoted to getting over this specific limitation. And it isn't old, either: all of the recent work on big data is aimed at resolving the tension between sequential disk access, which is fast, and random access, which is required for executing queries.
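To put rough numbers on that gap, here is a back-of-the-envelope sketch. Only the 10 ms seek time comes from the comment above; the 100 MiB/s sequential rate and the 4 KiB record size are assumptions picked for illustration, not measurements.

```python
# Rough comparison of random vs. sequential record reads on a spinning disk.
SEEK_TIME_S = 0.010                  # 10 ms per random seek (figure quoted above)
SEQ_THROUGHPUT_BPS = 100 * 1024**2   # ASSUMED sequential rate: 100 MiB/s
RECORD_BYTES = 4096                  # ASSUMED record size: 4 KiB


def random_reads_per_second(seek_time_s):
    """Upper bound on random reads/s if every read pays one full seek."""
    return 1.0 / seek_time_s


def sequential_reads_per_second(throughput_bps, record_bytes):
    """Records/s when streaming through the file with no seeks."""
    return throughput_bps / record_bytes


random_rate = random_reads_per_second(SEEK_TIME_S)
seq_rate = sequential_reads_per_second(SEQ_THROUGHPUT_BPS, RECORD_BYTES)

print(f"random:     {random_rate:.0f} records/s")
print(f"sequential: {seq_rate:.0f} records/s")
print(f"ratio:      {seq_rate / random_rate:.0f}x")
```

Under those assumptions the two access patterns differ by a couple of orders of magnitude, which is the tension the comment describes.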

RAM that is smaller than disk means that virtual disk files don't work very well when you have large data files. If you try to map more than the amount of physical RAM you get a mess: http://stackoverflow.com/questions/12572157/using-lots-of-ma...

Garbage collection means that it is easy to allocate a bit of memory, and then let it go when the reference goes out of scope. There's no need to explicitly deallocate it. It's one of the things that makes modern programming efficient. With disk, you don't get that; if you write something, you've got to erase it or disk fills up.

In short, this guy's casual contempt for "1975 programming" is irksome, because it's clear that he isn't working on the same class of problems that the rest of us are. He may be able to get away with virtual memory for his limited application, but the rest of us can't.



(1) Varnish exists, so we can actually run it and analyze its performance. There's no need for "this can't work because X" arguments because we know whether it can work or not.

The author claims Varnish works with huge mappings too. In another article: "For example, Varnish does not ignore the fact that memory is virtual; it actively exploits it. A 300-GB backing store, memory mapped on a machine with no more than 16 GB of RAM, is quite typical."
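For anyone who wants to poke at the idea, here is a scaled-down sketch of mapping a file and touching only a few widely spaced bytes. It is not Varnish's code: the 64 MiB size, the helper name `touch_sparse_mapping`, and the temporary file are all assumptions chosen so the example runs anywhere. The point is only that the mapping's size is independent of how many pages actually get touched.

```python
import mmap
import os
import tempfile

MAP_SIZE = 64 * 1024 * 1024  # ASSUMED size, scaled down from the 300 GB quoted


def touch_sparse_mapping(size, offsets):
    """Map a freshly extended file and write/read one byte at each offset.

    Only the pages containing the touched offsets need to be faulted in;
    the rest of the mapping is never read from or written to.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)            # extend the file without writing data
        with mmap.mmap(fd, size) as m:    # shared, read/write mapping
            for off in offsets:
                m[off] = 0x5A             # first touch of the page faults it in
            return [m[off] for off in offsets]
    finally:
        os.close(fd)
        os.unlink(path)


values = touch_sparse_mapping(MAP_SIZE, [0, MAP_SIZE // 2, MAP_SIZE - 1])
print(values)
```

Whether this behaves well once the working set exceeds physical RAM is exactly what the two sides of this thread disagree about; the sketch does not settle that.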

(2) Varnish doesn't ignore the fact that disk is slower than RAM. Its data structures are built to minimize page faults, and thus seeks, for exactly this reason. See also: http://queue.acm.org/detail.cfm?id=1814327
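To illustrate the cost such data structures try to reduce, here is a small sketch that counts how many distinct pages a root-to-leaf walk touches in a classic array-based binary heap. This is the baseline layout, not Varnish's actual structure; the function name and the 512-items-per-page figure (a 4 KiB page of 8-byte entries) are assumptions for illustration.

```python
def pages_touched_classic_heap(leaf_index, items_per_page):
    """Count distinct pages on the root-to-leaf path of a 1-based array heap.

    In the classic layout the parent of node i is i // 2, so each step
    toward the leaf roughly doubles the index and soon lands on a new page.
    """
    pages = set()
    i = leaf_index
    while i >= 1:
        pages.add(i // items_per_page)  # page number holding node i
        i //= 2                         # move up to the parent
    return len(pages)


ITEMS_PER_PAGE = 512   # ASSUMED: 4 KiB page / 8-byte entries
LEAF = 2 ** 20         # a leaf in a heap of about a million entries

touched = pages_touched_classic_heap(LEAF, ITEMS_PER_PAGE)
print(f"path length: {LEAF.bit_length()} nodes, pages touched: {touched}")
```

If those pages are not resident, each one is a potential page fault, so a layout that keeps several levels of the tree inside one page cuts the worst case considerably.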

The virtual memory abstraction leaks, just like every other abstraction. That doesn't make it worthless.

(3) Files aren't append-only: you can reuse space for a different purpose when you don't need it for its original purpose anymore. How do you think databases work? Or filesystems?
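As a toy illustration of reusing file space, here is a minimal fixed-size slot store with an in-memory free list. Real databases and filesystems are far more involved; the class name `SlotFile`, the 16-byte slot size, and the `demo` helper are all hypothetical.

```python
import os
import tempfile

SLOT_SIZE = 16  # ASSUMED fixed record size in bytes


class SlotFile:
    """Fixed-size slots in a file; deleted slots are reused before growing."""

    def __init__(self, path):
        self.f = open(path, "w+b")
        self.free = []   # slot numbers available for reuse
        self.count = 0   # number of slots ever allocated

    def put(self, data):
        if len(data) > SLOT_SIZE:
            raise ValueError("record too large for slot")
        if self.free:
            slot = self.free.pop()   # reuse space freed by delete()
        else:
            slot = self.count        # otherwise grow the file by one slot
            self.count += 1
        self.f.seek(slot * SLOT_SIZE)
        self.f.write(data.ljust(SLOT_SIZE, b"\0"))
        return slot

    def delete(self, slot):
        self.free.append(slot)       # nothing is erased; the slot is just reusable

    def size(self):
        self.f.flush()
        return os.fstat(self.f.fileno()).st_size

    def close(self):
        self.f.close()


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    store = SlotFile(path)
    try:
        a = store.put(b"first")
        store.put(b"second")
        before = store.size()
        store.delete(a)
        c = store.put(b"third")      # lands in the slot that held "first"
        return a, c, before, store.size()
    finally:
        store.close()
        os.unlink(path)


print(demo())
```

The file does not grow when the third record is written, because the freed slot is handed out again.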

(4) The author's not talking about using disk-backed memory for your general purpose heap. He's talking about using the virtual memory system to access a giant cache on disk.

So is the author wrong about everything? Varnish seems to work, so if he's wrong he's getting away with it.


I couldn't agree more. The author just has no clue about memory management. He imagines that the virtual memory model can solve all caching pains. The reality is much more painful.


What is a "virtual disk file"?


I think he is referring to mmap? Not entirely sure though...


Perhaps conflating swap files with virtual memory?


mmap makes use of the virtual memory system. The author does not appear to be conflating anything.


Yeah, that would be great if (a) the link were to a question on StackExchange about mmap, but it's not, and (b) the use of mmap were commonly known as using a "virtual disk file", but again, it's not.



