Hacker News

Would the HN software need to scan the directory e.g. to read in all users, at any point? I don't know the source code of HN but I can't see why that would be necessary.

(And if it did need to, presumably it would now need to recursively scan all sub-directories, which would also take a while?)
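To make the question concrete, here is a minimal sketch of what a bucketed layout and a recursive scan over it might look like. The layout (two hex characters of a SHA-1 of the username as the bucket name) and the names `shard_path`, `save_user` and `all_users` are my own assumptions for illustration; I don't know how HN actually arranges its sub-directories.

```python
import hashlib
import os
import tempfile


def shard_path(root, username, width=2):
    """Hypothetical layout: root/<first `width` hex chars of sha1(username)>/<username>."""
    bucket = hashlib.sha1(username.encode("utf-8")).hexdigest()[:width]
    return os.path.join(root, bucket, username)


def save_user(root, username, data):
    """Write one profile file, creating its bucket directory on demand."""
    path = shard_path(root, username)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(data)
    return path


def all_users(root):
    """Enumerate every profile by walking all bucket directories recursively."""
    names = []
    for _dirpath, _dirnames, filenames in os.walk(root):
        names.extend(filenames)
    return sorted(names)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name in ["alice", "bob", "carol"]:
            save_user(root, name, "karma: 1\n")
        print(all_users(root))
```

A full scan still touches every entry once whether the files sit in one directory or in many buckets; what changes is how many entries each individual directory holds.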



No, you need to scan the directory every time you read a file. Most filesystems do a lot of work to optimize this, but it is still a significant factor.


Not sure I agree with that. I ran the software for a community with 6M users, and on 2003 hardware we had millions of files in one directory. That was with AdvFS on Tru64, so things might be different with other filesystems. But ZFS, for example, can do this with no problem as well. I just sort of assumed other filesystems must have caught up in the intervening 10+ years, but I haven't looked into their source code, so I am prepared to admit I might be wrong about them.

The same way a database fetch doesn't load the whole table, filesystems can and do use trees and hashes to organize directories so that file lookup, creation and deletion by name can be fast and can be concurrent.
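A rough way to check this on your own filesystem is to compare opening a file by name against finding it by enumerating the directory. This is only a sketch: the function names and the file count are made up for illustration, and how the kernel resolves the name (linear list, hash, B-tree) depends entirely on the filesystem, so the timings will vary from one machine to the next.

```python
import os
import tempfile
import time


def make_files(directory, count):
    """Create `count` small files named user0, user1, ... in one flat directory."""
    for i in range(count):
        with open(os.path.join(directory, f"user{i}"), "w") as f:
            f.write("x")


def read_by_name(directory, name):
    """Open by path; the application never enumerates the directory itself."""
    with open(os.path.join(directory, name)) as f:
        return f.read()


def find_by_scan(directory, name):
    """Enumerate directory entries until one matches; returns its path or None."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name == name:
                return entry.path
    return None


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        make_files(d, 5000)  # arbitrary count; raise it to stress the filesystem
        _, t_open = timed(read_by_name, d, "user4999")
        _, t_scan = timed(find_by_scan, d, "user4999")
        print(f"open by name: {t_open:.6f}s, scan for name: {t_scan:.6f}s")
```

A single measurement like this is noisy and heavily affected by caching, so treat it as a starting point rather than a benchmark.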

I posted this in another comment, but this was my understanding of the situation in 2010. http://www.databasesandlife.com/flat-directories/

There must be a reason why they made this change: either I am wrong about performance (perhaps my results really were particular to those filesystems), or I am right and they made the change for another reason. I'd like to learn the answer.


I don't think the fact that AdvFS on Tru64 did well is any evidence that other filesystems handle this well. I'm running an XFS filesystem right now that still totally sucks at this particular aspect, but I'm loath to move all the data off the machine, rebuild it, and then move it all back.

For one, it would need a vast amount of temp space, the site would be down while doing it, and the end result would be much the same as it is today (I rarely modify the filesystem).


You'd want to do so to check for dup usernames, for instance.
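For comparison, a duplicate check can also be done without listing the directory, by asking the filesystem to create the file exclusively. This is a minimal sketch under the assumption of one file per user named after the username; `register` is a hypothetical helper, not anything from the HN source.

```python
import os
import tempfile


def register(directory, username):
    """Return True if the profile file was created, False if the name is taken.

    O_CREAT | O_EXCL makes creation fail when the file already exists, so the
    existence check and the creation happen as a single operation.
    """
    path = os.path.join(directory, username)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(register(d, "alice"))  # first registration
        print(register(d, "alice"))  # same name again
```

This only catches exact name collisions as the filesystem sees them. Near-duplicates (different case, look-alike characters) would still need some other mechanism, and a real system would have to validate usernames before using them as file names.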



