
Also: 1) Maximum # of open file descriptors

2) Whether your slave DB stopped replicating because of some error.

3) Whether something is screwed up in your SOLR/ElasticSearch instance so that it doesn't respond to search queries, but still responds to simple heartbeat pings.

4) Whether your Redis DB stopped saving to disk because of a lack of disk space, not enough memory, or because you forgot to set overcommit memory.

5) If you're running out of space in a specific partition where you usually store random stuff, like /var/log.

I've had my ass bitten by all of the above :)



6) Free inodes (as distinct from space) per filesystem.
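Several of the items above (fd limits, disk space, inodes, the Redis overcommit setting) can be spot-checked from a shell. A minimal sketch, assuming Linux-ish tools; the 90% thresholds are illustrative, not recommendations:

```shell
#!/bin/sh
# (1) Per-process open file descriptor soft limit.
fd_limit=$(ulimit -n)
echo "fd soft limit: $fd_limit"

# (4) The overcommit setting Redis complains about (Linux only).
cat /proc/sys/vm/overcommit_memory 2>/dev/null

# (5) Partitions more than 90% full on space.
df -P | awk 'NR > 1 && $5+0 > 90 { print "low space:", $6, $5 }'

# (6) Partitions more than 90% full on inodes.
df -Pi | awk 'NR > 1 && $5+0 > 90 { print "low inodes:", $6, $5 }'
```

In a real setup you'd run something like this from cron or feed the numbers into whatever monitoring you already have, rather than eyeballing the output.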


Similar to free inodes, you should also check for the maximum number of directories. The dir_index option helps, but I've seen it become a problem.


There's a maximum number of directories? On what filesystem is that?


ext3 without dir_index has a limit of 32K directories in any one directory.

Where I saw it crop up was 32K folders under /tmp on a cluster system. So no, it's not a limit on the number of directories overall (that's inodes), but rather on how many subdirectories a single directory can have.

http://en.wikipedia.org/wiki/Ext4#Features <-- Fixes 32K limit


ext3/4 has really poor large-directory performance, even with dir_index, especially if you are constantly removing and re-adding nodes. I would highly recommend XFS for large-directory use cases.


I got bitten by this once. I think it was related to a maximum of 32K hardlinks per inode, which effectively sets a limit of 32K subdirs, since each subdir has a hardlink to "..".
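You can see that "hardlink to '..'" accounting directly: a directory's link count is (number of subdirectories + 2), one link for its own entry in its parent and one for its own ".". A quick sketch, assuming GNU stat's `-c %h` (hardlink count):

```shell
# Create a scratch directory with 3 subdirectories and inspect its link count.
d=$(mktemp -d)
mkdir "$d/a" "$d/b" "$d/c"
links=$(stat -c %h "$d")
echo "link count: $links"   # 3 subdirs + 2 = 5
rm -rf "$d"
```

On ext3 without dir_index it's this per-inode link count that hits the ~32K cap.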


> Maximum # of open file descriptors

Augh. I ran one of my servers hard into that wall, and now it's something I watch. At least I learned from that mistake.


Related to this, if you've ever built/run anything on Solaris, you probably found out the hard way that even in modern times, fdopen() in 32-bit apps only allows up to 255 fds, because they so badly want to preserve an ages-old ABI. Funny bug to hit at runtime in production when you aren't aware of this compatibility "feature".


I learned the hard way that MySQL creates a file descriptor for every database partition you create. Someone had a script that created a new partition every week...


So after 5000 years you were running out?


I forget the details, but practically speaking the database keeled over after some 200 or 500 files were open at the same time.
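If you suspect a process (mysqld or anything else) is creeping up on its fd limit, counting its entries under /proc is a cheap check. A sketch, assuming Linux's /proc; `self` is used here so the example inspects its own shell, but any pid works:

```shell
# Count open file descriptors for a pid via the /proc pseudo-filesystem.
pid=self
nfds=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
echo "open fds: $nfds"
```

Comparing that number against `ulimit -n` (or the limit in /proc/$pid/limits) tells you how close you are to the wall.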


X) Number of cgroups. We were getting slow performance, apparently related to slow IO, but nothing stood out as the culprit. It turned out that since vsftpd was creating cgroups and not removing them, the pseudo-filesystem /sys/fs/cgroup had myriad subdirectories (each representing a cgroup), and whenever something wanted to create a new cgroup or access the list of cgroups, it had to list that pseudo-directory, which counted as IO.

Fixed by using the undocumented option isolate_network=NO in vsftpd.conf.


Feels like this list (and the original post) are problems caused by:

* Lack of proper/default monitoring advocated for your tools (2), (4).

* Choosing poor (default/recommended) settings (1), (4).

* Keeping stateful servers/instances when you don't need to (5), (6).

* Not tracking performance as part of monitoring (3), (4).

Admittedly, I have made the same mistakes too.

edit: formatting



