I encountered this issue a year or two ago on the Netflix CDN nodes when running with 100K TCP connections or more. We have metrics collectors that need to report how many TCP connections are open and what state they are in (ESTABLISHED, TIME_WAIT, etc). Until recently, the only way to do this was to ask the FreeBSD kernel to copy out the entire TCP hash table to userspace, where an application can examine it and count connections. The problem is that the thread wanting a copy of the table had to hold a read lock (rlock) on the table for the whole time it was being copied. As the author points out, this blocks any writers, and once a writer is waiting, new readers queue up behind it as well. So we'd see latency spikes whenever the collector was running.
The problem has since been fixed twice over: first by moving the top-level locking for that table to epoch (FreeBSD's RCU equivalent), and second by using per-CPU counters to track the number of TCP connections in every state. So now the userspace collector just needs to fetch a small array of per-state connection counts from the kernel.
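
To make the per-CPU counter idea concrete, here is a minimal userspace sketch in C. It is not the FreeBSD code: every name in it (`state_count`, `conn_enter`, `conn_leave`, `conn_transition`, `conn_snapshot`, `NSHARDS`, and the trimmed-down state list) is made up for illustration, and the "CPU" is just a shard index passed in by the caller. The point is only that writers touch their own row and the reader sums the rows, so nobody takes a lock on the connection table to get the counts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names throughout; an illustration, not kernel code. */
enum { NSHARDS = 4 };                 /* stand-in for the number of CPUs */
enum conn_state { ST_CLOSED, ST_SYN_SENT, ST_ESTABLISHED, ST_TIME_WAIT, NSTATES };

/* One row of counters per shard, so updates from different shards never
 * touch the same row. A real implementation would also pad each row to a
 * cache line to avoid false sharing. */
static _Atomic int64_t state_count[NSHARDS][NSTATES];

/* A connection enters state `st`, as observed on shard `cpu`. */
void conn_enter(unsigned cpu, enum conn_state st)
{
    assert(st < NSTATES);
    atomic_fetch_add_explicit(&state_count[cpu % NSHARDS][st], 1,
                              memory_order_relaxed);
}

/* A connection leaves state `st`, as observed on shard `cpu`. */
void conn_leave(unsigned cpu, enum conn_state st)
{
    assert(st < NSTATES);
    atomic_fetch_sub_explicit(&state_count[cpu % NSHARDS][st], 1,
                              memory_order_relaxed);
}

/* Move a connection from one state to another on shard `cpu`. */
void conn_transition(unsigned cpu, enum conn_state from, enum conn_state to)
{
    conn_leave(cpu, from);
    conn_enter(cpu, to);
}

/* Reader side: sum every shard for each state. A single row can be
 * negative (entered on one shard, left on another); only the sum across
 * shards is meaningful. */
void conn_snapshot(int64_t out[NSTATES])
{
    for (int st = 0; st < NSTATES; st++) {
        int64_t total = 0;
        for (int cpu = 0; cpu < NSHARDS; cpu++)
            total += atomic_load_explicit(&state_count[cpu][st],
                                          memory_order_relaxed);
        out[st] = total;
    }
}
```

The snapshot is not an atomic view across all states (a transition can land between two reads), which is usually acceptable for metrics. The collector reads `NSTATES` numbers instead of walking every connection.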