
Some random selections from my notes on building Linode web servers:

- Set up reverse DNS in the Linode manager: select the Linode, click on "Remote Access", click on "Reverse DNS" (under "Public IPs")

- Linodes don't offer very much disk space; use localepurge to keep the filesystem on a diet: # apt-get install localepurge

- After installing and setting up MySQL, don't forget: # mysql_secure_installation

- After installing Apache, change the following in httpd.conf: ServerTokens -> Prod, ServerSignature -> Off, KeepAlive -> Off (see the sample config after this list)

- After installing PHP, edit php.ini to make it shut up: expose_php -> Off
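
For reference, here's roughly what the last two items look like in the config files. This is a sketch, not a drop-in config; the file paths are Debian/Ubuntu-style assumptions and vary by distro:

  # /etc/apache2/apache2.conf (or httpd.conf)
  ServerTokens Prod       # send just "Server: Apache", no version/OS details
  ServerSignature Off     # no version footer on server-generated error pages
  KeepAlive Off           # see the KeepAlive discussion further down before copying this

  # /etc/php5/apache2/php.ini
  expose_php = Off        ; drops the X-Powered-By: PHP header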

There's a bunch of other fiddly stuff to do, and a seemingly endless combination of packages and strategies depending on what you're trying to accomplish. For instance, I currently run a stack with Postfix for MTA, awstats+jawstats for beautiful server-side site statistics, mod_deflate, mem_cache, fcgid/suexec to make it harder to break the server if a site is compromised, PureFTPd for really easy managed-by-MySQL FTP access, and a pile of other little minor tweaks and knobs turned.

MOST IMPORTANTLY: Backups, backups, backups. If you don't already have your own in-house backup service for your server (and I bet you don't!), then please take advantage of Linode's backup services: http://www.linode.com/backups/



Blindly turning off KeepAlive isn't a recipe for awesomeness.

KeepAlive is a trade-off between memory and CPU time: on if you want speed in exchange for memory, off if you want to conserve memory and pay the cost of setting up new connections.

If you're doing anything like loading a web page with more than a few images, KeepAlive will likely improve load time: HTTP requests resources serially over a connection, so re-using the same connection gives you a speed-up.

Protip: run Apache's mpm-worker with KeepAlive on to save memory and speed up your site.
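
Something in this spirit, for example -- a sketch with made-up numbers, not a recommendation for any particular box:

  # worker MPM: an idle keep-alive connection ties up a thread, not a whole prefork process
  <IfModule mpm_worker_module>
      StartServers          2
      ServerLimit          16
      ThreadsPerChild      25
      MaxClients          400     # ServerLimit x ThreadsPerChild
  </IfModule>

  KeepAlive            On
  MaxKeepAliveRequests 100
  KeepAliveTimeout     3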


It just so happens that I have the very best test site possible for this argument: one of my alpha customers has a website built by Not-A-Professional with an incredible 156 requests and 2.39MB of data for the landing page. It is exactly the opposite of an optimized website.

So I tested the page load with KeepAlive on and KeepAlive off. I `/etc/init.d/apache2 graceful`'d before each test, and I cleared Chrome's cache before each test. I have nothing else running on my system right now and our pipe is nice and clear. I have a 92ms ping from the office to my web server.

KeepAlive On: 5.31s

KeepAlive Off: 6.87s

1.5 seconds is an appreciable difference. It works out to roughly 10ms per request, unless it's so late that the math part of my brain has quit for the night. If Yahoo.com were loaded from my web server, having KeepAlive off would delay total page load by about 750ms -- a noticeable difference. (However, given that it takes over 4 seconds to reach page load on yahoo.com for 70+ requests, their page might actually load faster if it were hosted on my server...)

Now here's the kicker: that same customer site has been featured on a popular radio show, twice, driving over 6 million hits over 24 hours, with peak traffic at 40Mb/s, thousands of simultaneous connections, and 100+ requests per second, sustained for several hours.

The site's loading time during that period was utterly unaffected. Seriously, never even a blip. (I stayed up during the whole period just to personally watch it.)

I wasn't about to experiment with turning on KeepAlive during all that fun, but I'm reasonably confident that having KeepAlive turned on would have crushed that server, given that my customer's site was receiving substantially more traffic than even Pinboard did immediately after the Delicious announcement (http://blog.pinboard.in/2011/03/anatomy_of_a_crushing/) -- traffic that Pinboard's founder notes will usually kill an unprepared site (or server).

And keep in mind, this was all on a memory-constrained tiny little vm.

So here's my final statement on KeepAlive, which I will link back to from now on: if, as a server admin, you're turning KeepAlive on to speed up your site, then you are optimizing your server and your site in exactly the wrong way.


You're not providing any data to show that having KeepAlive on would've 'crushed' the server; actually, quite the opposite. For example: with KeepAlive on, your site response time was 5.31s; with KeepAlive off, it was 6.87s. So by your own test, the site was slower to load with KeepAlive off.

With KeepAlive off, your server handles a new connection for every request rather than re-using one. With mpm-worker and threads, that's not so bad. However, I've seen a lot of people run prefork, which uses processes instead of threads, and that can be a Bad Time(tm) with heavy spikes in traffic.
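
To make the prefork failure mode concrete, the usual back-of-the-envelope sizing looks something like this (the 512MB VPS and ~40MB-per-process figures are assumptions for illustration only):

  # prefork + mod_php: every connection, including an idle keep-alive one, holds a full process
  # 512MB of RAM / ~40MB RSS per PHP-loaded process  =>  ~12 workers before you start swapping
  <IfModule mpm_prefork_module>
      StartServers          2
      MaxClients           12
      MaxRequestsPerChild 500
  </IfModule>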

Anyway - found this here (not my site) - explains it fairly well: http://abdussamad.com/archives/169-Apache-optimization:-Keep...


From one of my other replies to the KeepAlive discussion: http://news.ycombinator.com/item?id=2588783

I promise you my little server was getting more traffic than that, and since patio11 -- another vocal "turn off KeepAlive" advocate -- personally commented in that thread that turning off KeepAlive directly resolved their site outage, I think I'm going to stick with what I've got.

The server is already using mpm-worker, as well as fcgid carefully tuned to make PHP nice and snappy without eating all of the server's available memory.

Final point: site response times aren't the only metric a server admin should care about. A server admin should also -- and maybe even primarily -- care about concurrent connections. If I can serve thousands of simultaneous connections at 6.87s or hundreds of simultaneous connections at 5.31s, guess which one I choose?


1.5 seconds is a lot of time to shave off page load, and it shouldn't just be ignored. Other people do ridiculous things to their architecture to shave off just a dozen milliseconds that no one cares about.

I mean, KeepAlive shouldn't crash servers. Have you tried tuning other settings, like KeepAliveTimeout? It's either a config issue or a general Apache issue, but it should be solvable in either case. There is no unsolvable problem that would prevent a server from persisting connections under lighter load and dropping them when there is more traffic.
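
For instance, something along these lines (values made up for illustration) keeps the page-load benefit while dropping idle connections quickly:

  KeepAlive            On
  KeepAliveTimeout     2      # close idle connections after 2 seconds instead of the stock default
  MaxKeepAliveRequests 100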


I liked your other comment on this better, fwiw.

I'm trying pretty hard not to get defensive here, so before I respond, I need to point out that I originally posted a comment intended to help people get a better Apache config; I didn't claim anywhere that it was the perfect one, or that there weren't better ways to go about it. I posted the comment to a thread started by a post clearly aimed at total newbies (and SEO), which is why my comment stuck mostly to really, really simple stuff and didn't include other notes, like changing the server's file descriptor limit. Now I'm letting myself get dragged into a debate over, frankly, minutiae that don't belong under a post about "setting up your first web server". I work all the time on a mind-blowingly wide array of technologies, yet I never talk about it because

I

hate

this

shit.

So much.

So, that said: if my customer wants to shave 1.5 seconds off of their page load times, the very first thing they need to do is (drumroll...) _not_have_156_resources_on_their_landing_page_. We're not talking about 1.5 seconds for the average Wordpress site here (54 requests, based on a Wordpress blog picked at semi-random). So let's just be clear that we're talking about a 10ms delay per request over a connection with a 96ms ping.

If they did that, and still weren't happy with the results, then the next thing I would do is pay for a bigger VPS instance.

If I did that, and they still weren't happy with the results, then the next thing I would do is move their site to a server in a data center closer to their geographic location.

If I did that, and they still weren't happy, then the next thing I would do is charge them a brand new premium rate of about 10x what they're paying now, and I would re-configure my PowerDNS servers to do geoip-based results and I would set up nginx proxies (all of which I intend to do eventually anyway).

If I did that, and they still weren't happy, I still wouldn't bother farting around with KeepAlive, because at that point I've entirely removed Apache from the page load equation, and doing all of this is easier than screwing around with the Apache config any more than I already have.

Please don't get me wrong. I think performance is really really important. It's a frequent soapbox point for me. I have already sunk a ton of time into making my server setups secure and fast and reliable. I will continue to sink more completely irretrievable time into that. However, as geeks it's easy for us to lose sight of what's actually important: should I spend more time trying to remove a 10ms delay per request without sacrificing reliability, or should I answer some more support requests and get another job done on time?

I also have to confess that part of my problem is that I don't trust any of the web server stress testing tools that I've used so far. Like, at all. It's possible that I'm too dumb to use them, but I can't find one (even a paid service) that I can use and say to myself, "Yeah, so that's what would happen if the website got on the front page of Reddit." Without some kind of really solid tools to use for testing, I'm disinclined to try to squeeze 10ms of extra performance out of the server and take it right up to the hairy edge of what a VPS can do. If you have a recommendation for a stress tester that you feel does a good job of emulating front-page-on-Reddit syndrome, by all means, please share. I'd love to try it, turn some more knobs, and maybe even write a post titled "the best Apache-on-Linode config for surviving the front page of Reddit".
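
To illustrate the genre: a plain ApacheBench run against a single URL, like the hypothetical line below, hammers one page as fast as possible from one host over a fat pipe -- which is not much like thousands of distinct visitors on slow links pulling varied pages and assets.

  # hypothetical example: 5000 requests, 200 concurrent, with keep-alive
  ab -n 5000 -c 200 -k http://www.example.com/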


Interesting.

Do you mind pastebinning your Apache conf and your MySQL conf? It would be nice to see what parameters you are tweaking.

Which VPS host are you using?


I'm using Linode. I can't say enough good things about them, btw. Their support is mind-blowingly good, and their service isn't too far behind.

I have zero incentive to post my config, sorry. One, it would take me an hour or so to format it for general consumption; two, it would take me many many hours more to justify every single setting in it to anybody who read it.

We've got a website overhaul in the works, which will feature a blog that will actually get used and updated, and one of the first items on there will be our server configurations. Promise.


Two other suggestions would be:

1. Test how your VPS comes back after a reboot: both when you make big changes and at least every 6 months, since all the upgrading that Ubuntu does by default can break the boot process, and you won't know until that emergency unscheduled reboot at 3AM.

2. As long as you're customizing the firewall you should block pings entirely.

* Really, since the distributions are very compatible, I would urge you to consider a distro that has SELinux enabled by default. Fedora Core is a great place to start. It also has better tools to manage security and gives you good resume skills.


>> 2. As long as you're customizing the firewall you should block pings entirely.

Why?

I've never seen a threat model where filtering ICMP doesn't end up being more trouble than it's worth. Then there's the maintenance headache when basic but powerful tools like ping and traceroute are rendered useless.

It's the same BS as with fail2ban; thankfully the OP wasn't spreading the gonorrhea of port knocking/single packet auth. Lock down your sshd like everything else: disable root, disable tunneled cleartext passwords, enforce proper key usage, and use AllowGroups/AllowUsers.
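
In sshd_config terms, that lockdown is roughly the following (the group name is a placeholder):

  # /etc/ssh/sshd_config
  PermitRootLogin no
  PasswordAuthentication no            # keys only; no tunneled cleartext passwords
  ChallengeResponseAuthentication no
  PubkeyAuthentication yes
  AllowGroups ssh-users                # placeholder group; only its members may log in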

EDIT: The BS with fail2ban is moving the ssh port around. I understand this being a problem, but surely iptables has something similar to OpenBSD pf's:

  block in quick from <brutes>

  pass in log on $if_ext proto tcp from any to ($if_ext:0) port ssh keep state \
      (max-src-conn 3, max-src-conn-rate 4/32, overload <brutes> flush global)
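
Something like the "recent" match appears to be the closest analogue -- an untested sketch, assuming your SSH ACCEPT rule comes after these:

  # record new SSH connections, and drop any source that opens a 4th one within 60 seconds
  iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --name SSH --set
  iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --name SSH \
           --update --seconds 60 --hitcount 4 -j DROP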


Mind explaining your issue with single packet auth? Port knocking by definition is an extra layer of security through obscurity, in that gaining access requires hitting the sequence of ports to "knock". Depending on the implementation, this is either very vulnerable to MITM/sniffing attacks (static knock sequence) or quickly gets really complicated (fully dynamic knock sequence).

SPA, by contrast, uses actual crypto to securely authenticate the user with the server. I'm a fan of fwknop, which uses GPG to sign a request packet that is read and understood by the server. It protects against 0-day attacks on OpenSSH, lets me drop inbound port 22 to eliminate all those pesky attackers, and allows me to securely authenticate with fwknopd.


Sure.

> "Port knocking by definition is an extra layer of security through obscurity"

Security through obscurity is not security; it's theater at best.

> "It protects against 0 day attacks on OpenSSH"

You substitute one problem for another: 0-day attacks on the SPA. Your model isn't safer; it's just different.

And personally, I trust the OpenSSH guys way more than any SPA vendor, simply because they have a very good track record.


You're making no sense. Moving the SSH port is a trivial way to reduce your attack surface (undirected bulk scans go for 22).


If undirected bulk scans are a serious threat to your security, something is up.

Properly configured (AllowUsers, root disabled, no cleartext passwords, keys only, etc.), I'd say that undirected bulk scans pose no security risk at all; they are only a nuisance in terms of spamming your logs, which is easy enough to deal with.

What I'm really trying to say is that each "trivial way to reduce your attack surface" has both cost and benefits.

I'm contending that moving the ssh port around gives you the benefit of less log spam with no security gain, and costs in terms of documentation and maintenance.

When I do this cost/benefit analysis, I conclude that moving the port around has more costs than it does benefits, so I don't bother.


> undirected bulk scans pose no security risk at all

A future bulk-scan may leverage a new SSH-exploit before you know it exists.

To put it explicitly: you should disable passwords and change the SSH port. Those are the two measures that make sense, to reduce surface area and prevent password brute-forcing.

The rest of your recommendations are security theatre. An attacker dedicated enough to find your SSH port won't be set back by !AllowRoot; they'll just brute-force an allowed username - if that's even a prerequisite for the given SSH exploit.


> A future bulk-scan may leverage a new SSH-exploit before you know it exists.

Sure, this is true. I consider this a "minor" issue; truth be told (I didn't want to muddle up the conversation), I don't tend to run sshd facing the 'public' internet, and in the cases where I do, SSH access is restricted to certain hosts/networks and enforced by a firewall.

> The rest of your recommendations are security theatre

Can you state why? I think they all provide measurable/real benefit, if this isn't the case I'd welcome some education.

Hm. I will give you that AllowUsers/AllowGroups is not a very big benefit in this case; I mainly enforce the usage of those directives to protect against problems such as bogus user account creation (created by an exploit, or something as simple as an admin mistake).

>An attacker dedicated enough to find your SSH-port

And Now for Something Completely Different.

Protecting against a dedicated attacker is a totally different ball game than protecting against drive-bys.


Why would you block icmp ping?

I never understood that practice, especially not for standalone machines (in contrast to company networks, on the router).

So, why really?


I commonly see advice to block all ICMP traffic which is even crazier.

I think perhaps the general suspicion of ICMP might be related to things like the Ping of Death (http://en.wikipedia.org/wiki/Ping_of_death), which are now mostly irrelevant.

Some sort of naive attempt to stop attackers mapping your network too perhaps? It's not exactly a high tech (or effective) means of intrusion detection though.


Why would you turn off KeepAlive instead of setting it to a small value?


Honestly, I haven't done serious high-traffic testing to see if there's a magic number for KeepAlive. However, I've had enough traffic to the server on a couple of occasions that Apache would've fallen over if KeepAlive were on at all, since you can't specify KeepAliveTimeout in increments of less than 1 second.

There are also quite a few Apache-tuning articles on the web that feature "turn off KeepAlive!" in big bold letters, not to mention previous discussion on HN: http://news.ycombinator.com/item?id=2588783, http://news.ycombinator.com/item?id=1980278, http://news.ycombinator.com/item?id=1875848, http://www.kalzumeus.com/2010/06/19/running-apache-on-a-memo... ...


Regarding keep-alive: yes, it will lead to memory issues when it's on and you use mod_php and the prefork MPM (which you really have to use with mod_php). But this is just one configuration. You can easily fix it with any of:

- use php-fpm and switch to any other MPM in apache

- keep using mod_php but put nginx or another reverse proxy in front of Apache. It will do the keep-alive, and you can configure Apache to still close its connections.

- use nginx and php-fpm directly

All of these will mitigate the memory issue while still allowing you to offer keep-alive.
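
The reverse-proxy option looks roughly like this on the nginx side; the backend address/port is an assumption, and Apache would be reconfigured to listen there:

  # nginx terminates client connections (and keep-alive); Apache sits behind on 127.0.0.1:8080
  server {
      listen 80;
      server_name example.com;

      location / {
          proxy_pass http://127.0.0.1:8080;
          proxy_set_header Host $host;
          proxy_set_header X-Real-IP $remote_addr;
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      }
  }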


I think prefork/mod_php is still the default setup, and it's a really good one to get away from right away if you plan on handling any serious traffic.

Using nginx/php-fpm is great, so long as you don't need support for Apache-style .htaccess files. Since I'm adminning a shared hosting environment, I can't give up support for .htaccess files; if you intend to only host your own site, and your site doesn't need that, then by all means please use nginx & php-fpm, you'll save yourself a lot of headaches.
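
For anyone who can go that route, the nginx+php-fpm wiring is essentially one fastcgi block (the socket path is an assumption; Debian-era php5-fpm shown):

  # hand .php requests to php-fpm over a unix socket
  location ~ \.php$ {
      include fastcgi_params;
      fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
      fastcgi_pass unix:/var/run/php5-fpm.sock;
  }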

Setting up an nginx proxy seems like kind of a cheat in a discussion about tuning Apache -- "how to tune Apache: 1. don't tune Apache, set up nginx proxies instead..." -- but suffice it to say that setting up proxying nginx servers is on my to-do list.

I have not yet, though, had a traffic-related site outage (see also http://news.ycombinator.com/item?id=4619906), so it's not as high on my list as things like "recurring payments system". :-)


> - After installing Apache, change the following in httpd.conf: ServerTokens -> Prod, ServerSignature -> Off, KeepAlive -> Off
>
> - After installing PHP, edit php.ini to make it shut up: expose_php -> Off

What is the purpose of this? Security by obscurity? Or being really frugal about header length?


Yep, security by obscurity. AKA, the exact same reason that people move ssh to other ports.

If it turns out that there is some kind of exploit floating around for a specific version of Apache or PHP, and if jackasses are looking for vulnerable servers by first looking at the server headers instead of just randomly targeting servers, then I want mine to be totally useless to them so that they (hopefully) just move on to some other poor schmuck's server.
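
For the curious, the difference is easy to see with a plain header check (hostname and version strings below are just examples):

  $ curl -sI http://www.example.com/ | grep -iE 'server|x-powered-by'
  # before:  Server: Apache/2.2.22 (Ubuntu)   and   X-Powered-By: PHP/5.3.10
  # after:   Server: Apache                   and no X-Powered-By header at all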

Security-by-obscurity is only a problem when you depend on it, or when you are using it to cover up some kind of serious stupidity, like totally untested crypto. There's nothing wrong with using it to frustrate and annoy adversaries.



