Ask HN: How, bottom to top, does a modern web application work?

hoodoof · on July 15, 2015

Your path to enlightenment begins by learning at a detailed level what a web server does, and what a web browser says to that server.

Do these tutorials:

http://ruslanspivak.com/lsbaws-part1/

http://ruslanspivak.com/lsbaws-part2/

http://ruslanspivak.com/lsbaws-part3/

And when you are doing web development, use this command to understand exactly what is being passed between client and server (replace port number with the port number that your server is running on).

sudo ngrep -W byline -d lo port 8001

You have progressed to the next rank in your journey when you understand in detail what a request (typically from a browser) looks like and how it is structured, and what a response (from a web server) looks like and how it is structured.
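
As a rough illustration of that structure (my own sketch in Python, not taken from the tutorials above), here is one way to pull a raw request apart and put a response together. The function names parse_request and build_response, and the sample request, are made up for this example; a real server also has to handle many cases this ignores.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.x request into method, path, version, headers, body."""
    # The header block ends at the first blank line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # First line is the request line: METHOD SP PATH SP VERSION
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers, body


def build_response(status: str, body: bytes, content_type: str = "text/plain"):
    """Assemble a status line, a few headers, a blank line, then the body."""
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


# Hypothetical request, similar in shape to what ngrep would show on the wire.
sample = (
    b"GET /hello?name=world HTTP/1.1\r\n"
    b"Host: localhost:8001\r\n"
    b"Accept: */*\r\n"
    b"\r\n"
)

method, path, version, headers, body = parse_request(sample)
print(method, path, version, headers)
print(build_response("200 OK", b"hi"))
```

Comparing the printed pieces against the ngrep output for your own server is a decent way to check your understanding.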

dnotrael · on July 15, 2015

This is exactly the kind of resource I was looking for. Thank you. I hope he keeps making more tutorials because I am very interested in the other things on his About page. I'm still excited to see what other answers people have.

arh68 · on July 15, 2015

You can run 1 web server or 2. You need at least one listening on :80; it can either pass the request off to a module (like mod_php, mod_perl, CGI) or it can pass the request off to another web server. The 12factor site is espousing this second approach, separating the front-end server (perhaps nginx on :80) from the application server (perhaps apache on :8080/3000, running mod_whatever, or in their case, a standalone app server like Jetty/Tornado). Almost any web server can run as a reverse proxy, but mod_php/mod_perl usually require a specific server like Apache.

If you've got a dollar, and an hour, maybe run through setting it all up [1], then run lsof -i at the end to understand what's listening on what ports.

[1] https://www.digitalocean.com/community/tutorials/how-to-conf...

dnotrael · on July 15, 2015

Thanks for making the distinction between the 1 and 2 web server set-ups. I actually have a DO droplet and I was looking to move the application to Nginx + FPM just for learning purposes so that tutorial is helpful. I've also read some of the other tutorials on Nginx.

HatersGunnaHate · on July 15, 2015

The way I learned how to set up a server without using a control panel of some kind is by diving right in.

At first I messed around on a local virtual machine running the same distro I planned to use.

In a few hours I had NGINX, PHP-FPM and Rails apps up and running on a server.

javajosh · on July 15, 2015

Well it's good you're aware of processes and ports, because that's the key thing. Then I would focus more on how they relate to each other, and to the outside world.

Server processes bind to a port, and then run some code to handle inbound messages. For a web server, the inbound messages are string key/value pairs. By convention, there is special support for file access in HTTP, which is the path section of the URL. Webapps use the path just as an ordinary string.

There are really two parts to understanding the system: first, configuring all the processes and spinning them up in the right order. Second, once it's running, how messages flow through those processes. A reverse proxy is terrible jargon for a server process that listens on one port and forwards the traffic to another (usually local) port. (It's called a reverse proxy because forward proxies are used by clients connecting out to servers).

You might have 4 server processes. Nginx, Apache/PHP, Rails (RVM) and MySQL. The first three can all listen on port 80, but we pick Nginx to serve that role, and configure it to do so and forward to Apache and the RVM, which are configured for arbitrary higher ports. Meanwhile, you've probably configured Apache and RVM to be able to talk to MySQL (with 'drivers') and configured them to know about your running MySQL instance. All 4 of these processes are probably emitting logs to disk somewhere, too.
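
A toy sketch of the forwarding decision the front-end process makes, in Python. The path prefixes, ports, and the name pick_upstream are all made up for illustration; nginx's real matching rules are more elaborate than a first-prefix match.

```python
# Hypothetical routing table: prefixes and ports are assumptions for this sketch.
UPSTREAMS = [
    ("/blog/", ("127.0.0.1", 8080)),  # e.g. Apache/PHP on a higher port
    ("/app/", ("127.0.0.1", 3000)),   # e.g. the Rails app server
]


def pick_upstream(path, upstreams=UPSTREAMS):
    """Return the (host, port) of the first upstream whose prefix matches, else None."""
    for prefix, address in upstreams:
        if path.startswith(prefix):
            return address
    return None


print(pick_upstream("/blog/post/1"))
print(pick_upstream("/app/users"))
print(pick_upstream("/favicon.ico"))
```

Once an upstream is picked, the front end opens a connection to that address, relays the request, and relays the response back to the client.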

Interestingly, in the absence of a request, all of these processes just sit there. They do nothing, and use precisely 0 of your CPU. It is only when a client process presents Nginx on port 80 with a new string that they all come to life: nginx first, then PHP (say), then, if the stars align, MySQL. The confirmation bubbles back up a distributed call chain and out to the client.

Yay, CRUD.

bgurupra · on July 15, 2015

Here is a detailed crowdsourced answer to the interview question "What happens when you type google.com into your browser's address box and press enter?"

https://github.com/alex/what-happens-when

nostrademons · on July 15, 2015

This makes a great interview question. :-)

The short answer is that it depends on how you set up your webserver. Different languages have different defaults, and of course you can override the defaults and set them up entirely differently.

A typical PHP installation runs as an Apache module. In this setup, the Apache server listens for HTTP requests and looks at their request path and virtual host. When it finds one that matches a PHP rule (defined in your Apache configuration), it starts the PHP interpreter, setting variables like $_GET and $_POST from the request data. On very old PHP installations, it then uses the path to locate the PHP script, parses it, and executes it. In newer (post-~2005) installations, the server caches the compiled PHP script in memory and executes it again on subsequent requests, without having to hit disk to read the file contents.

A Rails or Django deployment circa 2007 would use Nginx or Lighttpd as the webserver, and then communicate with a separate application server over FastCGI or SCGI. The latter are simple binary protocols that are designed solely to communicate between webserver and appserver; they basically include the information in the HTTP request, but in a parsed, compact format that's fast to decode. The application server would then decode the request and pass it to a web framework to execute, returning an HTML response that is forwarded on by the webserver.

Why split the webserver from the appserver? Because running your application's code is typically slow and memory-intensive; it's usually CPU-constrained. Serving static files and parsing HTTP, meanwhile, is typically fast, cheap, and bandwidth-constrained. If you connect the app server directly to the Internet, your memory-hogging application will sit idle much of the time while pushing bytes out to a browser on dialup or a cell network. Splitting the servers lets you scale them independently; typically, you need just a few frontend load balancers to serve many app servers. It also gives you fault tolerance, since if an app server crashes, the load balancer can retry the request with a different one.

WSGI and Rack are HTTP interface specifications for Python and Ruby, respectively. Basically, all your Python/Ruby code needs to run inside a server somewhere, which talks some network protocol. A number of different web frameworks have cropped up to make programming webapps easier - things like Django, Pylons, and Flask for Python and Rails or Merb in Ruby - and these frameworks typically optimize for ease of programming. Similarly, a number of different appservers have cropped up - gunicorn and uwsgi for Python, unicorn and Mongrel and Thin for Ruby, Phusion Passenger for both - and these typically optimize for speed. A common gateway interface lets you mix and match between them. The reason why you need a different one for each language is that the app server and the webapp framework are typically hosted in the same process, communicating in-memory, and different languages have different memory representations. You don't always, however - uWSGI, for example, is written in C and can be used with Python, Perl, or Ruby.

Around 2010, people realized that HTTP was a perfectly valid transport protocol, and started to use it in place of FastCGI and SCGI. Now virtually all deployments use nginx talking HTTP to one or more appservers that run the actual webapp code.

So to a first approximation, when you send an HTTP request, your browser makes a TCP connection to port 80 on an nginx instance somewhere. nginx parses the request (which looks something like "GET /myapp/index.php HTTP/1.1\r\n...headers...\r\n\r\n" - HTTP is just a text protocol), looks in its configuration file, and matches /myapp/*.php against the rule for some app (pretend it's actually a Python/Django app running on the same physical server for illustration). It then makes an HTTP request to localhost:3000 on the server to talk to the app server. On port 3000, you have uWSGI running, which again parses the request, populates a Python dictionary, and invokes the callable given in the uwsgi config file. That callable will typically be Django's entry point, where Django consults its root urlconf and routes the request to your application code.
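
For a feel of what "populates a Python dictionary and invokes the callable" means under WSGI, here is a minimal sketch. The names app and call_app are made up for this example, and call_app only imitates what an app server does; it uses the standard library's wsgiref helper to fill in a plausible environ rather than parsing a real request.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A WSGI application: a callable taking the request dict and a callback."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # The return value is an iterable of byte strings making up the body.
    return [body]


def call_app(application, path):
    """Stand in for the app server: build environ, call the app, collect output."""
    environ = {}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, SERVER_NAME, etc.
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], dict(captured["headers"]), body


print(call_app(app, "/myapp/"))
```

A framework like Django exposes a callable with this same shape, which is why the app server and the framework can be swapped independently.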

dnotrael · on July 15, 2015

You are my hero. This is exactly the type of answer I was looking for.

miles932 · on July 15, 2015

Not any more! ;)

fizwhiz · on July 15, 2015

Care to elaborate?

miles932 · on July 15, 2015

Well, by typing out such a well thought out and complete answer, I'd say its value as an interview question has been somewhat diminished. Otherwise, a great writeup!

nostrademons · on July 15, 2015

You can still ask "What's going on at the byte level on the wire?" or "What's going on in the GPU when it renders the page?" or "What's going on inside Django when it computes what code to invoke?" or "What's going on inside the Python interpreter when it executes it?" The beauty of the question is that it's almost infinitely deep, and there's nowhere near enough time to answer it all in an interview, and the depth to which the candidate can go tells you a lot about where their specialty lies.

rockerBOO · on July 15, 2015

Apache is a web server that adds parsers as modules (e.g. PHP). NGINX is a reverse proxy/web server that can proxy to other processes listening on other ports (e.g. PHP-FastCGI running on port 3000).

BROWSER -> NGINX (80) -> PHP-FastCGI (3000) -> NGINX -> BROWSER

NGINX can also proxy over Unix sockets:

BROWSER -> NGINX (80) -> PHP-FastCGI (/usr/socks/php.socks) -> NGINX -> BROWSER

CGI just defines the interface, and the language would be implementing the interface.
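
Roughly, with plain CGI the web server puts the request into environment variables (REQUEST_METHOD, QUERY_STRING and so on), sends any request body on stdin, and reads the response from the program's stdout. A sketch in Python, where cgi_script is a made-up name and the environ/stdin/stdout arguments stand in for os.environ, sys.stdin and sys.stdout so it can be run without a web server:

```python
import io
from urllib.parse import parse_qs


def cgi_script(environ, stdin, stdout):
    """What a CGI program does: read env vars and stdin, write a response to stdout."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    # Response headers first, then a blank line, then the body.
    stdout.write("Content-Type: text/plain\r\n")
    stdout.write("\r\n")
    stdout.write(f"Hello, {name}! Method was {environ['REQUEST_METHOD']}.\n")


# Simulate the web server invoking the script for GET /?name=dnotrael
out = io.StringIO()
cgi_script(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "name=dnotrael"},
    io.StringIO(""),
    out,
)
print(out.getvalue())
```

FastCGI carries the same kind of key/value request data, but to a long-running process over a socket instead of starting a new process per request.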

dnotrael · on July 15, 2015

Very succinct answer! Thank you. I am still a bit confused as to what interface CGI exposes. Is it a way for a programming language to take an HTTP request? As in, PHP-FastCGI is an implementation in PHP of receiving the HTTP request as data, and then calling the appropriate handler in the application?