
Benchmarking and FastCGI

An interesting, but not especially useful, benchmark.

Quick Django Benching :: SuperJared.com

Today I did an unscientific benchmark to see which was quickest between Apache's mod_python, Lighttpd's FastCGI and Nginx's FastCGI. The Django application was a basic "Hello world" application that used one variable in one template.

All of the web server configurations were stripped to their minimums. I'm not going to pretend that this was a thorough benchmark, so these results might be a bit off, but they're useful nonetheless.

I am not sure this benchmark tells us anything interesting. To get a better notion of performance characteristics, there are a few dimensions we would want to explore: performance, ease of development, and ease of scaling. Note that what follows applies equally to FastCGI, SCGI, and reverse proxies.

(Apologies to the folk that have been down this path before. You will find nothing new in what follows.)

What do we care about?

First, performance has to be an issue. For small deployments where a single ordinary server is sufficient, other concerns (ease of deployment, portability, whatever) are going to dominate. As demand increases, you get interested in how much performance you can get out of a single machine. The above-referenced benchmark tells you something about the single machine scenario (not much - but something). As demand really increases you are interested in the performance and scaling across a growing group of machines.

Say you start with a single-server configuration. Let's talk about something like the classic web application scale-out.

As demand increases, you might want to think about partitioning your application. Static content (HTML, CSS, images) is better served by the front-end HTTP service. The single-threaded, async-I/O HTTP servers (lighttpd, thttpd) tend to beat out Apache and other general-purpose HTTP servers, while putting less load on the server(s). If your application is heavy on static content (say, a photo-sharing site), then Apache is at a heavy disadvantage.

So you partition your application into static and dynamic content. Static content is optimally served by lighttpd (or the like). Dynamic content is served over FastCGI by your application code (Ruby, Python, PHP, Java, or whatever). Performance serving static content is now optimal.

As demand increases further, you find a single server box is no longer adequate. At this point you buy another box, and move your application code to the new box. Users do not see any change - they are still hitting the same front-end HTTP server. FastCGI connects the old front-end HTTP server to the new back-end application server. Your front-end and back-end boxes can be configured differently, as needed by the different sorts of load (static file serving versus dynamic content generation). Your front-end box should also be able to easily saturate a 100 Mbit/s network link on static content (so you have a lot of headroom).

Let's say your application is heavier on serving static files than on dynamic content. As load increases you need to buy one (or more) additional front-end servers. No problem - the second server can connect over FastCGI to the existing back-end server. Your front-end servers will be configured specifically for your static content - light on CPU, fast network throughput, fast access to your static content (disk and/or cache). You are also now safe against the failure of any one front-end server.

As load increases you need to add more back-end servers. No problem - the ratio of front-end to back-end servers can be adjusted to match the needs of your application. Note that every front-end server can connect to every back-end server, so adjusting for load and handling fail-over is straightforward.
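To make the fail-over idea concrete, here is a minimal sketch of how a front-end might choose among back-ends: plain round-robin, skipping any back-end currently marked as down. The class name `BackendPool` and the `host:port` strings are hypothetical, and real front-end servers implement their own balancing; this only illustrates the logic.

```python
class BackendPool:
    """Round-robin over back-ends, skipping any that are marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._next = 0  # index of the next back-end to try

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def pick(self):
        # Try each back-end at most once, starting where we left off.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend not in self.down:
                return backend
        raise RuntimeError("no back-end available")


if __name__ == "__main__":
    # Hypothetical back-end addresses, for illustration only.
    pool = BackendPool(["app1:9000", "app2:9000", "app3:9000"])
    pool.mark_down("app2:9000")
    print([pool.pick() for _ in range(4)])
```

Since each front-end can hold its own copy of such a pool, adding or removing a back-end amounts to changing the list on each front-end.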

Note also that the stripped-down/locked-up configuration of front-end boxes suitable for exposure to the Internet is pretty straightforward. The back-end boxes (which may need a more permissive and complex configuration) do not need to be addressable from the Internet. Security is somewhat simpler.

There is another scenario where FastCGI (or similar) is interesting. Let's say your company has a single common internal-use web server. Odds are, it is a Windows/IIS box (company policy, legacy, clueless network administrators, or whatever reason). By hooking up via FastCGI you can host your web applications on a box with whatever configuration is suitable (Linux perhaps, rather than Windows). Odds are the old/common IIS box is going to go a long way just serving static content and forwarding FastCGI requests, before running out of gas. The folks putting up the web application could be in an entirely different compartment within the company, with domain-specific knowledge that the folks managing the IIS server lack. Once the FastCGI link is set up from the IIS server, no further demands need be placed on the IIS folks.

What should we measure?

First, we need to measure the relative performance of just serving static files using lighttpd (or the like) versus Apache. Nothing in this part of the benchmark is specific to FastCGI. There is a range to this benchmark - small, mid-size, and large files - all served from cache (benchmarking disk throughput is a different item).
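A rough sketch of the kind of load generator this calls for, using only the Python standard library. The function names (`timed_fetch`, `run_load`) and the choice of metrics are my own assumptions, not an established tool; a dedicated benchmarking tool would give more trustworthy numbers, and the client should run on a different machine than the server under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def timed_fetch(url):
    """Fetch one URL; return (elapsed seconds, bytes received)."""
    start = time.perf_counter()
    with urlopen(url) as resp:
        size = len(resp.read())
    return time.perf_counter() - start, size


def run_load(url, total_requests, concurrency, fetch=timed_fetch):
    """Issue total_requests fetches, at most `concurrency` at a time."""
    if total_requests < 1:
        raise ValueError("total_requests must be at least 1")
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch, [url] * total_requests))
    elapsed = time.perf_counter() - started
    latencies = sorted(t for t, _ in results)
    total_bytes = sum(n for _, n in results)
    return {
        "requests_per_sec": total_requests / elapsed,
        "bytes_per_sec": total_bytes / elapsed,
        "median_latency": latencies[len(latencies) // 2],
        "max_latency": latencies[-1],
    }
```

Running `run_load` once per file size (small, mid-size, large) against each server, with the same request count and concurrency, gives comparable numbers for the static-file part of the benchmark.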

Next, we need to measure the throughput and latency of servicing FastCGI requests. The back-end process needs to be on a different machine, and should do no processing (so we are not measuring back-end performance). Vary the number of concurrent FastCGI connections (and the matching number of back-end threads/processes). Vary the back-end delay before responding (to simulate light to heavy processing). Vary the amount of data in the response from the back-end. We could vary the size of requests - but these are usually small enough that we can ignore size. This set of numbers is relatively reliable, as the processing performed by the HTTP server is fixed (no variable application-specific code to complicate the measure).
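A do-nothing back-end of this sort could look something like the following WSGI application. This is a sketch under assumptions: the query parameters `delay_ms` and `size` are names I made up, and the app would still need to be exposed through a FastCGI-to-WSGI adapter of your choice (not shown here).

```python
import time
from urllib.parse import parse_qs


def stub_app(environ, start_response):
    """Do no real work: optionally sleep, then return `size` filler bytes."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    delay_ms = int(params.get("delay_ms", ["0"])[0])  # simulated processing time
    size = int(params.get("size", ["0"])[0])          # response body size in bytes
    if delay_ms > 0:
        time.sleep(delay_ms / 1000.0)
    body = b"x" * size
    start_response("200 OK", [
        ("Content-Type", "application/octet-stream"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # For a quick local check only; this serves plain HTTP, not FastCGI.
    from wsgiref.simple_server import make_server
    make_server("127.0.0.1", 8000, stub_app).serve_forever()
```

Because the delay and response size come from the request, the load generator can sweep both without restarting the back-end.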

Measuring back-end performance is a different issue, and is only going to be relevant for your specific application.

My guess is that we find:

Separation of concerns is almost always a good idea in programming. There is every chance the same notion applied to web application architecture works out as well. Performance is more predictable when the workload is uniform. Uniform workloads make choosing an optimal hardware/software configuration easier. In at least some scenarios, administration is easier.