Reverse Proxies and the "Location" or "Content-Location" headers.

2004-12-30

Turns out the proper name for what I am attempting to do with isapi_reward is a "reverse proxy". I want users to be able to access a number of web applications through a single machine, with the web applications possibly deployed on different machines.

Specifically, I am interested in one particular use case. I want to write web applications in Java and be able to deploy those applications on a company intranet. I want the web applications to be able to run stand-alone (as sometimes that is all you need). I want those web applications to work well on a company intranet, and most of the time that means support for NTLM authentication. I would also like to allow each web application to reside on its own machine, and be accessible to users through a common website with a stable (permanently bookmarkable) address.

By using IIS as a front-end we get first-class support for NTLM authentication. We also get efficient native-code SSL (https) support -- Java code doing encryption is just not a good idea if you want to scale.

Note this spins the design considerations in a number of ways. I am not interested in caching by the proxy. I am interested in only the highest possible server throughput. I am interested in only the highest possible stability under load and over time. I am not interested in extraneous features that compromise throughput or stability.

This means supporting only HTTP/1.1 between the proxy and backend servers. This means keeping the connection(s) between the proxy and backend servers open for long periods of time, and multiplexing requests from different users over the same common backend connections. (It looks like this makes HTTP/1.1 pipelining dubious: pipelined responses have to come back in request order, so one user's slow response would hold up everyone queued behind it on the same connection.)
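To make the shared-connection idea concrete, here is a minimal sketch in Python (chosen for brevity, not the language of the proxy itself). The `BackendPool` class and its `borrow` method are hypothetical names invented for illustration. The assumption is that each backend connection carries one complete request/response exchange at a time and is then handed back, still open, for the next user.

```python
import queue
from contextlib import contextmanager


class BackendPool:
    """Fixed-size pool of long-lived backend connections (hypothetical helper)."""

    def __init__(self, connect, size):
        # 'connect' is any zero-argument callable that returns a connection.
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def borrow(self):
        """Lend out one idle connection for a single request/response."""
        conn = self._idle.get()  # blocks until some connection is free
        try:
            yield conn
        finally:
            # Hand the connection back without closing it. A real proxy
            # would first check that it is still usable and replace it
            # if the exchange failed part-way through.
            self._idle.put(conn)


# Possible use with the standard library client (host and port are made up):
#
#   import http.client
#   pool = BackendPool(lambda: http.client.HTTPConnection("backend1", 8080), 2)
#   with pool.borrow() as conn:
#       conn.request("GET", "/")
#       body = conn.getresponse().read()
```

Because a connection is only returned after its response has been read in full, requests from different users never overlap on one connection, which sidesteps the pipelining question at the cost of needing more than one backend connection under load.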

Got the re-write to use I/O Completion Ports working. I still find it spooky when large chunks of newly-written code work the first time :). The connection-maker thread makes the connection, and passes the connection to the background I/O thread. There can be any number of connection-maker or I/O threads, but I suspect one of each is optimal or near-optimal for nearly all websites. Stepped through most of the code with the debugger (one of those "best practices" I find worthwhile). The request/response cycle between the client and the backend server through the proxy works, and headers are rewritten as I understand they should be.
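The hand-off between the two kinds of thread can be sketched roughly as follows. This is not I/O Completion Port code: it swaps in a plain thread-safe queue and blocking reads, and uses `socket.socketpair()` as a stand-in for a real backend connection, purely to show the shape of "one thread makes connections, another services them". All function names here are hypothetical.

```python
import queue
import socket
import threading

STOP = object()  # sentinel telling the I/O thread to exit


def connection_maker(handoff, count):
    """Create 'count' connections and pass each one to the I/O thread."""
    for _ in range(count):
        ours, theirs = socket.socketpair()  # stand-in for connecting to a backend
        theirs.sendall(b"hello")            # pretend the backend sent a response
        theirs.close()
        handoff.put(ours)
    handoff.put(STOP)


def io_worker(handoff, results):
    """Service every connection handed over by the connection-maker."""
    while True:
        conn = handoff.get()
        if conn is STOP:
            break
        with conn:
            data = b""
            while True:
                chunk = conn.recv(1024)
                if not chunk:  # peer closed: the response is complete
                    break
                data += chunk
            results.append(data)


def run(count):
    """Start one thread of each kind and collect what the I/O thread read."""
    handoff = queue.Queue()
    results = []
    maker = threading.Thread(target=connection_maker, args=(handoff, count))
    worker = threading.Thread(target=io_worker, args=(handoff, results))
    worker.start()
    maker.start()
    maker.join()
    worker.join()
    return results
```

With a completion port the I/O thread would wait on completed operations instead of blocking in `recv`, but the division of labour between the threads is the same.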

That last part is the current problem. Clearly I need to do something so the (relative) links in the response are interpreted correctly by the web browser. I think this means using the Content-Location and Location headers, but I am still unclear as to the exact semantics.
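One common approach, similar in spirit to what Apache's ProxyPassReverse directive does, is to map any backend URL found in these two headers back to the public address. The sketch below is in Python for brevity and rests on assumptions: the host names are made up, the function names are hypothetical, and it treats Location and Content-Location identically, which may not be the right semantics for every case.

```python
def rewrite_location(value, backend_base, public_base):
    """Map a backend URL in a header value to its public equivalent."""
    # Normalise both bases so they end with exactly one slash.
    backend_base = backend_base.rstrip("/") + "/"
    public_base = public_base.rstrip("/") + "/"
    if value.startswith(backend_base):
        return public_base + value[len(backend_base):]
    if value + "/" == backend_base:  # backend root given without a slash
        return public_base
    # Anything else (other hosts, relative values such as "/login") is
    # left untouched; this simple prefix match is also case-sensitive.
    return value


def rewrite_headers(headers, backend_base, public_base):
    """Rewrite Location and Content-Location in a list of (name, value) pairs."""
    rewritten = []
    for name, value in headers:
        if name.lower() in ("location", "content-location"):
            value = rewrite_location(value, backend_base, public_base)
        rewritten.append((name, value))
    return rewritten
```

For example, with a backend at `http://backend1:8080/` published as `http://intranet.example.com/app1/`, a redirect to `http://backend1:8080/login` would be sent to the browser as `http://intranet.example.com/app1/login`.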

Also, I am not interested in attempting to re-write HTML links inside the response (this just seems like a generally Bad Idea).

Reading the RFC is not enough to make this bit clear. Guess it is back to Google and a bit of experimenting.