Configuring the Proxy

As with any module, the first step is to load the ones we need in httpd.conf (this is not necessary if they are built statically into Apache).


LoadModule  proxy_module         modules/mod_proxy.so
LoadModule  proxy_http_module    modules/mod_proxy_http.so
#LoadModule proxy_ftp_module     modules/mod_proxy_ftp.so
#LoadModule proxy_connect_module modules/mod_proxy_connect.so
LoadModule  headers_module       modules/mod_headers.so
LoadModule  deflate_module       modules/mod_deflate.so
LoadFile    /usr/lib/libxml2.so
LoadModule  proxy_html_module    modules/mod_proxy_html.so

For Windows users this is slightly different: you'll need to load libxml2.dll rather than libxml2.so, and you'll probably need to load iconv.dll and zlib.dll first as prerequisites to libxml2 (you can download them from zlatkovic.com, the same site that maintains Windows binaries of libxml2). The LoadFile directive is the same.
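On Windows the load order might look like the following sketch. The directory C:/libxml2/bin and the exact DLL file names are assumptions (the zlib DLL, for instance, is named zlib1.dll in some packages), so adjust them to match wherever the downloaded binaries were unpacked. The prerequisites are loaded before libxml2 itself.

```apache
# Hypothetical paths: point these at the directory holding the downloaded DLLs.
# Prerequisites first, then libxml2, then the module that depends on it.
LoadFile    C:/libxml2/bin/iconv.dll
LoadFile    C:/libxml2/bin/zlib.dll
LoadFile    C:/libxml2/bin/libxml2.dll
LoadModule  proxy_html_module    modules/mod_proxy_html.so
```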

Of course, you may not need all the modules. Two that are not required in our typical scenario are shown commented out above.

Having loaded the modules, we can now configure the Proxy. But before doing so, we have an important security warning:

Do not set "ProxyRequests On". Setting ProxyRequests On turns your server into an open proxy. There are bots scanning the Web for open proxies. When they find you, they'll start using you to route around blocks and filters to access questionable or illegal material. At worst, they might be able to route email spam through your proxy. Your legitimate traffic will be swamped, and you'll find your server getting blocked by things like family filters.
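For a reverse proxy, the safe setting can be stated explicitly. Off is already the default in Apache 2.x, so this line is purely defensive: it documents the intent for whoever maintains the configuration next. ProxyPass rules do not need forward proxying to be enabled.

```apache
# Keep forward proxying disabled; a reverse proxy only needs ProxyPass rules.
ProxyRequests Off
```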

Of course, you may also want to run a forward proxy with appropriate security measures, but that lies outside the scope of this article. The author runs both forward and reverse proxies on the same server (but under different Virtual Hosts).

The fundamental configuration directive to set up a reverse proxy is ProxyPass. We use it to set up proxy rules for each of the application servers:


ProxyPass       /app1/  http://internal1.example.com/
ProxyPass       /app2/  http://internal2.example.com/

Now as soon as Apache re-reads the configuration (the recommended way to do this is with "apachectl graceful"), proxy requests will work, so http://www.example.com/app1/some-path maps to http://internal1.example.com/some-path as required.

However, this is not the whole story. ProxyPass just sends traffic straight through. So when the application servers generate references to themselves (or to other internal addresses), they will be passed straight through to the outside world, where they won't work.

For example, an HTTP redirection often takes place when a user (or author) forgets a trailing slash in a URL. So a request for http://www.example.com/app1/foo is proxied to http://internal1.example.com/foo, which generates the response:


        HTTP/1.1 302 Found
        Location: http://internal1.example.com/foo/
        (etc)

But from the outside world, the net effect of this is a "No such host" error. The proxy needs to re-map the Location header to its own address space and return a valid URL:


        HTTP/1.1 302 Found
        Location: http://www.example.com/app1/foo/

The directive that enables such rewrites in the HTTP headers is ProxyPassReverse. The Apache documentation suggests the form:


ProxyPassReverse /app1/ http://internal1.example.com/
ProxyPassReverse /app2/ http://internal2.example.com/

However, there is a slightly more complex alternative form that I recommend as more robust:


<Location /app1/>
        ProxyPassReverse /
</Location>
<Location /app2/>
        ProxyPassReverse /
</Location>

The reason for recommending this is that a problem arises with some application servers. Suppose for example we have a redirect:


        HTTP/1.1 302 Found
        Location: /some/path/to/file.html

This violates HTTP/1.1 as specified in RFC 2616, which permits only absolute URLs in Location headers, and so should never happen (the later RFC 7231 relaxes this to allow relative references). However, it is also a source of much confusion, not least because the CGI spec has a similar Location header with different semantics where relative paths are allowed. There are a lot of broken servers out there! In this instance, the first form of ProxyPassReverse will return the incorrect response:


        HTTP/1.1 302 Found
        Location: /some/path/to/file.html

which, even allowing for error-correcting browsers, is outside the proxy's address space and won't work. The second form fixes this to:


        HTTP/1.1 302 Found
        Location: /app2/some/path/to/file.html

which is still broken, but will at least work in error-correcting browsers, and most browsers do correct it.

If your backend server uses cookies, you may also need the ProxyPassReverseCookiePath and ProxyPassReverseCookieDomain directives. These are similar to ProxyPassReverse, but rewrite the path and domain attributes of Set-Cookie headers rather than URLs in Location headers. They require mod_proxy from Apache 2.2 (recommended), or a patched version of 2.0.
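As a minimal sketch for the first application, the following assumes the backend issues cookies with Path=/ and Domain=internal1.example.com, and that the public hostname is www.example.com. Those cookie attributes are assumptions for illustration, so check what your application actually sends before copying this. Each directive takes the internal value first and the public value second.

```apache
<Location /app1/>
        # Assumed backend cookie attributes: Path=/ and Domain=internal1.example.com.
        # Rewrite them so the browser returns the cookie to the proxy's address space.
        ProxyPassReverseCookiePath    /  /app1/
        ProxyPassReverseCookieDomain  internal1.example.com  www.example.com
</Location>
```

Without such rules, a browser would typically not send the cookie back, because the backend's domain does not match the public hostname it is talking to.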