Configuring the Proxy
As with any Apache modules, the first step is to load them in
httpd.conf (this is unnecessary if they are built statically into
Apache):
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
#LoadModule proxy_ftp_module modules/mod_proxy_ftp.so
#LoadModule proxy_connect_module modules/mod_proxy_connect.so
LoadModule headers_module modules/mod_headers.so
LoadModule deflate_module modules/mod_deflate.so
LoadFile /usr/lib/libxml2.so
LoadModule proxy_html_module modules/mod_proxy_html.so
For Windows users this is slightly different: you'll need to load
libxml2.dll rather than libxml2.so, and you'll probably need to
load iconv.dll and zlib.dll as prerequisites to libxml2 (you
can download them from zlatkovic.com, the same site that
maintains Windows binaries of libxml2). The LoadFile directive is the same.
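As a rough sketch of the Windows variant, the prerequisites can be loaded ahead of libxml2 itself. The C:/libxml2/bin directory is a hypothetical install location, and the DLL file names may differ between releases of the binaries, so check the names in your own download.

```apache
# Hypothetical paths: point these at wherever the DLLs were unpacked.
# Prerequisites come first so libxml2.dll can resolve them when it loads.
LoadFile C:/libxml2/bin/iconv.dll
LoadFile C:/libxml2/bin/zlib.dll
LoadFile C:/libxml2/bin/libxml2.dll
LoadModule proxy_html_module modules/mod_proxy_html.so
```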
Of course, you may not need all the modules. Two that are not required
in our typical scenario are shown commented out above.
Having loaded the modules, we can now configure the Proxy. But before
doing so, we have an important security warning:
Do not set "ProxyRequests On". Setting ProxyRequests On turns your
server into an open proxy. Bots scan the Web for open
proxies. When they find you, they'll start using you to route around
blocks and filters to access questionable or illegal material. At
worst, they might be able to route email spam through your proxy. Your
legitimate traffic will be swamped, and you'll find your server
getting blocked by things like family filters.
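A reverse proxy does not need ProxyRequests at all, and the directive defaults to Off. Stating it explicitly, as in this minimal fragment, documents the intent and guards against an "On" copied in from another configuration:

```apache
# Forward proxying stays disabled; ProxyPass rules still work with this setting.
ProxyRequests Off
```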
Of course, you may also want to run a forward proxy with
appropriate security measures, but that lies outside the scope of this
article. The author runs both forward and reverse proxies on the same
server (but under different Virtual Hosts).
The fundamental configuration directive to set up a reverse proxy is
ProxyPass. We use it to set up proxy rules for each of the application
servers:
ProxyPass /app1/ http://internal1.example.com/
ProxyPass /app2/ http://internal2.example.com/
Now as soon as Apache re-reads the configuration (the recommended way
to do this is with "apachectl graceful"), proxy requests will work, so
http://www.example.com/app1/some-path maps to
http://internal1.example.com/some-path as required.
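Put together, the rules so far might sit in a virtual host like the sketch below. The *:80 listener and the ServerName are assumptions for illustration; use whatever your public site already has.

```apache
<VirtualHost *:80>
    # Public name that clients use (assumed for this example).
    ServerName www.example.com

    # Reverse proxy only: never act as a forward proxy.
    ProxyRequests Off

    # Map each public path prefix onto its internal application server.
    ProxyPass /app1/ http://internal1.example.com/
    ProxyPass /app2/ http://internal2.example.com/
</VirtualHost>
```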
However, this is not the whole story. ProxyPass just sends traffic
straight through. So when the application servers generate references
to themselves (or to other internal addresses), they will be passed
straight through to the outside world, where they won't work.
For example, an HTTP redirection often takes place when a user (or
author) forgets a trailing slash in a URL. So a request for
http://www.example.com/app1/foo is proxied to
http://internal1.example.com/foo, which generates a response:
HTTP/1.1 302 Found
Location: http://internal1.example.com/foo/
(etc)
But from the outside world, the net effect of this is a "No such host"
error. The proxy needs to re-map the Location header to its own
address space and return a valid URL:
HTTP/1.1 302 Found
Location: http://www.example.com/app1/foo/
The directive that enables such rewrites in the HTTP headers is
ProxyPassReverse. The Apache documentation suggests the form:
ProxyPassReverse /app1/ http://internal1.example.com/
ProxyPassReverse /app2/ http://internal2.example.com/
However, there is a slightly more complex alternative form that I
recommend as more robust:
<Location /app1/>
ProxyPassReverse /
</Location>
<Location /app2/>
ProxyPassReverse /
</Location>
The reason for recommending this is that a problem arises with some
application servers. Suppose for example we have a redirect:
HTTP/1.1 302 Found
Location: /some/path/to/file.html
Strictly, this violates HTTP/1.1 as originally specified (RFC 2616),
which permits only full URLs in Location headers, so it should never
happen (the later RFC 7231 does allow relative references). It is also a
source of much confusion, not least because the CGI spec has a similar
Location header with different semantics where relative paths are
allowed. There are a lot of servers out there that do this! In this
instance, the first form of ProxyPassReverse will return the incorrect
response
HTTP/1.1 302 Found
Location: /some/path/to/file.html
which, even allowing for error-correcting browsers, is outside the
Proxy's address space and won't work. The second form fixes this to
HTTP/1.1 302 Found
Location: /app2/some/path/to/file.html
which is still not a full URL, but will at least work: most browsers
resolve it against the URL they requested.
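The two forms are not mutually exclusive. One possible arrangement, sketched here for /app1/ only, keeps the documented form to remap redirects that name the backend host and adds the Location form to catch path-only redirects:

```apache
# Remaps "Location: http://internal1.example.com/..." to /app1/...
ProxyPassReverse /app1/ http://internal1.example.com/

<Location /app1/>
    # Remaps path-only redirects such as "Location: /some/path" to /app1/some/path
    ProxyPassReverse /
</Location>
```

The same pair would be repeated for /app2/ with internal2.example.com.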
If your backend server uses cookies, you may also need the
ProxyPassReverseCookiePath and ProxyPassReverseCookieDomain
directives. These are similar to ProxyPassReverse, but deal with the
different form of cookie headers. These require mod_proxy from
Apache 2.2 (recommended), or a patched version of 2.0.
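As an illustration, assuming the backend for /app1/ sets cookies scoped to its own hostname and to the root path, the cookie directives might look like the sketch below. Each takes the internal value first and the public value second.

```apache
<Location /app1/>
    # Rewrite "domain=internal1.example.com" in Set-Cookie headers to the public host.
    ProxyPassReverseCookieDomain internal1.example.com www.example.com
    # Rewrite "path=/" in Set-Cookie headers to the public prefix.
    ProxyPassReverseCookiePath / /app1/
</Location>
```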