How it works
mod_proxy_html is based on a SAX parser: specifically the HTMLparser
module from libxml2 running in SAX mode (any other parse mode would be
considerably slower, especially for larger documents). It knows every
attribute that can hold a URI in HTML 4 and XHTML 1. Whenever a URL is
encountered, it is matched against the applicable ProxyHTMLURLMap
directives. If the URL starts with a from-pattern, that prefix is
rewritten to the corresponding to-pattern. Rules are applied in the
reverse order of their appearance in httpd.conf, and matching stops as
soon as a match is found.
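
The matching behaviour described above can be illustrated with a short Python sketch. This is not mod_proxy_html's actual code (the module is written in C); it only models the starts-with, reverse-order, first-match-wins logic as stated in the text. The function name rewrite_url and the rules list are hypothetical names chosen for illustration.

```python
def rewrite_url(url, rules):
    """Rewrite url using the first from-pattern it starts with.

    rules is a list of (from_pattern, to_pattern) pairs in the order
    they would appear in the configuration file (hypothetical model).
    """
    # Per the text, rules are tried in reverse order of appearance.
    for from_pattern, to_pattern in reversed(rules):
        if url.startswith(from_pattern):
            # Replace only the matched prefix, then stop: first match wins.
            return to_pattern + url[len(from_pattern):]
    # No rule matched, so the URL is left untouched.
    return url


# Illustrative rule set: absolute links to two internal servers.
rules = [
    ("http://internal1.example.com", "/app1"),
    ("http://internal2.example.com", "/app2"),
]

print(rewrite_url("http://internal1.example.com/index.html", rules))
```

Because only the matched prefix is replaced, the rest of the URL (path, query string) carries over unchanged, and a URL that matches no from-pattern passes through as-is.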
Here's how we set up a reverse proxy for HTML. Firstly, full links to
the internal servers should be rewritten regardless of where they
arise, so we have:
ProxyHTMLURLMap http://internal1.example.com /app1
ProxyHTMLURLMap http://internal2.example.com /app2
Note that in this instance we omitted the "trailing" slash. Since the
matching logic is starts-with, we use the minimal matching pattern. We
have now globally fixed case 3 above.
Case 2 above requires a little more care. Because the link doesn't
include the hostname, the rewrite rule must be context-sensitive. As
with ProxyPassReverse above, we deal with that using <Location>
sections:
<Location /app1/>
ProxyHTMLURLMap / /app1/
</Location>
<Location /app2/>
ProxyHTMLURLMap / /app2/
</Location>