Filtering against information disclosure

Sometimes a poorly-designed system may expose information of value to an attacker, in pages served in the course of normal operation. When such a leak cannot be properly fixed at its source, we can instead cut out the sensitive information in a filter or proxy. This article discusses efficient information filtering using high-performance general-purpose filters.

Security by Obscurity

Hiding information about your system doesn't make it secure. A frequently asked Apache question is how to turn ServerTokens off, commonly in the name of some illusion of increased security. Though advocated by some, this is widely held to be futile:

Generic bot-driven attacks ("script kiddies") will just try every server they can find. If you look through your Apache logs, you'll be sure to find a bunch of IIS classics like Nimda and Code Red. A practical experiment is to run two servers, one announcing itself as Apache and the other as IIS, and see if there's any difference!
Expert attacks will easily identify your server as Apache, regardless of your ServerTokens setting.

Neither type of attack is affected very much by your server identifying itself as Apache or otherwise.

Insecurity by Leak

But whilst security by obscurity is an exercise in self-delusion, the reverse is not the case. Unintended information disclosure can be of real value to an attacker, and can make your system unnecessarily vulnerable. Ryan Barnett in his book Preventing Web Attacks with Apache describes an exercise in gaining unauthorised access to a "buggy bank" system. The very first step in his attack is to find information about the system he is attacking, using a comment in the "HTML" generated:

</BODY><img

(ouch! That line alone should tell the knowledgeable user that buggybank is utterly incompetent, never mind the comment!).

Having seen that line, he downloads the script and finds exploitable bugs in it. When he tries the exploits, some of them are still there.

A Proposed Solution

When I read the book, I found this story an interesting insight. However, the flip side is that Barnett's remedies are almost as scary as the original problem, albeit for rather different reasons. He advocates running 'sed' with mod_ext_filter to strip information that shouldn't be revealed:


ExtFilterDefine strip_comments remove_comments mode=output \
	intype=text/html cmd="/bin/sed s/\<\!--.*--\>//g"
<LocationMatch /cgi-bin/wm*>
	SetOutputFilter remove_comments
</LocationMatch>

That works as an immediate firefighting exercise, but beyond that it's a terrible solution to the problem:

It's a huge performance hit. By the end of the chapter, he's applied three such filters. If you apply that to static documents, it could eat up more than 99% of the total memory and CPU used by Apache per request. That'll be further amplified by its adverse effect on caching.
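He's overlooked regexp greediness. Because .* matches as much as it can, a line containing two comments loses everything from the first opening delimiter to the last closing one, including any legitimate content in between.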
Using line-oriented patterns means that information split over multiple lines will be missed, so the solution doesn't usefully generalise.
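
The last two points can be reproduced outside Apache. The following is a rough sketch only: Python's re module stands in for sed, the substitution is applied one line at a time, and the sample page (including the text of its comments) is made up for illustration.

```python
import re

# Hypothetical sample page; the comment contents are invented for this demo.
PAGE = (
    '<p><!-- dev note -->Balance: 42<!-- end --></p>\n'
    '<!-- script:\n'
    '     wm.cgi -->\n'
    '<p>Done</p>\n'
)

# Same shape as the pattern in the sed command: greedy, and '.' stops at newline.
GREEDY = re.compile(r'<!--.*-->')


def sed_like(text, pattern):
    """Apply the substitution to each line separately, as sed does by default."""
    return ''.join(pattern.sub('', line) for line in text.splitlines(keepends=True))


result = sed_like(PAGE, GREEDY)
print(result)
```

In the first line the greedy match runs from the first opening delimiter to the last closing one, so "Balance: 42" disappears along with the two comments. The comment split over the second and third lines never matches at all, so "wm.cgi" is still sent to the client.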

Better Solutions

Fortunately there are better solutions available:

mod_line_edit, mod_substitute or mod_sed can be used as an exact equivalent to mod_ext_filter+sed, but at a tiny fraction of the performance overhead. mod_line_edit can also be configured to generalise better.
For HTML pages, markup-aware filters can do the job more intelligently, again at a much-reduced overhead compared to mod_ext_filter.

In fairness to Barnett, things have changed since he wrote his book. mod_line_edit was published in December 2005, just three months before Barnett's book, while the other modules came later. Several markup-aware modules that'll do the job are older (going back to 2003), but none of them was intended or advertised as a security aid.

So, let's look at how we can improve on Barnett's solution to information disclosure. First, we just replace Barnett's solution with mod_line_edit:


SetEnv LineEdit "text/html"
LERewriteRule <!--.*-->	""
<LocationMatch /cgi-bin/wm*>
	SetOutputFilter line-editor
</LocationMatch>

Right. That helps with performance. To implement Barnett's other fixes, we introduce two more LERewriteRules, which mod_line_edit applies in a single, efficient parse. We've eliminated three external program calls (in effect, three times the "CGI Overhead"), and reduced the number of times we parse the document from three to one.

Fixing the regexp is easy: for example, a non-greedy match that stops at the first comment terminator works. That leaves us with multi-line comments to deal with. mod_line_edit can do that too: with
LELineEnd NONE
it will slurp the whole document into memory before parsing (a significant performance overhead for large responses, but vastly better than an ext_filter). Or, with
LELineEnd CUSTOM >
it will treat > as "line end", so that comments will be parsed whole (provided no literal > appears within a comment).
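
The trade-off between the two settings can be sketched in a few lines. This is an emulation of the behaviour described above, not mod_line_edit's code: strip_whole and strip_chunked are hypothetical helper names, and the non-greedy pattern is my own choice of a fixed regexp rather than one taken from the module's documentation.

```python
import re

# Assumed fix for greediness: '.*?' stops at the first '-->'.
# DOTALL lets '.' cross newlines, which only matters once a chunk contains them.
NON_GREEDY = re.compile(r'<!--.*?-->', re.DOTALL)


def strip_whole(text):
    """Whole-document mode: one buffer, so a comment may span any number of lines."""
    return NON_GREEDY.sub('', text)


def strip_chunked(text, end='>'):
    """Custom line-end mode: cut the text after every `end`, edit chunk by chunk."""
    parts = text.split(end)
    chunks = [p + end for p in parts[:-1]] + [parts[-1]]
    return ''.join(NON_GREEDY.sub('', chunk) for chunk in chunks)


multi_line = '<p>a</p>\n<!-- one\n two -->\n<p>b</p>'
with_gt = '<!-- if a > b -->x'

print(repr(strip_whole(multi_line)))
print(repr(strip_chunked(multi_line)))
print(repr(strip_chunked(with_gt)))
```

Both strategies remove the multi-line comment. Only the chunked one trips over a literal > inside a comment: the comment is cut in two, neither half matches, and it passes through untouched, which is the caveat noted above.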

Markup-aware Solutions

Since the output is HTML, we can do better again, using a markup-aware filter rather than brute-force regexps. For example, mod_proxy_html's ProxyHTMLStripComments directive does the job. However, Barnett's other fixes involve changing text, so for them we'd need to resort to mod_publisher, with MLCommentMode set to strip comments and MLRewriteRules to replace text. The advantage of markup-aware parsing over mod_line_edit is that it doesn't fail on unexpected character sequences, such as text that looks like a comment delimiter but is not one.
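
To make that advantage concrete, here is a small sketch using Python's standard html.parser as a stand-in for a markup-aware filter. It says nothing about how mod_proxy_html or mod_publisher are implemented; CommentStripper, strip_comments and the sample input are hypothetical names and data chosen for the demonstration.

```python
from html.parser import HTMLParser
import re


class CommentStripper(HTMLParser):
    """Re-emit the input as parsed, dropping only real HTML comments."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # End tags come back in normalised lowercase form.
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def handle_comment(self, data):
        pass  # the one construct that is deliberately not re-emitted


def strip_comments(html):
    parser = CommentStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Hypothetical input: one real comment, one look-alike inside a script string.
SAMPLE = ('<p>Hello<!-- build: wm.cgi --> world &amp; all</p>'
          '<script>var s = "<!-- not a comment -->";</script>')

parsed = strip_comments(SAMPLE)
by_regex = re.sub(r'<!--.*?-->', '', SAMPLE, flags=re.DOTALL)
print(parsed)
print(by_regex)
```

The parser treats the contents of the script element as raw text, so the string literal survives intact, while the regexp removes it and silently changes the script.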

To conclude, let's compare the three solutions in terms of their performance overhead (external programs and parses of the payload involved) and parse correctness. For most purposes, performance will be what really matters.

	External Programs	Parses	Correctness
mod_ext_filter	3	3	Limited
mod_line_edit	0	1	Limited
mod_publisher	0	1	Full

Summary

This summarises the essence of what's good and bad about Barnett's book: interesting new insights for those who have all the text-book knowledge to run a server, but solutions that need to be treated with caution and may be dangerous to the novice reader.

At the same time, it presents an example of the potential for using the author's general-purpose filter modules in security applications.