The Filter
Now the main filter itself is broadly straightforward, but there
are a number of interesting and unexpected points to consider.
Since it is a little longer than the utility functions above, we'll
annotate it inline instead. Note that the header and footer file
buckets are set up in a txt_filter_init function (omitted for brevity).
static int txt_filter(ap_filter_t* f, apr_bucket_brigade* bb) {
    apr_bucket* b ;
    txt_ctxt* ctxt = (txt_ctxt*)f->ctx ;
    if ( ctxt == NULL ) {
        txt_filter_init(f) ;
        ctxt = f->ctx ;
    }
Main loop: this construct is the typical way to iterate over the buckets of the incoming brigade.
    for ( b = APR_BRIGADE_FIRST(bb);
          b != APR_BRIGADE_SENTINEL(bb);
          b = APR_BUCKET_NEXT(b) ) {
        const char* buf ;
        size_t bytes ;
As in any filter, we need to check for EOS. When we encounter it,
we insert the footer in front of it. We shouldn't get more than
one EOS, but just in case we do, we record that the footer has been
inserted so that it never appears twice. That makes the filter error-tolerant.
        if ( APR_BUCKET_IS_EOS(b) ) {
            /* end of input file - insert footer if any */
            if ( ctxt->foot && ! (ctxt->state & TXT_FOOT ) ) {
                ctxt->state |= TXT_FOOT ;
                APR_BUCKET_INSERT_BEFORE(b, ctxt->foot);
            }
The main case is a bucket containing data. We can read it as a simple
buffer with its size in bytes:
        } else if ( apr_bucket_read(b, &buf, &bytes, APR_BLOCK_READ)
                    == APR_SUCCESS ) {
            /* We have a bucket full of text. Just escape it where necessary */
            size_t count = 0 ;
            const char* p = buf ;
Now we can search for the characters that need replacing, and replace them:
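            while ( count < bytes ) {
                /* Bucket data is not NUL-terminated, so strcspn() could read
                 * past the end: scan only the bytes that remain. */
                size_t sz = 0 ;
                while ( count + sz < bytes && p[sz] != '<' && p[sz] != '>'
                        && p[sz] != '&' && p[sz] != '"' ) {
                    ++sz ;
                }
                count += sz ;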
Here comes the tricky bit: replacing a single character inline.
                if ( count < bytes ) {
                    apr_bucket* rm ;
                    /* Split off the text before the character ... */
                    apr_bucket_split(b, sz) ;
                    /* ... and skip over it */
                    b = APR_BUCKET_NEXT(b) ;
                    /* Insert the replacement */
                    APR_BUCKET_INSERT_BEFORE(b, txt_esc(p[sz],
                            f->r->connection->bucket_alloc)) ;
                    /* Split off the character to remove */
                    apr_bucket_split(b, 1) ;
                    /* Move the cursor on to what remains, so that it
                     * stays in sequence with our main loop ... */
                    rm = b ;
                    b = APR_BUCKET_NEXT(b) ;
                    /* ... then remove the character's bucket and free it */
                    apr_bucket_delete(rm) ;
                    count += 1 ;
                    p += sz + 1 ;
                }
            }
        }
    }
Now we insert the header, if it hasn't already been inserted.
Note:
(a) This has to come after the main loop, so that the header itself
is not scanned and escaped.
(b) It works because we can insert a bucket anywhere in the brigade;
here we put it at the head.
(c) As with the footer, we save state in the context, because the filter
may be called several times for the same request and must insert it only once.
    if ( ctxt->head && ! (ctxt->state & TXT_HEAD ) ) {
        ctxt->state |= TXT_HEAD ;
        APR_BRIGADE_INSERT_HEAD(bb, ctxt->head);
    }
Now that we've finished manipulating the data, we just pass it down the filter chain.
    return ap_pass_brigade(f->next, bb) ;
}
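The split-and-replace sequence in the main loop is easier to study outside Apache. What follows is a minimal, self-contained sketch, not APR code: it assumes a hypothetical `seg` type that models a bucket as a (pointer, length) slice of text in a circular doubly linked list with a sentinel, roughly as a brigade is a ring of buckets. The helpers `seg_split`, `seg_insert_before` and `seg_remove` are illustrative stand-ins for `apr_bucket_split`, `APR_BUCKET_INSERT_BEFORE` and `APR_BUCKET_REMOVE`, and `esc_for` plays the role of `txt_esc`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an APR bucket: a slice of text (not
 * NUL-terminated) in a circular doubly linked list with a sentinel. */
typedef struct seg {
    struct seg *prev, *next;
    const char *data;
    size_t len;
} seg;

static seg *seg_new(const char *data, size_t len) {
    seg *s = malloc(sizeof *s);
    if (s == NULL)
        abort();
    s->prev = s->next = s;
    s->data = data;
    s->len = len;
    return s;
}

/* Link s into the list immediately before at. */
static void seg_insert_before(seg *at, seg *s) {
    s->prev = at->prev;
    s->next = at;
    at->prev->next = s;
    at->prev = s;
}

/* Unlink s; its own prev/next fields are left untouched. */
static void seg_remove(seg *s) {
    s->prev->next = s->next;
    s->next->prev = s->prev;
}

/* Like apr_bucket_split(): s keeps the first point bytes and a new
 * segment holding the rest is linked in directly after it. */
static void seg_split(seg *s, size_t point) {
    seg *rest = seg_new(s->data + point, s->len - point);
    s->len = point;
    seg_insert_before(s->next, rest);
}

static int is_special(char c) {
    return c == '<' || c == '>' || c == '&' || c == '"';
}

static const char *esc_for(char c) {
    switch (c) {
    case '<': return "&lt;";
    case '>': return "&gt;";
    case '&': return "&amp;";
    default:  return "&quot;";   /* only '"' reaches here */
    }
}

/* Same control flow as the filter's main loop, on the model list. */
static void escape_list(seg *sentinel) {
    seg *b;
    for (b = sentinel->next; b != sentinel; b = b->next) {
        const char *p = b->data;
        size_t bytes = b->len, count = 0;
        while (count < bytes) {
            size_t sz = 0;
            /* bounded scan: the slice is not NUL-terminated */
            while (count + sz < bytes && !is_special(p[sz]))
                ++sz;
            count += sz;
            if (count < bytes) {
                const char *e = esc_for(p[sz]);
                seg *rm;
                seg_split(b, sz);      /* split off the text before */
                b = b->next;           /* skip over it */
                seg_insert_before(b, seg_new(e, strlen(e)));
                seg_split(b, 1);       /* isolate the char to remove */
                rm = b;
                b = b->next;           /* cursor moves on to the rest */
                seg_remove(rm);
                free(rm);
                count += 1;
                p += sz + 1;
            }
        }
    }
}

/* Concatenate all segments into out (caller provides enough room). */
static void flatten(const seg *sentinel, char *out) {
    const seg *b;
    size_t n = 0;
    for (b = sentinel->next; b != sentinel; b = b->next) {
        memcpy(out + n, b->data, b->len);
        n += b->len;
    }
    out[n] = '\0';
}
```

For example, a list holding the slices `a<b` and `"x&` flattens to `a&lt;b&quot;x&amp;` after `escape_list`. Because each replacement is linked in before the cursor, the `&` inside an inserted entity is never rescanned and escaped a second time; the zero-length slices left behind by a split at position 0 are harmless.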
Note that we create a new bucket every time we replace a character.
Couldn't we have prepared four buckets in advance, one for each of the
characters to be replaced, and then reused them whenever the character
occurred?
The problem is that each bucket stores its own links to its neighbours,
so it can occupy only one position in the brigade. If we insert the same
bucket again, its links are overwritten, and the brigade then jumps over
any data between the two instances of it.
Hence we do need a new bucket every time, which makes this technique
inefficient when a high proportion of the input data has to be
changed. We will show alternative techniques for such cases in
other articles.
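The linking problem can be demonstrated with a simplified model. The sketch below is illustrative only: it assumes a hypothetical `node` type that carries its own `prev`/`next` pointers in a sentinel ring, in the same way an APR bucket carries its ring links. It builds the sequence a, E, b, E, c, where E stands for an escape bucket, either with a fresh node for each E or with one shared, pre-prepared node inserted twice.

```c
#include <stddef.h>

/* Hypothetical minimal model of a brigade: a ring with a sentinel,
 * where each node carries its own prev/next links as a bucket does. */
typedef struct node {
    struct node *prev, *next;
    char ch;
} node;

static void ring_init(node *sentinel) {
    sentinel->prev = sentinel->next = sentinel;
    sentinel->ch = '\0';
}

/* Append n at the tail; overwrites whatever links n already had. */
static void insert_tail(node *sentinel, node *n) {
    n->prev = sentinel->prev;
    n->next = sentinel;
    sentinel->prev->next = n;
    sentinel->prev = n;
}

/* Copy each node's character into out (room for max + 1 chars). */
static size_t walk(const node *sentinel, char *out, size_t max) {
    size_t n = 0;
    const node *p;
    for (p = sentinel->next; p != sentinel && n < max; p = p->next)
        out[n++] = p->ch;
    out[n] = '\0';
    return n;
}

/* Build "a", E, "b", E, "c". With share != 0 the same E node is
 * inserted twice (the "prepared bucket" idea); otherwise each
 * occurrence gets a fresh node. */
static size_t demo(int share, char *out) {
    node s;
    node a = { NULL, NULL, 'a' }, b = { NULL, NULL, 'b' };
    node c = { NULL, NULL, 'c' };
    node e1 = { NULL, NULL, 'E' }, e2 = { NULL, NULL, 'E' };
    ring_init(&s);
    insert_tail(&s, &a);
    insert_tail(&s, &e1);
    insert_tail(&s, &b);
    insert_tail(&s, share ? &e1 : &e2);
    insert_tail(&s, &c);
    return walk(&s, out, 16);
}
```

With fresh nodes, `demo(0, out)` walks `aEbEc`; with the shared node, `demo(1, out)` walks only `aEc`. The second insertion overwrote the shared node's links, so `b` is skipped, which is the jump described above.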