The Filter

The main filter itself is broadly straightforward, but it raises a number of interesting and unexpected points. Since it is a little longer than the utility functions above, we'll comment on it inline. Note that the header and footer file buckets are set up in an initialisation function, txt_filter_init (omitted for brevity).


static int txt_filter(ap_filter_t* f, apr_bucket_brigade* bb) {
  apr_bucket* b ;
  txt_ctxt* ctxt = (txt_ctxt*)f->ctx ;

  if ( ctxt == NULL ) {
    txt_filter_init(f) ;
    ctxt = f->ctx ;
  }


Main Loop: This construct is typical for iterating over the incoming data.

  for ( b = APR_BRIGADE_FIRST(bb);
	b != APR_BRIGADE_SENTINEL(bb);
	b = APR_BUCKET_NEXT(b) ) {

    const char* buf ;
    size_t bytes ;

As in any filter, we need to check for EOS.  When we encounter it,
we insert the footer in front of it.  We shouldn't get more than
one EOS, but just in case we do we'll note having inserted the
footer.  That means we're being error-tolerant.

    if ( APR_BUCKET_IS_EOS(b) ) {
      /* end of input file - insert footer if any */
      if ( ctxt->foot && ! (ctxt->state & TXT_FOOT ) ) {
	ctxt->state |= TXT_FOOT ;
	APR_BUCKET_INSERT_BEFORE(b, ctxt->foot);
      }

The main case is a bucket containing data. We can get it as a simple
buffer with its size in bytes:

    } else if ( apr_bucket_read(b, &buf, &bytes, APR_BLOCK_READ)
	== APR_SUCCESS ) {
      /* We have a bucket full of text.  Just escape it where necessary */
      size_t count = 0 ;
      const char* p = buf ;

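Now we can search for characters that need replacing, and replace them.

      while ( count < bytes ) {
	/* bucket data is not NUL-terminated, so bound the scan explicitly */
	size_t sz = 0 ;
	while ( count + sz < bytes && memchr("<>&\"", p[sz], 4) == NULL )
	  ++sz ;
	count += sz ;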

Here comes the tricky bit: replacing a single character inline.

	if ( count < bytes ) {
	  apr_bucket* dead ;
	  apr_bucket_split(b, sz) ;	/* Split off before buffer */
	  b = APR_BUCKET_NEXT(b) ;	/* Skip over before buffer */
	  APR_BUCKET_INSERT_BEFORE(b, txt_esc(p[sz],
		f->r->connection->bucket_alloc)) ;
					/* Insert the replacement */
	  apr_bucket_split(b, 1) ;	/* Split off the char to remove */
	  dead = b ;
	  b = APR_BUCKET_NEXT(b) ;	/* Move cursor on to what remains, so
					   that it stays in sequence with
					   our main loop */
	  apr_bucket_delete(dead) ;	/* ... then remove and free the char */
	  count += 1 ;
	  p += sz + 1 ;
	}
      }
    }
  }

Now we insert the Header if it hasn't already been inserted.
Note:
  (a)	This has to come after the main loop, to avoid the header itself
	getting into the parse.
  (b)	It works because we can insert a bucket anywhere in the brigade,
	and in this case put it at the head.
  (c)	As with the footer, we save state to avoid inserting it more than once.

  if ( ctxt->head && ! (ctxt->state & TXT_HEAD ) ) {
    ctxt->state |= TXT_HEAD ;
    APR_BRIGADE_INSERT_HEAD(bb, ctxt->head);
  }


Now we've finished manipulating data, we just pass it down the filter chain.

  return ap_pass_brigade(f->next, bb) ;
}
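
To see what the loop above should produce, it can help to model the escaping on a plain length-delimited buffer, outside Apache. The following is only a sketch: entity_for and escape_model are hypothetical names, and we assume the replacements are the usual HTML entities for the four characters the filter searches for. Like bucket data, the input is described by a pointer and a byte count, and is never assumed to be NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the entity for a special character, or NULL
 * if the character can be copied through unchanged. */
const char* entity_for(char c) {
  switch ( c ) {
    case '<': return "&lt;" ;
    case '>': return "&gt;" ;
    case '&': return "&amp;" ;
    case '"': return "&quot;" ;
    default:  return NULL ;
  }
}

/* Escape exactly `bytes` bytes of buf into out, which holds outsz bytes
 * including the terminating NUL.  Returns the number of characters
 * written (excluding the NUL), or (size_t)-1 if out is too small. */
size_t escape_model(const char* buf, size_t bytes, char* out, size_t outsz) {
  size_t i ;
  size_t n = 0 ;
  assert(buf != NULL && out != NULL) ;
  if ( outsz == 0 )
    return (size_t)-1 ;
  for ( i = 0; i < bytes; ++i ) {
    const char* ent = entity_for(buf[i]) ;
    size_t len = ent ? strlen(ent) : 1 ;
    if ( n + len >= outsz )		/* keep room for the NUL */
      return (size_t)-1 ;
    if ( ent )
      memcpy(out + n, ent, len) ;
    else
      out[n] = buf[i] ;
    n += len ;
  }
  out[n] = '\0' ;
  return n ;
}
```

For the six bytes a<b&"> this writes a&lt;b&amp;&quot;&gt; and returns 21. The real filter reaches the same result without copying the unchanged text, by splicing replacement buckets into the brigade.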

Note that we created a new bucket every time we replaced a character. Couldn't we have prepared four buckets in advance - one for each of the characters to be replaced - and then re-used them whenever the character occurred?

The problem is that each bucket is linked to its neighbours in the brigade. Inserting the same bucket a second time overwrites the links from its first position, so the brigade now jumps over any data between the two instances. Hence we do need a new bucket every time, which makes this technique inefficient when a high proportion of the input has to be changed. We will show alternative techniques for such cases in other articles.
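
The effect of re-using a linked element can be seen in a small stand-alone model. This is a sketch only: it uses a hand-written doubly linked ring rather than the APR bucket and ring macros, and every name in it (node, ring_init, ring_insert_tail and so on) is hypothetical. It builds the sequence "A esc B esc C" twice, once inserting the same esc node in both places and once with a fresh node each time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bucket: a node in a doubly linked ring. */
typedef struct node {
  struct node* prev ;
  struct node* next ;
} node ;

/* An empty ring is a sentinel linked to itself. */
void ring_init(node* sentinel) {
  sentinel->prev = sentinel->next = sentinel ;
}

/* Link n in at the tail, just before the sentinel.
 * Nothing here unlinks n from any earlier position. */
void ring_insert_tail(node* sentinel, node* n) {
  assert(n != sentinel) ;
  n->prev = sentinel->prev ;
  n->next = sentinel ;
  sentinel->prev->next = n ;
  sentinel->prev = n ;
}

/* Count the nodes reached by walking forward from the sentinel. */
size_t ring_length(const node* sentinel) {
  size_t n = 0 ;
  const node* p ;
  for ( p = sentinel->next; p != sentinel; p = p->next )
    ++n ;
  return n ;
}

/* "A esc B esc C" with one esc node inserted twice. */
size_t length_with_reused_node(void) {
  node ring, a, b, c, esc ;
  ring_init(&ring) ;
  ring_insert_tail(&ring, &a) ;
  ring_insert_tail(&ring, &esc) ;
  ring_insert_tail(&ring, &b) ;
  ring_insert_tail(&ring, &esc) ;	/* overwrites esc's first links */
  ring_insert_tail(&ring, &c) ;
  return ring_length(&ring) ;
}

/* The same sequence with a new node for each replacement. */
size_t length_with_fresh_nodes(void) {
  node ring, a, b, c, esc1, esc2 ;
  ring_init(&ring) ;
  ring_insert_tail(&ring, &a) ;
  ring_insert_tail(&ring, &esc1) ;
  ring_insert_tail(&ring, &b) ;
  ring_insert_tail(&ring, &esc2) ;
  ring_insert_tail(&ring, &c) ;
  return ring_length(&ring) ;
}
```

Here length_with_reused_node() returns 3 rather than 5: a forward walk visits A, the shared node and C, and never reaches B. With a fresh node for each replacement, length_with_fresh_nodes() returns 5 and nothing is skipped.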