This document contains notes about Handlers in the PIA's Document Processing
System (DPS), as implemented in the Java module org.risource.dps.handle.
The org.risource.dps.Handler interface is actually a composite of two other
interfaces: org.risource.dps.Syntax and org.risource.dps.Action.
Every ActiveNode actually points to two (potentially different) handlers,
accessible through the getSyntax() and getAction() methods.
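The split between the two roles can be pictured with a minimal sketch. It assumes drastically simplified stand-ins: apart from getSyntax() and getAction(), every name below (isEmptyElement, act, setAction's exact form, SimpleNode, UpcaseHandler) is hypothetical, and the real org.risource.dps interfaces carry many more methods.

```java
// Hypothetical, simplified stand-ins for org.risource.dps.Syntax / Action / Handler.
interface Syntax {
  /** Parse-time question (hypothetical): does this element have no content? */
  boolean isEmptyElement();
}

interface Action {
  /** Run-time behavior (hypothetical): transform the element's content. */
  String act(String content);
}

/** A Handler is simply both interfaces at once. */
interface Handler extends Syntax, Action {}

/** A node keeps separate references to its syntax and its action handler. */
class SimpleNode {
  private final Syntax syntax;
  private Action action;

  SimpleNode(Handler h) {
    // Initially one handler object plays both roles...
    this.syntax = h;
    this.action = h;
  }

  Syntax getSyntax() { return syntax; }
  Action getAction() { return action; }

  // ...but the action can later be replaced by a different object.
  void setAction(Action a) { this.action = a; }
}

class UpcaseHandler implements Handler {
  public boolean isEmptyElement() { return false; }
  public String act(String content) { return content.toUpperCase(); }
}

class HandlerSplitDemo {
  public static void main(String[] args) {
    SimpleNode n = new SimpleNode(new UpcaseHandler());
    System.out.println(n.getSyntax() == n.getAction()); // true
    n.setAction(content -> content + "!");              // swap in another action
    System.out.println(n.getSyntax() == n.getAction()); // false
    System.out.println(n.getAction().act("hi"));        // hi!
  }
}
```

Keeping two references is what lets the syntax handler hand off run-time work to a more specialized action object without changing how the node was parsed.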
The Syntax
interface of a Handler is invoked at parse
time. In the typical case, for an Element, it tells the parser
whether the element is empty, whether its contents are parsed or unparsed,
and so on.
When the Parser constructs a new Node, for example an Element, it passes the
tagname and attribute list to the Tagset using the createActiveElement
method. The tagset obtains the appropriate handler using its own
getHandlerForTag method, and calls on the handler to construct the node,
using its createElement method.
handler.createElement normally constructs a default ParseTreeElement object,
but may be overridden to construct a subclass. See tagsetHandler for an
example of this.
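The construction path just described can be modelled in a few lines. This is a hypothetical sketch, not the DPS code: SimpleTagset, BasicHandler, SpecialHandler and TagsetElement are invented stand-ins, and the attribute list is reduced to a Map<String, String>. Only the method names createActiveElement, getHandlerForTag and createElement come from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of: Tagset.createActiveElement -> getHandlerForTag -> handler.createElement
class ParseTreeElement {
  final String tag;
  final Map<String, String> attrs;
  ParseTreeElement(String tag, Map<String, String> attrs) {
    this.tag = tag;
    this.attrs = attrs;
  }
}

/** A subclass that some handler prefers to build (hypothetical). */
class TagsetElement extends ParseTreeElement {
  TagsetElement(String tag, Map<String, String> attrs) { super(tag, attrs); }
}

class BasicHandler {
  /** Default: construct a plain ParseTreeElement. */
  ParseTreeElement createElement(String tag, Map<String, String> attrs) {
    return new ParseTreeElement(tag, attrs);
  }
}

class SpecialHandler extends BasicHandler {
  /** Override: construct a subclass instead. */
  @Override
  ParseTreeElement createElement(String tag, Map<String, String> attrs) {
    return new TagsetElement(tag, attrs);
  }
}

class SimpleTagset {
  private final Map<String, BasicHandler> handlers = new HashMap<>();
  private final BasicHandler defaultHandler = new BasicHandler();

  void define(String tag, BasicHandler h) { handlers.put(tag, h); }

  BasicHandler getHandlerForTag(String tag) {
    return handlers.getOrDefault(tag, defaultHandler);
  }

  /** Called by the parser with the tagname and attribute list. */
  ParseTreeElement createActiveElement(String tag, Map<String, String> attrs) {
    return getHandlerForTag(tag).createElement(tag, attrs);
  }
}

class ConstructionDemo {
  public static void main(String[] args) {
    SimpleTagset ts = new SimpleTagset();
    ts.define("tagset", new SpecialHandler());
    System.out.println(ts.createActiveElement("p", new HashMap<>()).getClass().getSimpleName());      // ParseTreeElement
    System.out.println(ts.createActiveElement("tagset", new HashMap<>()).getClass().getSimpleName()); // TagsetElement
  }
}
```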
The syntax Handler then sets the new element's action handler, using

    e.setAction(getActionForNode(e));

The default is simply to return this, but the handler has a chance to check
for the presence (though not the value) of attributes at parse time and get
some dispatching out of the way. See testHandler for a good example of this
technique.
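A minimal sketch of that parse-time dispatch, assuming a hypothetical <foo> element with an optional upper attribute. fooHandler, foo_upper and the two-argument action signature are invented for illustration (the real action routines take an Input, a Context and an Output). The point is that only the attribute's presence is tested, once, when the node is built.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical action interface. */
interface Action {
  String action(Map<String, String> attrs, String content);
}

/** Element handler for a hypothetical <foo> element. */
class fooHandler implements Action {
  /** Default behavior: pass the content through unchanged. */
  public String action(Map<String, String> attrs, String content) {
    return content;
  }

  /**
   * Called once at parse time. Only the presence of "upper" is checked;
   * its value (which might still need expanding) is not examined here.
   */
  Action getActionForNode(Map<String, String> attrs) {
    if (attrs.containsKey("upper")) return new foo_upper();
    return this; // the default
  }
}

/** Attribute handler, named tagname_attribute by convention. */
class foo_upper implements Action {
  public String action(Map<String, String> attrs, String content) {
    return content.toUpperCase();
  }
}

class DispatchDemo {
  public static void main(String[] args) {
    fooHandler h = new fooHandler();
    Map<String, String> attrs = new HashMap<>();
    attrs.put("upper", "");
    Action chosen = h.getActionForNode(attrs);         // decided once, at parse time
    System.out.println(chosen.action(attrs, "hello")); // HELLO
  }
}
```

At run time the chosen action no longer needs to test for the attribute, which is the "dispatching out of the way" the text refers to.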
At this point, the syntax interface is out of the picture.
An ActiveNode's associated Action handler is called from a Processor (or from
processing utilities in org.risource.dps.aux.Expand, although in practice
these almost invariably construct a sub-processor). The relevant code in
BasicProcessor is:
    /** Process every node in the input, stopping early if running is cleared. */
    public boolean run() {
      running = true;
      processNode();
      while (running && input.toNext()) processNode();
      return running;
    }

    /** Process the current Node */
    public final void processNode() {
      Action handler = input.getAction();
      if (handler != null) {
        doAction(handler.getActionCode(), handler);
        // MUST BE equivalent to: handler.action(input, this, output);
      } else {
        expandCurrentNode();
      }
    }

    /** Perform any additional action requested by the action routine. */
    protected final void doAction(int flag, Action handler) {
      switch (flag) {
      case Action.ACTIVE_NODE: handler.action(input, this, output); return;
      case Action.COPY_NODE:   copyCurrentNode();    return;
      case Action.EXPAND_NODE: expandCurrentNode();  return;
      case Action.EXPAND_ATTS: expandCurrentAttrs(); return;
      case Action.PUT_NODE:    putCurrentNode();     return;
      }
      // Any other code: no further action is taken.
    }
Eventually we get down to calling the ``three-argument'' action method, which
in GenericHandler (the parent of the handlers for all active elements) looks
like this:
    public void action(Input in, Context aContext, Output out) {
      defaultAction(in, aContext, out);
    }
All this is doing is passing the real operation off to defaultAction, in
case you want to
There are four different kinds (classes?) of handler classes:

- Handler: for example, EntityHandler, which handles entity references.
- Handler as a suffix to keep them from being confused with Java keywords.
  For example, ifHandler, which handles the <if> element. In general these
  are public classes.
- tagname_attribute. For example, numeric_sort, which handles the
  <numeric sort> element. These are almost invariably package-local
  classes, defined in the same file as their parent element handler.
- fromHandler, which handles the <from> sub-element of <select>.
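The naming conventions in this list can be captured by a small helper. HandlerNames is hypothetical and not part of the DPS; it does not reproduce how a tagset actually resolves class names. It also rejects tags such as else-if that are not legal Java identifiers, since no class can be named after them directly.

```java
// Hypothetical helper illustrating the handler naming conventions.
class HandlerNames {
  /** True if s is a legal Java identifier (ignoring reserved words). */
  static boolean isJavaName(String s) {
    if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) return false;
    for (int i = 1; i < s.length(); i++)
      if (!Character.isJavaIdentifierPart(s.charAt(i))) return false;
    return true;
  }

  /** Element or sub-element handler: "if" -> "ifHandler"; null if not a legal name. */
  static String forElement(String tag) {
    return isJavaName(tag) ? tag + "Handler" : null;
  }

  /** Attribute handler: ("numeric", "sort") -> "numeric_sort"; null if not legal. */
  static String forAttribute(String tag, String attr) {
    String name = tag + "_" + attr;
    return isJavaName(name) ? name : null;
  }
}

class NamesDemo {
  public static void main(String[] args) {
    System.out.println(HandlerNames.forElement("if"));                // ifHandler
    System.out.println(HandlerNames.forAttribute("numeric", "sort")); // numeric_sort
    System.out.println(HandlerNames.forElement("else-if"));           // null
  }
}
```

The "Handler" suffix is what makes a tag like if usable at all: if is a Java keyword, but ifHandler is an ordinary identifier.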
Note that several tags can share a handler class by specifying the classname
explicitly. For example, <else-if> and <elsif> share the same handler. It is
also possible to construct variant tagsets in which every element has a
different name than the ``standard'' one. Because of this, when a parent
handler wants to identify specific sub-elements, it will usually compare the
class names of their handlers instead of their tagnames. See ifHandler for a
good example.
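A hypothetical sketch of that lookup: the parent scans its children and matches on the class name of each child's handler, so a variant tagset that renames <else> to, say, <sinon> still works. Child, ParentLookup and the empty thenHandler and elseHandler classes are invented stand-ins, not the ifHandler code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical marker for an action handler. */
interface Action {}

class thenHandler implements Action {}
class elseHandler implements Action {}

/** A parsed sub-element: its tagname and the handler chosen for it. */
class Child {
  final String tag;      // may differ from tagset to tagset
  final Action action;   // the same class whatever the tag is called
  Child(String tag, Action action) { this.tag = tag; this.action = action; }
}

class ParentLookup {
  /** First child whose handler has the given class name, or null if none. */
  static Child find(List<Child> children, String handlerClassName) {
    for (Child c : children)
      if (c.action.getClass().getName().equals(handlerClassName)) return c;
    return null;
  }
}

class LookupDemo {
  public static void main(String[] args) {
    // A variant tagset that uses French tagnames.
    List<Child> kids = Arrays.asList(
        new Child("alors", new thenHandler()),
        new Child("sinon", new elseHandler()));
    Child e = ParentLookup.find(kids, elseHandler.class.getName());
    System.out.println(e.tag); // sinon
  }
}
```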
When writing a new handler from scratch, say for the ``<foo>'' element, the
best way to start is with the command:

    make class tag=foo

This copies a skeleton called TypicalHandler, replacing all occurrences of
``typical'' with ``foo'', which gives you a good place to start.
The new class will have a getActionForNode method to dispatch on attributes
at parse time, and a sample attribute-handler subclass. Either edit the
names, or delete them if you don't want them.
The skeleton gives you a ``five-argument'' action method to customize. This
is almost always the right place to start; you can do anything with it, but
it may be less efficient than a customized ``three-argument'' action. In
particular, if you need the contents of the element as a string, it is
significantly more efficient to write a three-argument action routine; see
testHandler and its subclasses for some typical examples.
If you need to do something involving control structure, take a look at
repeatHandler and ifHandler. If you need to pass data between an element and
its sub-elements, or from one sub-element to another, look at selectHandler.
When writing a handler for an attribute or sub-element, the best thing to do is to clone an existing one with the same parent.
If you need to add sub-elements to a new parent element, take a look at
selectHandler.
- Input (usually passed as an argument called in). The input is also the
  right place to go for conversions, e.g.

      ActiveElement e = in.getActive().asElement();

- Context (usually called either cxt or aContext).
- To get the Tagset, use cxt.getTopContext().getTagset().
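The two access chains above can be exercised against minimal stand-in interfaces. Everything here is hypothetical except the call chains in.getActive().asElement() and cxt.getTopContext().getTagset(), which come from the text; getTagName and Tagset.getName are assumptions added only so the sketch has something to print.

```java
/** Hypothetical stand-ins, reduced to the methods used below. */
interface Tagset { String getName(); }
interface ActiveNode { ActiveElement asElement(); }
interface ActiveElement extends ActiveNode { String getTagName(); }
interface Input { ActiveNode getActive(); }
interface Context {
  Context getTopContext();
  Tagset getTagset();
}

class AccessSketch {
  /** What the body of an action routine might do with its arguments. */
  static String describe(Input in, Context cxt) {
    ActiveElement e = in.getActive().asElement();  // current node, as an element
    Tagset ts = cxt.getTopContext().getTagset();   // tagset from the top context
    return "<" + e.getTagName() + "> in tagset " + ts.getName();
  }
}

class AccessDemo {
  public static void main(String[] args) {
    ActiveElement foo = new ActiveElement() {
      public ActiveElement asElement() { return this; }
      public String getTagName() { return "foo"; }
    };
    Tagset html = () -> "HTML";
    Context top = new Context() {
      public Context getTopContext() { return this; }
      public Tagset getTagset() { return html; }
    };
    System.out.println(AccessSketch.describe(() -> foo, top)); // <foo> in tagset HTML
  }
}
```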
You can always get debugging output using the debug or message methods on
Context. Note, however, that any work done to build the message is executed
whether or not debugging is turned on. It is usual, therefore, to comment
out debugging statements after you're done with them.
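That cost comes from Java evaluating method arguments before the call. The hypothetical DebugContext below (not the real Context, whose debug and message signatures may differ) shows the effect, together with one alternative to commenting statements out: passing a java.util.function.Supplier, which defers building the message until debugging is known to be on. This needs Java 8 or later; on older JVMs an explicit if test around the call has the same effect.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a Context's debugging support. */
class DebugContext {
  boolean debugging = false;
  int printed = 0;

  /** Eager form: the caller has already built msg before this runs. */
  void debug(String msg) {
    if (debugging) { System.err.println(msg); printed++; }
  }

  /** Lazy form: msg.get() runs only when debugging is on. */
  void debug(Supplier<String> msg) {
    if (debugging) { System.err.println(msg.get()); printed++; }
  }
}

class ExpensiveDump {
  static int calls = 0;
  /** Stands in for an expensive computation such as dumping a parse tree. */
  static String dump() { calls++; return "<big tree dump>"; }
}

class DebugDemo {
  public static void main(String[] args) {
    DebugContext cxt = new DebugContext();             // debugging is off
    cxt.debug("tree: " + ExpensiveDump.dump());        // dump() still runs
    cxt.debug(() -> "tree: " + ExpensiveDump.dump());  // dump() is skipped
    System.out.println(ExpensiveDump.calls);           // 1
  }
}
```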