See also the Notes on Porting the PIA.
This document describes a design for a C or C++ port of the PIA. The design goals are:
There are two plausible approaches:
In view of Jade's size and complexity, it seems best to implement the DPS in C starting with Expat. The design sketch follows:
What I have in mind is an implementation that stores trees, not as individual nodes connected by pointers, but as flattened arrays of characters. Each node would have a binary header and trailer that includes the length of its contents. The only place pointers would be necessary would be in namespaces.
A flattened tree has two main advantages: navigation is nearly as fast as in a linked tree while depth-first traversal is actually faster, and storage management is trivial. In addition, by padding strings to some reasonable length (8 bytes comes to mind), one could ensure that numeric fields always start on word boundaries; this would also allow the occasional pointer for things like handlers.
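To make the layout concrete, here is a minimal sketch of one way such a flattened tree might be built and walked. Everything in it is an assumption made for illustration: the names (ft_tree, ft_mark, ft_open, ft_close, ft_text, ft_skip), the choice of an identical 8-byte record for header and trailer, and the fixed-size buffer. Lengths are stored as byte counts, and text is zero-padded to 8 bytes so every record starts on an 8-byte offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FT_ELEMENT = 1, FT_TEXT = 2 };   /* assumed node kinds */
enum { FT_ALIGN = 8, FT_CAP = 1024 };   /* padding unit from the text; arbitrary capacity */

/* The same 8-byte record serves as header and trailer (an assumption). */
typedef struct { uint32_t type; uint32_t length; } ft_mark;

typedef struct {
    unsigned char data[FT_CAP];  /* the whole tree lives in this one array */
    size_t used;                 /* bytes written so far */
} ft_tree;

static size_t ft_pad(size_t n) {
    return (n + FT_ALIGN - 1) & ~(size_t)(FT_ALIGN - 1);
}

/* Start a node: write a header with length 0 and return its offset. */
static size_t ft_open(ft_tree *t, uint32_t type) {
    ft_mark m = { type, 0 };
    size_t at = t->used;
    assert(at + sizeof m <= (size_t)FT_CAP);
    memcpy(t->data + at, &m, sizeof m);
    t->used += sizeof m;
    return at;
}

/* Finish a node: patch the content length into the header, append the trailer. */
static void ft_close(ft_tree *t, size_t at) {
    ft_mark m;
    assert(t->used + sizeof m <= (size_t)FT_CAP);
    memcpy(&m, t->data + at, sizeof m);
    m.length = (uint32_t)(t->used - at - sizeof m);
    memcpy(t->data + at, &m, sizeof m);
    memcpy(t->data + t->used, &m, sizeof m);
    t->used += sizeof m;
}

/* Append a text node whose string is zero-padded to FT_ALIGN bytes. */
static void ft_text(ft_tree *t, const char *s) {
    size_t at = ft_open(t, FT_TEXT);
    size_t n = strlen(s) + 1, padded = ft_pad(n);
    assert(t->used + padded <= (size_t)FT_CAP);
    memset(t->data + t->used, 0, padded);
    memcpy(t->data + t->used, s, n);
    t->used += padded;
    ft_close(t, at);
}

/* Offset just past the node at `at`: its next sibling or its parent's trailer. */
static size_t ft_skip(const ft_tree *t, size_t at) {
    ft_mark m;
    memcpy(&m, t->data + at, sizeof m);
    return at + sizeof m + m.length + sizeof m;
}

/* Count the direct children of the element at `at` without following pointers. */
static int ft_child_count(const ft_tree *t, size_t at) {
    ft_mark m;
    int n = 0;
    memcpy(&m, t->data + at, sizeof m);
    for (size_t c = at + sizeof m; c < at + sizeof m + m.length; c = ft_skip(t, c))
        n++;
    return n;
}
```

In this sketch, skipping a subtree is a single addition, and a depth-first traversal is a linear scan of the array. The trailer repeats the length so that a reader could also step backwards from the end of a node.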
Note that storage management for a flat, compact array of what amount to strings is particularly simple, although keeping the array compact does involve copying. In the most common case, where input is from events and output is to a stream, holes will only occur when a variable is set inside of something being expanded.
Some kind of interleaving scheme (with "skip" spans as well as "content" spans) would allow variables to be written into the array at the same time as intermediate (processed-content) parse trees. (Variables would end up as something akin to comment nodes.) This is only an issue, of course, when something like a repeat or extract is embedded in something else, e.g. a set, that constructs a nodelist.
It may be best, though, to do the initial implementation with ordinary nodes to avoid unnecessary complexity.
Note also that in embedded applications it will often be possible to pre-parse files, and even embed them in the server in the form of initialized C data structures. Constructing the C can be done using a tagset. In particular, we can reasonably expect that tagsets will be preloaded in this way.
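As an illustration of what a pre-parsed file embedded as initialized C data might look like, here is a small sketch using ordinary linked nodes. The struct layout and all names (pre_node, preloaded_doc, pre_text_length) are hypothetical; a tagset that generates C would presumably emit something of this general shape, but nothing here is taken from the PIA itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical read-only node: an element (tag != NULL) or text (tag == NULL). */
typedef struct pre_node {
    const char *tag;                  /* element name, or NULL for a text node */
    const char *text;                 /* text content, or NULL for an element */
    const struct pre_node *children;  /* array of children, or NULL */
    size_t n_children;
} pre_node;

/* The document <p>Hello, <b>world</b></p> as constant, link-time data. */
static const pre_node bold_kids[] = {
    { NULL, "world", NULL, 0 },
};
static const pre_node para_kids[] = {
    { NULL, "Hello, ", NULL, 0 },
    { "b", NULL, bold_kids, sizeof bold_kids / sizeof bold_kids[0] },
};
static const pre_node preloaded_doc = {
    "p", NULL, para_kids, sizeof para_kids / sizeof para_kids[0]
};

/* Total length of all text under a node; no parsing or allocation is needed. */
static size_t pre_text_length(const pre_node *n) {
    size_t total = n->text ? strlen(n->text) : 0;
    for (size_t i = 0; i < n->n_children; i++)
        total += pre_text_length(&n->children[i]);
    return total;
}
```

Because the data is const, it can live in read-only memory, which suits embedded servers.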
Apache includes a ``transaction-oriented'' storage-allocation model: all storage associated with a transaction is kept together, and returned when the transaction is completed. This makes things very simple.
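The idea behind transaction-oriented allocation can be sketched with a small arena allocator. This is not Apache's actual pool API; the names txn_pool, txn_alloc, and txn_end are hypothetical, and the block size is arbitrary. Allocations are carved out of large blocks, nothing is freed individually, and everything is released in one call when the transaction ends.

```c
#include <stddef.h>
#include <stdlib.h>

/* Union used only to give the payload a conservative alignment. */
typedef union { long long i; double d; void *p; } txn_align;

typedef struct txn_block {
    struct txn_block *next;  /* previously filled block, if any */
    size_t used, cap;        /* bytes handed out / bytes available */
    txn_align data[];        /* payload follows the block header */
} txn_block;

typedef struct {
    txn_block *head;         /* block currently being filled */
    size_t blocks;           /* number of blocks owned by this transaction */
} txn_pool;

enum { TXN_BLOCK_SIZE = 4096 };  /* arbitrary default block size */

/* Allocate n bytes that live until txn_end(); returns NULL if malloc fails. */
static void *txn_alloc(txn_pool *p, size_t n) {
    txn_block *b = p->head;
    void *out;
    n = (n + sizeof(txn_align) - 1) / sizeof(txn_align) * sizeof(txn_align);
    if (b == NULL || b->cap - b->used < n) {
        size_t cap = n > TXN_BLOCK_SIZE ? n : TXN_BLOCK_SIZE;
        b = malloc(sizeof *b + cap);
        if (b == NULL)
            return NULL;
        b->next = p->head;
        b->used = 0;
        b->cap = cap;
        p->head = b;
        p->blocks++;
    }
    out = (unsigned char *)b->data + b->used;
    b->used += n;
    return out;
}

/* End the transaction: return every block to the system at once. */
static void txn_end(txn_pool *p) {
    while (p->head != NULL) {
        txn_block *next = p->head->next;
        free(p->head);
        p->head = next;
    }
    p->blocks = 0;
}
```

A request handler would start with a zero-initialized txn_pool, call txn_alloc for every node or string it builds, and call txn_end once the response has been written.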
One problem with Apache is that documents are served out of multiple, single-threaded processes. This makes it difficult to share data at the ``Agent'' or ``PIA'' (global) level. If this is necessary we could make yet another process, running as a kind of ``nano-PIA'' server, to manage the shared entities. (This is similar to what mod_java does, for example.) Alternatively we could use the file system for shared data, but there are locking issues involved in that approach.
Note that the ``flattened tree'' binary format is almost ideal for communicating with the ``nano-PIA'' entity server.
expat into the tree (as src/c/com/jclark/expat) and build it.
expat as a style guide.
process, and another that works as a CGI. Implement a simple mod_pia if it's easy.
if, hide, and protect.
mod_pia in Apache.