World domination.
--Linus Torvalds (emphasis added)
This section was originally in ../roadmap.html. Note that much of the work described has been completed, so this document now contains a mixture of future work and historical data.
- Linguistic side-note:
- In paleontology, ``cursorial'' refers to the hypothesis that dinosaurs evolved into birds by running after flying insects with their (feathered) forelimbs outstretched as catchers. This contrasts with the ``arboreal'' hypothesis that wings evolved to improve dinosaurs' ability to jump out of trees.
- Historical side-note:
- Dialog of the Two Great World Systems was the title of Galileo's (rather one-sided) comparison of the Ptolemaic and Copernican representations of the universe.
The DOM presently DOMinates the field of Java representations for XML documents (parse trees). Essentially, it's the only game in town. Unfortunately it has some problems:
Also, there is no place in the DOM for the kind of metadata (e.g., handlers) that the DPS needs in order to do its job. This by itself is enough to require the PIA to have its own implementation of the DOM, although it is not impossible that such metadata will end up in some future level of the DOM.
The DPS notion of a Cursor evolved out of the (subsequently abandoned, now back in Level 2) DOM interface TreeIterator. The basic idea is that a Cursor allows the DPS to traverse a DOM tree, or what appears to be a DOM tree, without actually looking at the DOM nodes themselves.
Presently there is a lot of leakage between the DOM and the DPS, and the current version of Cursor actually does allow (and in fact require) access to the DOM node. This can be fixed, however. The goal is for the Cursor to allow complete access to all of a Node's attributes without allowing access to the Node itself.
The result would be a purely ``cursorial'' interface to documents, completely independent of the DOM's ``arboreal'' interfaces.
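A minimal Java sketch of that goal, assuming a hypothetical `Cursor` interface (not the PIA's actual one). The implementation wraps an ordinary `org.w3c.dom.Node` obtained from the JDK's `javax.xml.parsers` parser, but the node is held in a private field and never returned, so the consumer (`serialize`, standing in for DPS code) sees only node information and cursor moves.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CursorSketch {
    /** Hypothetical cursor: node information plus navigation, but no getNode(). */
    interface Cursor {
        short getNodeType();          // uses the org.w3c.dom.Node type codes
        String getNodeName();
        String getNodeValue();        // character data for text nodes, else null
        int getAttributeCount();
        String getAttributeName(int i);
        String getAttributeValue(int i);
        boolean toFirstChild();       // each move returns false if there is nowhere to go
        boolean toNextSibling();
        boolean toParent();
    }

    /** A cursor over an ordinary DOM tree; the Node field never escapes. */
    static final class DomCursor implements Cursor {
        private Node current;

        DomCursor(Node start) { current = start; }

        public short getNodeType() { return current.getNodeType(); }
        public String getNodeName() { return current.getNodeName(); }
        public String getNodeValue() { return current.getNodeValue(); }
        public int getAttributeCount() {
            NamedNodeMap attrs = current.getAttributes();
            return attrs == null ? 0 : attrs.getLength();
        }
        public String getAttributeName(int i) { return current.getAttributes().item(i).getNodeName(); }
        public String getAttributeValue(int i) { return current.getAttributes().item(i).getNodeValue(); }
        public boolean toFirstChild() { return move(current.getFirstChild()); }
        public boolean toNextSibling() { return move(current.getNextSibling()); }
        public boolean toParent() { return move(current.getParentNode()); }

        private boolean move(Node target) {
            if (target == null) return false;
            current = target;
            return true;
        }
    }

    /** DPS-style consumer: writes the subtree at the cursor, using only the Cursor interface.
        Leaves the cursor where it started. */
    static void serialize(Cursor c, StringBuilder out) {
        switch (c.getNodeType()) {
            case Node.TEXT_NODE:
                out.append(c.getNodeValue());
                return;
            case Node.ELEMENT_NODE:
                out.append('<').append(c.getNodeName());
                for (int i = 0; i < c.getAttributeCount(); i++) {
                    out.append(' ').append(c.getAttributeName(i))
                       .append("=\"").append(c.getAttributeValue(i)).append('"');
                }
                out.append('>');
                if (c.toFirstChild()) {
                    do { serialize(c, out); } while (c.toNextSibling());
                    c.toParent();
                }
                out.append("</").append(c.getNodeName()).append('>');
                return;
            default:
                return; // comments, PIs, etc. are skipped in this sketch
        }
    }

    /** Parses a string with the JDK parser and re-serializes it through a DomCursor. */
    static String render(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            serialize(new DomCursor(doc.getDocumentElement()), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }
}
```

For example, `CursorSketch.render("<a x=\"1\"><b>hi</b>!</a>")` gives back the same markup. The sketch does no character escaping; the point is only that `serialize` never holds a `Node`.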
An additional, worthwhile extension to cursors would be permitting a cursor to represent a CharData node (i.e., text, comments, etc.) as a string without the actual node having to exist. This might even apply more generally to nodes with NodeList values. The result would be significantly less node creation.
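A sketch of the string-valued extension, under the assumption (ours, not the PIA's) that an element stores its character data directly as `String` children. The hypothetical `TextCursorSketch.Cursor` reports such a child as a text position and hands back the string, so no text-node object is ever allocated. The cursor keeps its own stack of ancestors, which is why the elements need no parent pointers.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class TextCursorSketch {
    /** Hypothetical element: each child is either an Elem or a plain String (character data). */
    static final class Elem {
        final String tag;
        final List<Object> children;

        Elem(String tag, Object... children) {
            this.tag = tag;
            this.children = Arrays.asList(children);
        }
    }

    /** Cursor that treats String children as text "nodes" without creating node objects. */
    static final class Cursor {
        private final Deque<Elem> parents = new ArrayDeque<>();    // ancestors of the current position
        private final Deque<Integer> indices = new ArrayDeque<>(); // child index within each ancestor
        private Object current;

        Cursor(Elem root) { current = root; }

        boolean isText() { return current instanceof String; }
        String getText() { return isText() ? (String) current : null; }
        String getTag() { return isText() ? null : ((Elem) current).tag; }

        boolean toFirstChild() {
            if (isText() || ((Elem) current).children.isEmpty()) return false;
            parents.push((Elem) current);
            indices.push(0);
            current = parents.peek().children.get(0);
            return true;
        }

        boolean toNextSibling() {
            if (parents.isEmpty()) return false;
            int next = indices.peek() + 1;
            List<Object> siblings = parents.peek().children;
            if (next >= siblings.size()) return false;
            indices.pop();
            indices.push(next);
            current = siblings.get(next);
            return true;
        }

        boolean toParent() {
            if (parents.isEmpty()) return false;
            indices.pop();
            current = parents.pop();
            return true;
        }
    }

    /** Appends all character data below the cursor, using only cursor operations. */
    static void collectText(Cursor c, StringBuilder out) {
        if (c.isText()) {
            out.append(c.getText());
            return;
        }
        if (c.toFirstChild()) {
            do { collectText(c, out); } while (c.toNextSibling());
            c.toParent();
        }
    }

    static String textOf(Elem root) {
        StringBuilder out = new StringBuilder();
        collectText(new Cursor(root), out);
        return out.toString();
    }
}
```

For instance, `textOf(new Elem("p", "Hello ", new Elem("b", "world"), "!"))` returns `"Hello world!"`, and the only objects in that tree are the two `Elem`s and their strings.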
There are two distinct routes to a split model with an upgraded DOM:
The second path is basically cleaner, but the first would get us to DOM compliance more quickly and would integrate better with the other near-term goal of an XML representation for Agents. We have chosen the first route, and the DOM re-implementation has been completed.
This section was originally in ../projects.html.
Possible name: GLOM (Generic Lightweight Object Model). Other possibilities include Dlite, DOC (Document Object Cursors), ...
The problem is that once you start using the DOM, you're pretty well stuck with it, including all its unsuitable characteristics: lack of DTD coverage, live nodelists, fragile iterators, doubly-linked trees, total inability to move nodes between documents, and so on.
Our representation needs to be compatible with but disjoint from the DOM. Our trees should have both a DOM interface (for compatibility with other tools) and our own; the DPS should use only our own interface.
Insisting that parse trees be immutable fixes many of the DOM's problems. There are still some problems with entities, but we can gloss over that: if you're redefining entities you're by definition outside of the DOM.
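To illustrate why immutability helps, here is a hypothetical node class (not part of the PIA) with final fields, an unmodifiable child list, and no parent pointer. Because a node does not record where it sits, the same subtree can appear in several documents at once, whereas a DOM node carries a single parent and owner document. "Modifying" a tree means building a new node that shares the untouched children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ImmutableTreeSketch {
    /** Hypothetical immutable node: final fields, unmodifiable children, no parent pointer. */
    static final class Node {
        final String name;
        final String text;            // non-null only for text leaves
        final List<Node> children;

        private Node(String name, String text, List<Node> children) {
            this.name = name;
            this.text = text;
            this.children = Collections.unmodifiableList(new ArrayList<>(children));
        }

        static Node element(String name, Node... kids) {
            return new Node(name, null, Arrays.asList(kids));
        }

        static Node text(String s) {
            return new Node("#text", s, Collections.<Node>emptyList());
        }

        /** Returns a new node with one more child; this node and its children are untouched. */
        Node withChild(Node child) {
            List<Node> kids = new ArrayList<>(children);
            kids.add(child);
            return new Node(name, text, kids);
        }

        /** Simple serialization (no escaping), for inspection. */
        String toXml() {
            if (text != null) return text;
            StringBuilder sb = new StringBuilder("<").append(name).append('>');
            for (Node kid : children) sb.append(kid.toXml());
            return sb.append("</").append(name).append('>').toString();
        }
    }
}
```

A `footer` subtree built once can be placed in both an `html` document and a `page` document with no copying; calling `withChild` on either document leaves the original, and every other document sharing that footer, unchanged.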
Basically, what we're doing is splitting the DOM interfaces into two pieces: the informational and the navigational. We want to be able to get at the attributes, content/value/children, and text of our nodes without worrying about the connections among them.
Of course, in addition to the DOM attributes, we also want to get at the DPS attributes of our objects: action, syntax, key (for sorting and extraction), etc.
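The informational half of that split might look like the hypothetical `NodeInfo` interface below; the names, including the DPS accessors `getAction`, `getSyntax`, and `getKey`, are our assumptions. Since it exposes a node's data but no parent or sibling links, code such as this key-based sort can work on any list of nodes without touching the tree's structure. Navigation would live in a separate cursor type, not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class InfoSplitSketch {
    /** Hypothetical informational interface: what a node holds, not where it sits. */
    interface NodeInfo {
        String getName();
        String getAttribute(String name);   // null if absent
        List<NodeInfo> getContent();        // children as data; no parent or sibling links
        String getText();                   // character data
        // DPS attributes beyond the DOM (hypothetical accessors)
        String getAction();                 // handler name, or null
        String getSyntax();                 // syntax name, or null
        String getKey();                    // key for sorting and extraction
    }

    /** Minimal immutable leaf implementation: the key is the "key" attribute, else the text. */
    static final class Item implements NodeInfo {
        private final String name;
        private final Map<String, String> attrs;
        private final String text;

        Item(String name, Map<String, String> attrs, String text) {
            this.name = name;
            this.attrs = Collections.unmodifiableMap(attrs);
            this.text = text;
        }

        public String getName() { return name; }
        public String getAttribute(String n) { return attrs.get(n); }
        public List<NodeInfo> getContent() { return Collections.emptyList(); }
        public String getText() { return text; }
        public String getAction() { return null; }
        public String getSyntax() { return null; }
        public String getKey() {
            String k = attrs.get("key");
            return k != null ? k : text;
        }
    }

    /** Sorts by DPS key using only the informational interface, and returns the texts in order. */
    static List<String> textsSortedByKey(List<? extends NodeInfo> content) {
        List<NodeInfo> sorted = new ArrayList<>(content);
        sorted.sort((a, b) -> a.getKey().compareTo(b.getKey()));
        List<String> texts = new ArrayList<>();
        for (NodeInfo n : sorted) texts.add(n.getText());
        return texts;
    }
}
```

With items `Banana` (key `b`), `Apple` (no key attribute), and `Aardvark` (key `c`), the result is `[Apple, Banana, Aardvark]`: ordering follows the keys rather than the displayed text.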
There are three possibilities:
- org.risource.dom.active, and the implementations to org.risource.dom.tree. Move Cursor and related stuff to, perhaps, org.risource.dom.cursor. Blow away org.risource.dps.active or use it for purely DPS-specific stuff (which might be hard to find). It's rather hard to justify having the ActiveNode interfaces under the DOM, but it does mean that org.risource.dom and its sub-directories constitute a completely stand-alone implementation of the DOM.
- org.risource.dps.tree. Blow away org.risource.dom or use it for utilities that use only the org.w3c.dom interfaces. This leaves org.risource.dps and its sub-packages as a stand-alone document processor. If the DOM and DPS go their separate ways, nobody will notice.
- org.risource.dom.tree but leave the ActiveNode interfaces in org.risource.dps.active. In this case, neither org.risource.dom nor org.risource.dps can stand alone, but minimal damage is done to the existing directory structure (tree is added, but nothing goes away). The way is still open to a totally DOM-free version of the DPS.
We have taken the middle way.
There are basically three ways to proceed:
The nice thing about the middle path is that it is the most incremental and least disruptive. It proceeds very quickly to a compliant DOM implementation, then merges it in. It does mean several different disruptions, but at least they would be fairly short ones. It also means that things can go under source control before work is complete.
During the transition, the ActiveNode interfaces will continue to be used in the DPS. Afterward, they will be used only in the cursor implementations.