Notes on PIA Interfaces and Integration

DRAFT: This is a work in progress! Read at your own risk; believe at your peril!

Note that this file is not actually checked in yet; I'm still thinking about the best place to put it within the PIA documentation tree.

This document discusses the PIA's architecture, the interfaces (APIs) that it supports, and ways of integrating the PIA with other web servers and into other web applications in such a way as to preserve cross-platform application portability.

Introduction

The PIA architecture has three main components (also referred to below as ``features''):

  1. dps -- the Document Processing System that implements the tag language. By itself, the DPS essentially provides a simple but complete ``macro language'' for XML and HTML web pages.
  2. site -- the site structure package. This provides a simple, robust, versatile, and OS-independent way of structuring the files and directories that comprise a web application.
  3. agents -- PIA agents provide the ability to operate (using the DPS) on web requests and documents as they pass through a PIA-based proxy server. It is worth noting that most PIA applications will not require agents to operate on proxied transactions.

There are also several degrees to which the PIA or its components may be integrated with a web server such as Apache:

  1. Not at all -- the PIA functions as a stand-alone web server.
  2. Attached Server -- the main web server passes (some subset of) its requests off to the PIA. This is the standard interface for servlet engines and for server-side scripting engines that need to share state among requests. Note that many, if not all, attached servers can also run stand-alone.
  3. Fully Integrated -- the PIA components run inside the main server's memory image. In a multi-process, single-threaded server like Apache 1.x this makes it difficult (though not necessarily impossible) to share state among requests.
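
As a rough illustration of the attached-server arrangement, the sketch below shows a front end deciding which requests to hand off to the PIA and which to serve itself. The path prefixes and the function name are hypothetical; in practice this choice would be made in the main server's own configuration rather than in Python.

```python
# Hypothetical sketch: decide which requests the main server hands to an
# attached PIA. The prefixes below are invented for illustration only.
PIA_PREFIXES = ("/pia/", "/apps/")


def handled_by(path, pia_prefixes=PIA_PREFIXES):
    """Return "pia" if the request path belongs to the attached server,
    otherwise "main" (the main web server serves it itself)."""
    for prefix in pia_prefixes:
        if path.startswith(prefix):
            return "pia"
    return "main"


if __name__ == "__main__":
    for path in ("/index.html", "/pia/agents/list", "/apps/calendar"):
        print(path, "->", handled_by(path))
```

Only the subset of requests under the listed prefixes reaches the PIA; everything else stays with the main server.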

Interfaces

The following table lists some interfaces that the PIA might plausibly support:

  Interface       Difficulty  Features        Priority  Notes

Protocol Interfaces
  AJP             mid         +site +agents   mid       This is the protocol used between Apache and the Java servlet engines; most other high-end web servers also support it. It may also be used by the PHP engine. Code can be obtained from Tomcat or JServ.

Java Interfaces: for interoperation with existing XML toolsets.
  Servlet         low         ?site ~agents   high      Mark did this; it should be easy to resurrect. The Servlet interface supports non-proxied agents.
  DOM             low         -site           high      Already written, but tuning and testing are required.
  SAX             mid         -site           high      The hooks are in for this, but they're totally untested.
  TRAX            ?           -site           low       This is a proposed interface; we'll have to wait until it stabilizes. No information available yet.
  Cocoon 1.x      low         -site           mid       This is essentially a one-liner given DOM.
  Cocoon 2.x      mid         -site           low       Probably simple given SAX and a stable Cocoon 2.0, but it may take a while to get to that point.
  Tomcat Servlet  mid         +site +agents   low       Tomcat includes hooks for agents.

C Interfaces: difficulty ratings assume the existence of CPIA.
  mod_dps         low         -site           high      Should be very simple given a C port of the DPS.
  mod_site        high        -dps +site      high      Does not require the DPS; can be done using the XML parser built into Apache 2.0. It should be possible to leverage the metadata kept by mod_dav.
  mod_pia         mid         +site +agents   mid       This really just adds agents to mod_dps and mod_site.

The features +dps and -agents are assumed unless otherwise noted. The ``full-featured'' PIA is +dps +site +agents.  
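
The defaulting rule for the Features column can be sketched as follows. This is only one reading of the notation: the defaults (+dps, -agents) come from the text above, but the function names are made up, and since the ``?'' and ``~'' markers are not defined in these notes the sketch carries them through without interpreting them.

```python
# Sketch of how a Features cell could be read. Defaults come from the
# text: +dps and -agents unless a row says otherwise. The "?" and "~"
# markers are not defined in these notes, so they are kept as opaque
# markers rather than interpreted.
DEFAULTS = {"dps": "+", "agents": "-"}
MARKERS = "+-?~"


def resolve_features(column):
    """Merge a Features cell such as "+site +agents" with the defaults."""
    features = dict(DEFAULTS)
    for token in column.split():
        marker, name = token[0], token[1:]
        if marker not in MARKERS or not name:
            raise ValueError("unrecognized feature token: %r" % token)
        features[name] = marker
    return features


def is_full_featured(features):
    """The full-featured PIA is +dps +site +agents."""
    return all(features.get(name) == "+" for name in ("dps", "site", "agents"))
```

For example, the AJP row (``+site +agents'') resolves to the full-featured set, while the mod_site row (``-dps +site'') does not.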

Difficulty ratings
  low   < 1 week
  mid   < 1 month
  high  > 1 month

Priority ratings
  low   postpone
  mid   do if convenient
  high  do it!
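
To see how the two ratings interact, the sketch below copies the difficulty and priority columns from the table and orders the interfaces by priority first and difficulty second. The ordering rule is an assumption made for illustration (including the choice to sort an unknown difficulty last); it is not a schedule and takes no account of dependencies between interfaces.

```python
# Ratings copied from the interface table. The ordering rule (priority
# first, then easiest first, unknown difficulty last) is an assumption.
INTERFACES = [
    # (interface, difficulty, priority)
    ("AJP", "mid", "mid"),
    ("Servlet", "low", "high"),
    ("DOM", "low", "high"),
    ("SAX", "mid", "high"),
    ("TRAX", "?", "low"),
    ("Cocoon 1.x", "low", "mid"),
    ("Cocoon 2.x", "mid", "low"),
    ("Tomcat Servlet", "mid", "low"),
    ("mod_dps", "low", "high"),
    ("mod_site", "high", "high"),
    ("mod_pia", "mid", "mid"),
]

DIFFICULTY_RANK = {"low": 0, "mid": 1, "high": 2, "?": 3}
PRIORITY_RANK = {"high": 0, "mid": 1, "low": 2}


def work_order(interfaces=INTERFACES):
    """Sort by priority (high first), then by difficulty (low first).
    sorted() is stable, so ties keep their order from the table."""
    return sorted(
        interfaces,
        key=lambda row: (PRIORITY_RANK[row[2]], DIFFICULTY_RANK[row[1]]),
    )


if __name__ == "__main__":
    for name, difficulty, priority in work_order():
        print("%-15s priority=%-5s difficulty=%s" % (name, priority, difficulty))
```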

PIA and Apache

There are two main paths to a ``full-featured'' PIA integrated with Apache:

  1. mod_pia -- a full-featured PIA implemented as a set of standard Apache modules written in C. It should not be necessary to implement a stand-alone PIA server in C.
  2. AJP -- this is the interface used between a web server and a subsidiary server-like engine. It keeps the present PIA essentially intact, and would be more efficient and better-integrated than the present mod_proxy-based scheme. Some investigation is required, mainly to see whether proxying (and hence agents) can be supported through AJP.

Several intermediate or alternative structures are possible:


DPS outside the PIA

It is likely that the DPS will be useful outside of the PIA. Since the existing code will soon support all of the major Java interfaces (DOM, SAX, Servlet) used in XML applications, this use should be encouraged, and may eventually lead to the PIA's tag language becoming a standard, either as an alternative or (less likely) an extension to XSLT.
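
To make the idea of a ``macro language'' driven through a DOM interface a little more concrete, here is a minimal sketch using Python's standard xml.dom.minidom. It does not use the DPS or its tag set: the ``subst'' tag and the expand function are hypothetical, chosen only to show a tag being replaced by its expansion inside an existing DOM tree.

```python
from xml.dom import minidom


def expand(xml_text, variables):
    """Replace every <subst name="..."/> element (a hypothetical tag, not
    part of the DPS) with the text bound to that name, and return the
    resulting document element as a string."""
    doc = minidom.parseString(xml_text)
    # Copy the node list before mutating the tree.
    for node in list(doc.getElementsByTagName("subst")):
        name = node.getAttribute("name")
        # Unknown names expand to the empty string in this sketch.
        text = doc.createTextNode(variables.get(name, ""))
        node.parentNode.replaceChild(text, node)
    return doc.documentElement.toxml()


if __name__ == "__main__":
    page = '<p>Hello, <subst name="who"/>!</p>'
    print(expand(page, {"who": "world"}))
```

Because the expansion works on a plain DOM tree, the same step could in principle sit inside any toolchain that passes DOM documents around.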


Alternative Implementations

No effort should be put into alternative implementations (except for C, of course) at this time, but if some outside developer wants to do one they should be encouraged.

C
The CPIA effort is, of course, already in progress.
C++
A C++ implementation would be a straightforward translation of the Java implementation, made simpler by the fact that C++ versions of the SAX and DOM APIs already exist.
Perl
This is the most attractive alternative implementation, because mod_perl is very popular and because most of the machinery required is already present in the form of Perl modules.
Python
A Python implementation would probably be roughly as easy as a Perl one; Python has very good XML support, including a DOM implementation.
XSP (Cocoon 2)
An implementation of the DPS as an XSP ``taglib'' (tag library) is feasible; it would provide a sort of ``universal taglib'' for Cocoon.
DPS Compiler
A set of DPS tags could (fairly easily) be written that translate the DPS tags into subroutines or classes in some existing programming language. This raises the intriguing possibility of a PIA implementation written entirely ``in itself.'' Such an implementation would be both extremely portable and highly efficient.
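
The DPS Compiler idea can be illustrated with a toy translator that turns a tagged template into the source of a subroutine. This is a sketch under heavy assumptions: it is written in Python rather than in DPS tags, the ``subst'' tag is hypothetical, and attributes on ordinary elements and character escaping are ignored to keep it short.

```python
from xml.dom import minidom


def compile_template(xml_text, func_name="render"):
    """Translate a template into Python source for a function that takes a
    dict of variables and returns the expanded text. Only a hypothetical
    <subst name="..."/> tag is recognized; attributes on other elements
    and character escaping are ignored."""
    doc = minidom.parseString(xml_text)
    lines = ["def %s(variables):" % func_name, "    out = []"]

    def emit(node):
        if node.nodeType == node.TEXT_NODE:
            # Literal text becomes a constant in the generated code.
            lines.append("    out.append(%r)" % node.data)
        elif node.nodeType == node.ELEMENT_NODE and node.tagName == "subst":
            # The tag becomes a lookup performed when the function runs.
            key = node.getAttribute("name")
            lines.append("    out.append(str(variables.get(%r, '')))" % key)
        elif node.nodeType == node.ELEMENT_NODE:
            lines.append("    out.append(%r)" % ("<%s>" % node.tagName))
            for child in node.childNodes:
                emit(child)
            lines.append("    out.append(%r)" % ("</%s>" % node.tagName))

    emit(doc.documentElement)
    lines.append("    return ''.join(out)")
    return "\n".join(lines) + "\n"


def load(source, func_name="render"):
    """Execute generated source and return the compiled function."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name]


if __name__ == "__main__":
    source = compile_template('<p>Hello, <subst name="who"/>!</p>')
    print(source)
    print(load(source)({"who": "world"}))
```

The template is parsed once, at compile time; each later call to the generated function only runs straight-line code, which is where any efficiency gain would come from.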

Roadmap

What follows is the implementation roadmap for the PIA's interfaces:

  1. Implement the Java interfaces (Servlet, DOM, SAX, in that order). Even though these do not directly lead to a full-featured PIA integrated with Apache, they are very simple and we have already committed to doing them. They will give the PIA the highest possible level of interoperability with existing XML applications with minimal effort. In addition, the Servlet interface will provide a fallback position, integrating the PIA's dps and site packages with any servlet-based web server.
  2. Implement, or at least investigate, AJP. This would provide a PIA implementation fully integrated with Apache and with all the other web servers that support AJP. It's also possible that code taken from Tomcat's front end could greatly simplify and clean up the PIA's front-end code in org.risource.pia and org.risource.content.
  3. Implement mod_pia; the C port should be ready at about this time.
  4. Cleanup. Based on experiences with mod_pia and the Java interfaces, it should be possible to do a major cleanup and simplification of the PIA's Java implementation. This would include eliminating redundant classes and interfaces, lazy expansion of tag content, and a major overhaul of the org.risource.pia package.

Copyright © 2000 Ricoh Innovations, Inc.
$Id: interfaces.html,v 1.2 2001-01-11 23:36:54 steve Exp $
Stephen R. Savitzky <steve@rii.ricoh.com>