
What's next?

This page describes some planned major enhancements and future projects segmented into three categories:

  1. applications built on the PIA,
  2. improvements of the PIA software itself, and
  3. integration with other software.

If you are interested in contributing to these projects or have additional suggestions, please send mail to pia-dev@risource.org.

Example information applications built on the PIA

Many PIA applications begin with a working example and then evolve through customization and the addition of features (pages). We plan for RiSource.org to eventually provide a working example for each main type of information application. The initial PIA distribution includes a number of example applications (agents) that assist with surfing the web (using the PIA as a proxy), along with a few simple applications that demonstrate some features of the PIA. Several other applications exist, including a workflow system, a calendar/discussion system, and an agent for downloading and reformatting web pages; they should become available as soon as they are updated for release as open source.

Other application areas that are particularly appropriate for the PIA architecture include intranet sites (departmental Web sites, policy/benefit handbooks, employee directories, and other information generally considered part of the organizational memory); personal Web servers/proxies (your "answering machine" for the Web, family photo albums, private interfaces to other Web services); and community/group servers (team schedules, results, and rosters; member surveys; group collaboration for site development).

RiSource.org facilitates the development of these applications by hosting mailing lists and Web sites for open source information applications. See the applications page for a list of existing information applications and more information on how to contribute.

PIA software improvements

The PIA software consists of four main Java packages: dps, pia, site, and content (full names org.risource.dps, org.risource.pia, etc.). The dps (document processing system) package provides all the document (SGML/XML) transformation functions, pia provides the application context (agents, transactions, HTTP functions, administration), and content provides the representations and interfaces for MIME data (HTML, JPEG, GIF, etc.). Below is a list of the major improvements planned for each package. More detailed information can be found in Steve's notes.

dps projects
  - Provide a cleaner interface between the DOM (Document Object Model) and the rest of the dps, so that it is easy to work with other DOM implementations.
  - Modify BasicProcessor to be driven by a SAX parser.
  - Add proper support for DTDs and XML namespaces.
  - XSL emulation. XSL (Extensible Stylesheet Language) uses style sheets to transform XML documents. The PIA provides a rather different processing model from XSL, but it should be fairly straightforward to support the XSL syntax.
pia projects
  - Modify internal data structures to be XML based. For example, agents should be representable as XML, so that installing an agent is simply a matter of processing the corresponding agent document.
  - Add XML database support.
  - Clarify the order in which agents get to operate on transactions when more than one agent matches.
site projects
  - Re-read a subsite's configuration file when it changes. This is currently hard because agents (for example) get created twice.
  - Make it possible to automatically save entities in files when they change.
content projects
  - Clean up and simplify the content classes.
  - Use text/ProcessedContent when agents want to modify HTML/XML documents by providing actions for tags. (Currently, HTML modifications of proxied documents are done via a kludge.) Clarify the process by which agents can operate on content objects.
  - Add classes that enable operations on standard MIME types (currently most non-text types are handled as ByteStreams, which support essentially no operations).
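
To make two of the items above more concrete (driving processing from a SAX parser, and installing an agent by processing its agent document), here is a minimal sketch using only the SAX classes bundled with the JDK. It is not PIA code: the <agent> element, its name and path attributes, and the AgentDocumentSketch class are all hypothetical, invented only to illustrate the idea.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: "installing" agents by processing an XML agent document.
class AgentDocumentSketch {

    /**
     * Parses the document with SAX and registers one entry per <agent>
     * element. Returns a map of agent name -> URL path (assumed attributes).
     */
    static Map<String, String> install(String xml) {
        Map<String, String> registry = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The parser is not namespace-aware here, so qName is the tag name.
                if ("agent".equals(qName)) {
                    registry.put(atts.getValue("name"), atts.getValue("path"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not process agent document", e);
        }
        return registry;
    }

    public static void main(String[] args) {
        String doc = "<agents>"
                + "<agent name='history' path='/history'/>"
                + "<agent name='proxy' path='/proxy'/>"
                + "</agents>";
        System.out.println(install(doc)); // prints {history=/history, proxy=/proxy}
    }
}
```

The handler reacts to parser events as they arrive instead of building a tree first, which is the general shape a SAX-driven processor would take.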

Integration with other software

Currently the PIA provides a complete application environment that includes web server, client, and proxy functions. In many situations, one would like to use a PIA application in conjunction with an existing web site/web server (typically Apache). We are considering several approaches:

PIA as a backend server (e.g. ProxyPass)
Apache (and presumably other Web servers) can map other servers into the local URL space: requests for particular resources are forwarded via HTTP to the other server. From the Apache documentation (slightly modified):
Suppose the local server has address http://wibble.org/; then
   ProxyPass /foo/ http://localhost:8888/foo/
will cause a local request for <http://wibble.org/foo/bar> to be internally converted into a proxy request to <http://localhost:8888/foo/bar>.
Here a PIA is listening, as usual, on port 8888. We have used this method successfully. Some care must be taken to ensure that URLs are properly specified; it works best if all PIA pages contain relative URLs. The PIA server need not be running on the same machine as Apache, although that is the usual case.
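
The prefix mapping and the advice about relative URLs can be sketched in a few lines of Java. This is a simplified model of the substitution described in the Apache documentation, not Apache's implementation; the ProxyPassSketch class and its method names are hypothetical.

```java
import java.net.URI;

// Hypothetical sketch of ProxyPass-style prefix substitution.
class ProxyPassSketch {

    /**
     * Replaces the matched local prefix with the backend target.
     * Returns null when the path lies outside the proxied prefix.
     */
    static String rewrite(String prefix, String target, String path) {
        if (!path.startsWith(prefix)) {
            return null;
        }
        return target + path.substring(prefix.length());
    }

    /** Resolves a link found in a page against the public URL of that page. */
    static String resolveLink(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Trailing slashes match on both sides: the path survives intact.
        System.out.println(rewrite("/foo/", "http://localhost:8888/foo/", "/foo/bar"));
        // -> http://localhost:8888/foo/bar

        // Target without the trailing slash: the separator is lost.
        System.out.println(rewrite("/foo/", "http://localhost:8888/foo", "/foo/bar"));
        // -> http://localhost:8888/foobar

        // A relative link stays inside the proxied /foo/ space...
        System.out.println(resolveLink("http://wibble.org/foo/bar", "baz.html"));
        // -> http://wibble.org/foo/baz.html

        // ...while a root-relative link escapes it and never reaches the PIA.
        System.out.println(resolveLink("http://wibble.org/foo/bar", "/baz.html"));
        // -> http://wibble.org/baz.html
    }
}
```

The last two calls show why relative URLs work best: the browser resolves links against the public address, so only links that stay under the mapped prefix are forwarded to the backend.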
PIA as an Apache module
Another approach is to make the PIA capable of functioning as an Apache module. As with the PIA server functions, Apache modules can access and modify requests at several points in the processing, which means that the processing model used for agents in the PIA could remain essentially unchanged. Unfortunately, Apache is not (yet) multithreaded: each simultaneous request is handled by a separate process. Since the PIA is written in Java, each of these processes would have to run its own Java virtual machine (JVM). Not only would this introduce substantial overhead (not really practical on most systems), it would also make it very difficult to maintain consistent state for individual agents. Apache version 2 is supposed to be multithreaded, at which point a native module might make sense.
PIA as a servlet
Servlets are a standard Java interface for adding resources to Java web servers. There are two ways to map the PIA into servlets: (1) turn each agent into a servlet, or (2) add a servlet front end to the PIA (replacing the "Acceptor" class). The first approach puts a servlet wrapper around agent objects. Its downside is that only one agent can operate on a given request (e.g. history and proxy agents would not be able to look at or modify requests for other URLs). The second approach is more general, but may result in two separate data structures being created for every request: one for the servlet engine and one for the PIA. (Even in this case the PIA agents may not be able to "see" requests for resources outside of the PIA space.) In either case, creating the servlet wrappers should not be difficult. Once we have a servlet version of the PIA, it can be used with Apache through the JServ servlet engine and mod_jserv. (See the Java Apache project.)
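
The difference between the two servlet mappings comes down to how many agents get to see a request. The sketch below models the second approach, in which a single front end offers each request to every matching agent. It deliberately avoids the servlet API so that it runs on a bare JDK; SketchAgent, FrontEndSketch, and the matching rules are hypothetical and do not correspond to actual PIA classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: one front end dispatching a request to all matching agents.
class FrontEndSketch {

    /** Stand-in for an agent: a name plus a rule saying which URLs it wants. */
    static final class SketchAgent {
        final String name;
        final Predicate<String> matches;

        SketchAgent(String name, Predicate<String> matches) {
            this.name = name;
            this.matches = matches;
        }
    }

    /** Offers the URL to every agent, in registration order, and records who matched. */
    static List<String> dispatch(List<SketchAgent> agents, String url) {
        List<String> handledBy = new ArrayList<>();
        for (SketchAgent agent : agents) {
            if (agent.matches.test(url)) {
                handledBy.add(agent.name);
            }
        }
        return handledBy;
    }

    public static void main(String[] args) {
        List<SketchAgent> agents = Arrays.asList(
                new SketchAgent("history", url -> true), // assumed: watches every request
                new SketchAgent("proxy", url -> url.startsWith("http://")));

        // Both agents see a proxied request; only "history" sees a local one.
        System.out.println(dispatch(agents, "http://example.org/page")); // [history, proxy]
        System.out.println(dispatch(agents, "/local/page"));             // [history]
    }
}
```

With one servlet per agent, the servlet engine would route each URL to a single servlet, so the first call above could reach only one of the two agents.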

Non-Java implementations of the PIA

While Java has some nice features for specifying clean designs and for portability, other languages offer speed and memory advantages. It should be straightforward to implement the PIA design in an alternative language and to leverage existing components. One attractive possibility is to create a C/C++ implementation that uses MODSAX as the XML parser, a C implementation of the DOM for internal data structures, and Apache for the server functions. This would mainly leave some of the dps, entity, and agent functions to be ported. More detailed information can be found in Steve's notes.

Another attractive option is to base the port on the SGML style language, DSSSL, which is essentially a set of extensions to Scheme. Also, a previous version of the PIA was implemented in Perl, and it might be possible to revive it.


$Id: roadmap.html,v 1.5 1999/11/19 23:28:00 steve Exp $