Frequently Asked Questions about the PIA

What is the Platform for Information Applications (PIA)?

The PIA is a framework for building information applications -- systems that embed a small amount of processing in a large amount of information. In the PIA framework, each application consists of two parts:

  1. A collection of task-specific XML pages that contain active-processing tags (<if>, <include>, <repeat>, etc.). These look very much like HTML pages.
  2. A core software layer (including a dynamic XML processor) that is shared by all applications.

The goal is to make active pages which are both easier to customize and more powerful than ordinary web pages.

What does the PIA language do?

Check out the live hackable demos/tutorials for a hands-on experience of how active documents operate and how easy they are to modify. No downloading required; you'll be up and running in less than a minute.

Broadly speaking, the PIA does three kinds of things:

  • Act as a web server, reading and processing active-XML documents and serving up HTML for viewing.
  • Act as a browser proxy, intercepting incoming HTML pages, modifying them (e.g. deleting ads or filling in forms), and showing you the modified page.
  • Act as a document processor, processing active-XML documents into HTML or XML files.

In each case, the PIA processes a markup document to produce another markup document, based on the particular active tags ("tagset") associated with the source document.

Why should I use the PIA framework instead of CGI, Perl, Python, PHP3, JSP, ASP, etc.?

Simplicity: Doing your processing with HTML-like tags inside an existing document is conceptually and practically simpler than learning two separate languages--one for markup, one for "programming"--and maintaining two separate groups of files, one of which (e.g. CGI scripts) must generate and synchronize the other.

Since the entire application -- the content, layout, and processing of the information -- is represented in XML, updating applications requires only a single set of tools and the widespread ability to work with markup-language documents like HTML. With conventional web pages, even a simple change (such as adding a signature box to a form) can require substantial programming changes in many files. The PIA framework enables the application designer to use (and create) special-purpose elements ("tags"), so that adding a signature box could be as easy as inserting a single <signature/> tag.
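To make the idea of a special-purpose tag concrete, here is a minimal Python sketch of tag expansion. It is not the PIA's processor: the `expand` function, the `TAGSET` dictionary, and the markup that `<signature/>` expands into are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def signature_box():
    """Hypothetical expansion of <signature/>: a labelled text input."""
    label = ET.Element("label")
    label.text = "Signature: "
    ET.SubElement(label, "input", {"type": "text", "name": "signature"})
    return label

# A toy "tagset": maps a custom tag name to the function that expands it.
TAGSET = {"signature": signature_box}

def expand(element, tagset=TAGSET):
    """Return a copy of `element` with every tag in `tagset` replaced by its expansion."""
    if element.tag in tagset:
        replacement = tagset[element.tag]()
        replacement.tail = element.tail  # keep any text that followed the tag
        return replacement
    result = ET.Element(element.tag, dict(element.attrib))
    result.text = element.text
    result.tail = element.tail
    for child in element:
        result.append(expand(child, tagset))
    return result

page = ET.fromstring('<form action="/submit"><p>Approved by:</p><signature/></form>')
html = ET.tostring(expand(page), encoding="unicode")
print(html)
```

The form author writes only `<signature/>`; what that tag produces is defined once, in the tagset, so changing the signature box everywhere means editing one definition.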

Modularity: Every PIA document is an XML document, so it can contain enough internal labels to be re-targeted for multiple purposes (e.g. one "page" of XML documentation can contain both brief and verbose versions, the print version, the Web version, the internal version, etc.). And documents can contain each other, or contain pieces of each other, to any depth you want, including all the processing, calculating, and decision-making associated with each one.

Robustness: Any document which obeys the rules of XML (about "tag-nesting" and so forth), as all PIA documents must, is guaranteed to obey the corresponding rules of HTML, so that there is much less danger of creating a "buggy" HTML document with this approach than with CGI-like programming approaches.

Power: The PIA framework is much more than just a new kind of dynamic XML server: it provides a complete, self-consistent framework for document processing applications and a new approach to computing on the web. Using the PIA, other web documents become resources that can be dynamically processed in the context of a given information application.

For example, it is trivial to create an active XML page that retrieves a URL and adds a "download me" check box next to each link on the page, so lots of links can be downloaded at once -- just redefine the <a> tag to contain a checkbox too. (See the hackable demo/tutorial pages for many such illustrations.) The bottom line is that many desirable features require substantially less effort and maintenance using the PIA framework.
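The effect of redefining the <a> tag can be approximated outside the PIA with a few lines of Python. This is only a sketch of the transformation, not PIA tagset syntax; the function name and the checkbox's `name` and `value` attributes are assumptions.

```python
import xml.etree.ElementTree as ET

def add_download_checkboxes(root):
    """Insert a checkbox before every <a href=...> element, in place."""
    # Snapshot the tree first, since we insert into parents while walking.
    for parent in list(root.iter()):
        # Walk children from last to first so insertions don't shift pending indexes.
        for index, child in reversed(list(enumerate(parent))):
            if child.tag == "a" and "href" in child.attrib:
                box = ET.Element("input", {
                    "type": "checkbox",
                    "name": "download",             # hypothetical form field name
                    "value": child.attrib["href"],  # the link this box would fetch
                })
                parent.insert(index, box)
    return root

page = ET.fromstring(
    '<body><p>Get <a href="a.tgz">the source</a> or '
    '<a href="b.pdf">the manual</a>.</p></body>'
)
add_download_checkboxes(page)
boxes = [el.attrib["value"] for el in page.iter("input")]
print(boxes)
```

In the PIA the same result would come from the tagset's definition of <a>, so the page being processed does not need to change at all.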

Security: Because specific processing rules (a "tagset") are associated with each document, the site administrator can tightly restrict a document's actions. For example, our demo/tutorial pages use a tagset in which the <output> and <connect> tags have been disabled, so that visitors hacking on demo documents cannot write onto our file system.
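One way to picture a restricted tagset is as a set of permitted active tags that a document is checked against. The Python sketch below is an illustration under that assumption; the PIA's real tagset mechanism is not shown here, and `FULL_TAGSET`, `DEMO_TAGSET`, and `disallowed_tags` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagsets: each is just the set of active tags a document may use.
FULL_TAGSET = {"if", "include", "repeat", "output", "connect"}
DEMO_TAGSET = FULL_TAGSET - {"output", "connect"}  # disabled for demo visitors

def disallowed_tags(document, tagset, known=FULL_TAGSET):
    """Return the sorted names of known active tags used in `document` but absent from `tagset`."""
    root = ET.fromstring(document)
    used = {el.tag for el in root.iter()}
    return sorted((used & known) - tagset)

doc = "<page><if><include/></if><output>oops</output></page>"
print(disallowed_tags(doc, DEMO_TAGSET))
print(disallowed_tags(doc, FULL_TAGSET))
```

Ordinary markup such as <page> is ignored; only active tags that the chosen tagset leaves out are reported.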

What are the disadvantages?

At present, page generation by the PIA is slow compared to a compiled CGI script, so the PIA is more appropriate for office-level intranets and static web-page updates than for high-traffic web sites. (That slowness results from using Java for the initial implementation, not from any intrinsic limitations of the active-tag language itself.)

And the active-tag language of PIA processing can seem unusual to traditional C/C++/Perl programmers; it can be downright cumbersome and cluttered for some elaborate processing tasks. So the PIA approach is definitely not appropriate for heavy-duty or intricate computations.

What kinds of applications does the PIA have?

We have rapidly developed and deployed several different kinds of information applications, including a workflow system, a scheduling system, photo albums, and discussion lists. These applications were created very quickly, and are maintained without conventional programming by the people who use them.

For example, at Ricoh Innovations all the requests and authorizations for purchasing, expenses, travel, and all other administrative functions are handled electronically using a PIA forms system built by a summer student in a few weeks. Now our office manager can directly modify the forms, update the routing for authorizations, etc., without needing IT consultants or programming expertise.

Does the PIA provide a drag-and-drop GUI for creating information applications?

Not yet, although we now have active tags (like <pretty>) which emulate many of the functions of structured XML or HTML editors. The whole point is that people should still use their favorite text or HTML editors to edit markup documents in a familiar way. The PIA doesn't require any new user interfaces; it just provides more ease of use.

What do I need to actually customize these active pages?

The active XML pages of an application are plain text files that can be modified with any text editor that doesn't hide unknown tags (Emacs works very well for us).

Can legacy applications be integrated into the PIA framework?

Usually yes. One simple example is adding tags that access an existing database through the JDBC interface in order to populate some piece of the XML. One could imagine a special-purpose tag called "find-userid," which would be invoked as:

<find-userid name="&FORM:username;" />

to return a user ID from a database, given the user's name.

One way is to extend the set of elements (tags) that are used in a collection of XML pages to include the functions provided by legacy applications. The actions associated with these elements can be defined in terms of other elements (somewhat similar to defining macros in spreadsheets), or they can be specified as a Java class that implements a particular interface. This makes it very easy to create wrapper(s) for legacy systems that incorporate existing functions into PIA applications.
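The lookup behind a tag like <find-userid> might look roughly like the following. This is a sketch in Python, with the standard-library sqlite3 module standing in for JDBC and an in-memory table standing in for the legacy database; the function name, table name, and column names are all assumptions.

```python
import sqlite3

def find_userid(connection, name):
    """Hypothetical handler for <find-userid name="..."/>: return the user ID as text, or "" if unknown."""
    row = connection.execute(
        "SELECT id FROM users WHERE name = ?", (name,)  # parameterized, never string-built
    ).fetchone()
    return str(row[0]) if row else ""

# Stand-in for the legacy database (table and column names are assumptions).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
db.execute("INSERT INTO users (id, name) VALUES (?, ?)", (42, "alice"))

print(find_userid(db, "alice"))
```

The handler returns text because its result is substituted into the page where the tag appeared; the page author never sees the SQL.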

When the PIA is acting as a Web server, all of the URLs that used to be handled by CGI scripts could still be handled by those same scripts; replacing existing CGI scripts or servlets is strictly optional. (If necessary, the PIA could get at the data simply through an HTTP request, in which case the data's CGI origins become irrelevant.)

What do I need to run the PIA software?

You need Java (a Java virtual machine or JRE) and a suitable operating system (e.g. Linux or NT). See the installation page for details.

What do I need in order to modify the processing engine itself?

Programmers wishing to modify the core software will need a Java Development Kit (JDK). To use our Makefiles, you will need GNU make to build the class files and documentation. (Other versions of make may work, or the Java compiler can be run directly.)

Where can I obtain the PIA files?

The PIA files are available via HTTP, from http://www.risource.org/PIA/

The latest stable release is maintained in compressed form at ftp://ftp.risource.org/PIA/pia_src.tgz.

How do I decompress/install/run/get-started with the PIA?

See the installation page for details.

In short: uncompress the files (on Linux, tar -xzf pia_src.tgz), run the software (Linux: PIA/bin/pia; Windows: PIA/bin/pia.bat), then use your browser to interact with the PIA at http://LOCALHOST:8888/, where LOCALHOST is the name of the machine where the PIA software is running.

How are the directories set up?

The PIA can operate on several groups of interacting active XML pages, each group roughly analogous to the mixture of HTML and CGI files one would find at a traditional "web site." We call such a group a "subsite," and each one has its own active properties (tagsets), its own datafiles, its own configuration options, and its own ways to look up its components. So a single PIA server can effectively host many completely independent subsites, each of which looks like a real site in its own right.

Each subsite directory also contains the code for ongoing PIA processes, like Proxie, which listen to and modify information flowing through the PIA's port (we used to call these semi-autonomous processes "Agents").

There is one wrinkle: how do you modify (or personalize) the PIA and subsite files for your own purposes, without either overwriting the original PIA files or letting the latest PIA-source updates overwrite your changes? To solve this problem we actually provide two PIA root directories: one with the original out-of-the-box PIA code (let's call it PIA/Prototype), and one for your own modifications (let's call it .pia). The PIA server will always (and only) look in your custom-edited directory .pia, but at least initially that directory and its subsite directories will only contain configuration files (subsite.cfg) which all point to the workhorse PIA/Prototype directory.

As you customize the PIA, you may then modify the various subsite.cfg files to point elsewhere, to whatever files and directories you want; different subsites can even share files and directories by this method. And because the PIA processor always and only looks at the .pia directory, there are no search-paths to maintain or cause confusion.

How do I tell the PIA to use a particular proxy/port/etc.?

The Config directory contains configuration information stored in the form of property files. Look there to permanently change such things as the ports and proxy information.

How do I tell my browser to use the PIA as a proxy?

Find the preferences or options menu for your browser (under edit->preferences for Netscape 4, or view->options for Internet Explorer) and look for the proxy setting. In the slot for the HTTP proxy, enter the name of the machine and the port number on which you are running the PIA software.
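The same proxy setting can be applied to a script instead of a browser. The Python sketch below uses the standard library's `urllib.request.ProxyHandler`; it assumes the PIA is running on the local machine and listening on port 8888 (the port used in the example URLs in this FAQ), and the function name is hypothetical.

```python
import urllib.request

def opener_via_pia(host="localhost", port=8888):
    """Build a URL opener that sends plain-HTTP requests through a PIA proxy."""
    # host and port are assumptions: use the machine and port where your PIA runs.
    proxy = urllib.request.ProxyHandler({"http": f"http://{host}:{port}"})
    return urllib.request.build_opener(proxy)

opener = opener_via_pia()
# opener.open("http://www.risource.org/PIA/") would now be routed through the PIA.
```

Building the opener does not contact the network; only a later `opener.open(...)` call does, and it will fail if no PIA is listening at that address.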

The directory PIA/src/java/ contains the Java source files and tagsets used in the PIA software. NOTE: if you set your classpath variable by hand, set it to PIA/src/java/ and not PIA/src/. Implementations of the PIA software in languages other than Java are possible and would go in the PIA/src/ directory (perhaps someday PIA/src/perl).

PIA/src/java/org/risource/ includes the PIA-specific code. PIA/src/org and src/misc contain other code referenced by the PIA software and included for convenience.

The directory src/app contains utility applications.

Can I use the PIA software as a Unix filter for processing HTML and XML documents?

The PIA/bin/process command allows the PIA software to be used as a Unix filter for processing HTML and XML documents; we use it to convert active XML documents into static HTML pages, like this one.

If you use the <pretty nomarkup="yes"> tag, you can even have the processor create the proper indentation and end-tags automatically, for converting HTML into strict XML.

What is the funny line of characters that appears at the top of my browser window?

If you set the proxy of your browser to use the PIA, various PIA processes can modify web pages as you download them from a server. By default the Proxie process (or agent) inserts a toolbar that allows you to manipulate the page in various ways and gives access to some of its features. To disable these funny characters, remove the Toolbar agent. See the History (or Proxie) agent's home page (e.g. http://LOCALHOST:8888/Proxie/home) for more details.

How can I check for known bugs?

You can check for known bugs by going to http://www.risource.org/PIA/

I found a bug, who do I tell about it?


I fixed a bug, where do I send the patch file?

Patch files can be posted to PIA-dev@risource.org. (Note: an external CVS server will be available soon to make it easy to keep track of updates.)

Where should I post comments/questions?

Comments and questions about the core software are welcome at PIA-dev@risource.org. Comments and questions about using the PIA framework should go to PIA-use@risource.org. These questions (and responses) will be used to extend this FAQ.