PIA Application Author's Guide

This document provides an overview of application writing in the PIA system.

Warning! Parts of this document are still out of date. Most of those parts are indicated in red; particularly dubious sections are indented as well.

Active pages for sample applications are in the PIA/Agents/ and PIA/Samples/ directories. Feel free to use these files as a source of ideas and examples, and as a basis for creating new applications.

Quick Start

Applications and Agents

PIA stands for ``Platform for Information Applications,'' so we need to start by explaining exactly what an application is and how the user can access it. The short answer is that an application is any location in the PIA's web site that performs a service for the user by means of active documents.

The first time the user visits an application's location, its local configuration file, _subsite.xcf (if it exists), is loaded into the PIA server, which gives the application a chance to perform any necessary setup. This includes making the application's ``home directory'' available at the top of the PIA's URL space as /~AppName.

It is possible to pre-load applications by accessing them in the PIA's top-level initialization file, /initialize.xh.

Agents

There are roughly two kinds of applications: passive and active. Passive applications "just sit there," doing nothing until a request for a page comes in... at which point the processor finds the right page, processes the appropriate .xh file, and sends back the processed results. This is like a traditional static web-page or CGI script.

Active applications, on the other hand, are more like "software agents" in the traditional sense: they are background processes, always on, which eavesdrop on and potentially modify all the port's traffic. For example, the History application (under /Agents/) keeps track of every URL visited, and the remoteTools application modifies incoming HTML according to the user's customization rules. This is like a traditional proxy-server.

Strictly speaking, an Agent is a piece of (usually XML) code that is run in response to some internal event, rather than in response to a direct request for a document. At the moment, agents can respond to either a web transaction (request or response) that matches some ``criteria'', or can be run at a particular time or repetition interval.

It's difficult to draw a clear distinction between agents and applications; any application may potentially start an arbitrary number of agents. In fact, an agent can exist purely to register an application's home directory.

Some history here: originally, all applications in the PIA were associated with agents, and all of their ``home directories'' were accessible at the top level. This persists in the terminology: we still tend to use ``Agent'' and ``Application'' interchangeably, and in fact the line between them is rather fuzzy -- there's no way for a casual user to determine whether or not an application contains an Agent.

The prototype applications in Samples/ are of the passive type; it is best to get accustomed to using these before tinkering with the more subtle and complex active ones. Once you are ready, look at /Agents/Proxie/History as a prototype active application.

Naming Conventions

By convention, most application names are capitalized, but this is not required. Name lookup is case-sensitive, and there may be good reasons to prefer uppercase or lowercase names in some cases. For example, fileTools and remoteTools are really ``sub-applications'' of Toolbar, and so have lowercase names.

By convention, the name under which an application's ``home directory'' appears (i.e. the name starting with /~) is the same as the name of the application's own directory, but this is not universally true. For example, /~Calendar refers to the application directory /Agents/SimpleCalendar -- the shortened form is more generic, and makes it easy to replace SimpleCalendar with something more complex without breaking any links.
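
The aliasing convention can be pictured as a small lookup table. The following Python sketch is for illustration only: HOME_ALIASES and resolve_home are hypothetical names, and the real PIA sets up these mappings through its configuration files rather than through code like this.

```python
# Hypothetical alias table: short "/~Name" form -> application directory.
HOME_ALIASES = {
    "Calendar": "/Agents/SimpleCalendar",
}


def resolve_home(url_path):
    """Map a path beginning with /~Name to the aliased application directory.

    Paths that do not start with /~, or whose alias is unknown, are
    returned unchanged. The lookup is case-sensitive.
    """
    if not url_path.startswith("/~"):
        return url_path
    name, sep, rest = url_path[2:].partition("/")
    target = HOME_ALIASES.get(name)
    if target is None:
        return url_path
    return target + sep + rest


print(resolve_home("/~Calendar/home.xh"))
```

Replacing SimpleCalendar with a different calendar application would then only require changing the one table entry; links to /~Calendar stay valid.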

See also shorthand application names.

Web Site Structure

Resources and Locations

A running PIA functions as a web server, so that it looks to the user like a web site (or a collection of web sub-sites). The user's view of the PIA is as a collection of documents, each with its own URL. URL stands for Uniform Resource Locator, so the technical term for something ``addressed'' by a URL is a Resource. The PIA follows this terminology.

URLs form a branching structure called the ``URL tree'' (when its structure is being emphasized) or ``URL space'' (to emphasize the space-like uniformity of URL's). This structure is at least partially hidden from the user by the browser, which lets the user jump around at random by clicking on links. Only if the user glances at the browser's ``location'' text box does the hierarchical nature of URL space become apparent.

The terminology for a resource that behaves like a directory, with other resources contained inside it, is not particularly well-established. Many people use ``Container,'' but ``Location'' is also popular. The PIA actually uses both terms with slightly different meanings: a ``location'' is a position in the URL tree, while a ``container'' is the resource located at that position. Location is effectively a synonym for URL, while a directory in the filesystem is a kind of container.

A resource that is not a container is called a ``document.'' A document has both a location in the URL tree, and a container in the resource tree. Every container has an associated document that is shown when the container's URL is requested. By default this is a generic listing of the container's contents; the default is usually overridden by providing a ``home document'' (called index.html in most web servers). The PIA usually uses the name home.xh for this document.

To make matters slightly more complicated, a PIA's web site is also called a ``Site'', and each container resource within it is called a ``Subsite.'' These terms refer to the PIA's particular implementation of resources, in which several directories (in the filesystem sense) may be overlaid to form a single container. (This is explained in the next subsection.)

The distinctions here are totally invisible to the user, who sends the PIA a URL (resource locator) and gets a document back in response. They are, however, important to the application author, and even more important to the programmer. The application author needs to know about subsites in order to understand the otherwise inexplicable name (_subsite.xcf) of the PIA's configuration files. The programmer will quickly discover that Resource is the name of an interface, while Subsite is the name of one of its implementations, in a language (Java) in which interfaces and implementations have to have different names.

This section needs a little work.

Where is the "current working directory" for an active page? If an active page tries to read the file ../Agents/Foo, where will the processor look? If it tries to write that file, where will it write?

Every application (or "subsite") lives in a directory. In the simplest case, the directory has the same name as the application and is located under PIA_ROOT/. For instance, the example in the Quick-start section above creates the MyFirstPage application in the directory .pia/MyApps/MyFirstPage. Also, the applications we provide are in same-named directories under PIA/Agents and PIA/Samples. Various configuration files (_subsite.xcf) could tell the processor to look for the application in other directories instead.

But in all cases, the application's URL always looks as if the application resides directly under the URL's root (the "slash" after the port number). So, for example, the MyFirstPage URL would be http://piaHOST:8888/MyApps/MyFirstPage, and the HelloWorld URL would be http://piaHOST:8888/Samples/HelloWorld, with no indication that one lives under .pia/ and the other under PIA/.

The processor has internal rules for deciding where to look for files belonging to a given URL (in this example, the rule is "look in .pia/ before looking in PIA/"). But you can add more rules via the configuration files, so that an application's active pages, tagsets, subdirectories and so forth can reside anywhere; you just need to make sure that a (real) _subsite.xcf file under PIA_ROOT or PIA_HOME tells the processor where the "anywhere" is.

Even the base directories PIA_HOME and PIA_ROOT can be set on the pia command line or in the corresponding environment variables. They are normally the root of the PIA directory tree (called PIA and located wherever the PIA is installed in your system) and a directory called .pia in your home directory.

In fact, reading and writing are somewhat different. All file-writing takes place in the PIA_ROOT directory (e.g. .pia) or where its configuration files point, automatically creating the whole stack of intervening directories if necessary. And the current directory is wherever the calling page (e.g. home.xh) lives, as decided by the processor's interpretation of the various configuration files (_subsite.xcf, as outlined above).

File-reading, as mentioned before, tries at first to act like file-writing (looking under PIA_ROOT either for the file or for a configuration file pointing to it), and uses the same "current working directory" as file-writing. But (unlike in the writing case), if the processor fails to find the desired file in that fashion, it falls back to looking under PIA_HOME, and searches there for the file or for configuration pointers.

This process can sometimes lead to confusion; be aware that files may not be written or read where you first expect. The least confusing approach is to create and modify all files directly under PIA_ROOT; then everything will be read and written in the same place, and path names will be transparent.
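
The difference between reading and writing can be sketched in a few lines of Python. This is a simplified model, not the PIA's actual code: the function names path_for_write and path_for_read are made up for the example, and the sketch ignores the redirections that _subsite.xcf files can introduce.

```python
import os
import tempfile


def path_for_write(pia_root, relative_path):
    """Writes always land under PIA_ROOT; missing directories are created."""
    target = os.path.join(pia_root, relative_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    return target


def path_for_read(pia_root, pia_home, relative_path):
    """Reads try PIA_ROOT first, then fall back to PIA_HOME."""
    for base in (pia_root, pia_home):
        candidate = os.path.join(base, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


# Demonstration with throwaway directories standing in for .pia and PIA.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, ".pia")
    home = os.path.join(tmp, "PIA")
    shipped = os.path.join(home, "Samples", "HelloWorld")
    os.makedirs(shipped)
    with open(os.path.join(shipped, "home.xh"), "w") as f:
        f.write("shipped")

    rel = os.path.join("Samples", "HelloWorld", "home.xh")
    before = path_for_read(root, home, rel)   # only the PIA_HOME copy exists
    with open(path_for_write(root, rel), "w") as f:
        f.write("customized")
    after = path_for_read(root, home, rel)    # the PIA_ROOT copy now wins
    print(before.startswith(home), after.startswith(root))
```

Once a file has been written, later reads find the copy under PIA_ROOT and no longer see the one under PIA_HOME. That is the source of the confusion described above.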

File Names and Access

This section describes the conventions for naming files, and for specifying files inside of active tags.

Filename Extensions

Unlike most web servers, the PIA does not require filename extensions (also called ``suffixes'' or, in Windows, ``filetypes''). This makes it possible to refer to a document without revealing to the user whether it is an ordinary HTML file or is created on-the-fly from an active document. It also makes URL's shorter (and hence easier to remember).

Documents in application directories may be looked up with a different (larger) set of suffixes from files in a data directory; this reflects the fact that active documents are not permitted in the data directory, for security reasons.

A file's extension, as usual, determines its MIME type -- the information the browser needs in order to display the file correctly. But because the PIA processes active documents at the server level, it not only needs to know what is in the file, but also which rules (contained in a so-called ``tagset'' file) to apply in processing that file type. (You can have more than one tagset in an application's directory, but each file-extension is assigned at most one tagset.)

The mapping from extensions to MIME types and tagsets is usually specified in a file called extensions.xci; however the default mapping can be overridden or extended by any application for its own purposes (see the Samples/ExtensionDemo application, which assigns the local tagset to any file with the .foo suffix).

Ext.    MIME type   tagset       meaning
.xh     text/html   xhtml        Extended HTML
.xx     text/xml    xxml         Extended XML
.html   text/html   HTML         Ordinary HTML
.htm    text/html   HTML         Ordinary HTML
.xml    text/xml    none         Ordinary XML

(used internally)
.xcf    hidden      pia-config   XML Configuration File
.xci    hidden      pia-config   included in a .xcf file.
.inc    hidden      current      ``include'' files
.ts     hidden      tagset       tagset files
.xml    hidden      various      Ordinary XML
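
To make the role of the table concrete, here is a rough Python sketch of how a name without an extension might be resolved to a file, a MIME type and a tagset. The dictionary copies the first half of the table; the function find_document and the order in which extensions are tried are assumptions made for the example, not a description of the PIA's real lookup code.

```python
import os

# Visible extensions from the table: extension -> (MIME type, tagset).
EXTENSIONS = {
    ".xh":   ("text/html", "xhtml"),
    ".xx":   ("text/xml",  "xxml"),
    ".html": ("text/html", "HTML"),
    ".htm":  ("text/html", "HTML"),
    ".xml":  ("text/xml",  None),    # "none" in the table: no tagset
}

# Assumed search order; the real order is set by the PIA's configuration.
SEARCH_ORDER = (".xh", ".xx", ".html", ".htm", ".xml")


def find_document(requested, available, search_order=SEARCH_ORDER):
    """Resolve a requested name that may omit its extension.

    `available` is the set of file names in the directory. The name is
    tried as given, then with each extension in turn. Returns a tuple
    (file name, MIME type, tagset), or None if nothing matches.
    """
    candidates = [requested] + [requested + ext for ext in search_order]
    for name in candidates:
        ext = os.path.splitext(name)[1]
        if name in available and ext in EXTENSIONS:
            mime, tagset = EXTENSIONS[ext]
            return name, mime, tagset
    return None


files = {"home.xh", "about.html"}
print(find_document("home", files))
print(find_document("about.html", files))
```

Because the user asks only for home, an author could later replace home.xh with a static home.html without changing any links.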

The PIA uses a fairly small number of standard files to determine its configuration.

File                 Location          Usage
user-specified.xcf   command line      top-level configuration file. This overrides any _subsite.xcf file in the top-level directory.
initialize.xh        top level         PIA initialization.
_subsite.xcf         any directory     per-directory configuration information and metadata.
appname-xhtml.ts     any application   conventional name for an application's local tagset file.

Application Files

A PIA application consists of little more than a collection of extended HTML (XHTML) files organized in a directory. Some of these files have conventional names, to make it easier for a user to navigate the resource tree.

home.xh       required      The ``home page'' for the application. This is usually the first page seen by a user of the application, so it should include links to at least the most commonly used other pages.
DATA/         recommended   The conventional name for the sub-directory where the application stores its data. You can, however, use any name you like.
help.xh       optional      Where the user looks for help. It is possible to configure a virtual link to a default version of help.xh.
about.html    optional      A good name for a file that gives background information on the application: its goals, philosophy, and so on. Often includes documentation of major design decisions.
to-do.html    optional      Application-maintainer's ``to-do'' list.
done.html     optional      Items moved from the ``to-do'' list after being completed.

Include files
HEADER.html   recommended   Appears at the top of a directory listing.
about.inc     optional      often included by home.xh, especially if an application has multiple entry points.
help.inc      optional      included by the default version of help.xh.
.xml depends on usage. Ordinary XML

File Access

Locations passed to the PIA in URL's are always looked up in the site resource tree. Relative filenames always have the current base (pathname) prepended by the browser.

Inside the PIA things are somewhat more complex: an application may need to refer to things elsewhere in the file system, including files that aren't part of the PIA's resource tree at all. In particular, this happens in the attributes that designate system resources: ``src'' in the <include>, <connect>, and <status> elements, ``file'' and ``virtual'' attributes in subsite configuration files, and the ``system names'' of entities. The following conventions are used:

  1. Locations without a ``protocol'' prefix (e.g. ``file:'') are treated as URL's, exactly the way the browser would treat them. In other words, they refer to exactly the same resources that would be returned to the user if that URL were requested. Locations without a leading ``/'' (slash) character are looked up relative to the location of the referring document.
  2. The prefixes given in the following table are used to specify locations outside of URL space:

    === THIS TABLE IS A TOTAL LIE!===
    ... but it represents the way I intend to make it work.
    file: A file, using normal filesystem conventions. Relative paths are relative to the directory in which the pia command was given.
    path: A file with its path specified in ``URL format'', with forward slashes. Paths starting with ~/ are relative to the user's home directory.
    pia: A path, in URL format (with forward slashes), relative to the PIA's ``home'' directory $PIA_HOME.
    real: A path, in URL format, relative to the ``real root'' directory $PIA_ROOT.
    virtual: A path, in URL format, relative to the ``virtual root'' directory if there is one.

  3. Files passed on the pia command line or in environment variables are in system format, i.e. treated as if they had an implicit ``file:'' prefix.
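
The conventions above can be summarized in a short Python sketch. Keep in mind that the prefix table describes intended behavior, so this is a model of the plan rather than of any working code. The function classify_location is a hypothetical name, and the sketch deals only with site-local paths, not with complete http: URLs.

```python
import posixpath

# Prefixes from the table above that point outside of URL space.
PREFIXES = ("file", "path", "pia", "real", "virtual")


def classify_location(location, referring_document):
    """Return (kind, path) for a location string.

    kind is one of PREFIXES, or "url" when no such prefix is present.
    A prefix-less location without a leading slash is resolved relative
    to the location of the referring document, as a browser would do.
    """
    prefix, sep, rest = location.partition(":")
    if sep and prefix in PREFIXES:
        return prefix, rest
    if location.startswith("/"):
        return "url", posixpath.normpath(location)
    base_dir = posixpath.dirname(referring_document)
    return "url", posixpath.normpath(posixpath.join(base_dir, location))


page = "/Agents/History/home.xh"
print(classify_location("../Foo/data.xml", page))
print(classify_location("pia:Agents/History/home.xh", page))
```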

When the file: prefix is not present, paths are in ``URL format'': slash characters are converted, if necessary, to the operating system's file separator. So ../MyApp/home.xh will be converted (on DOS/Windows) to ..\MyApp\home.xh.
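
The separator conversion is easy to try out with Python's standard pathlib module, which can build Windows-style paths on any operating system. The helper name to_system_path is made up for this example; the PIA does its own conversion in Java.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_system_path(url_path, windows):
    """Convert a path in URL format (forward slashes) to system format."""
    flavour = PureWindowsPath if windows else PurePosixPath
    return str(flavour(url_path))


print(to_system_path("../MyApp/home.xh", windows=True))
print(to_system_path("../MyApp/home.xh", windows=False))
```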

Programming issues

Non-programmers can safely skip this section.

Every installed agent has an associated software object which contains its options (stored as entities in the AGENT namespace) and the criteria that match features of the transactions in which the agent has registered interest.

This object is normally an instance of the class GenericAgent. If a subclass of this class is defined in the package org.risource.pia.agent, and its name matches the agent's type (with the first character capitalized, the rest in lowercase, and all period (.) and hyphen (-) characters converted to underscore (_) characters), it is loaded automatically when the agent is installed.
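
The name-matching rule is simple enough to express in a few lines. The Python sketch below only illustrates the rule as stated; the function agent_class_name and the sample agent types are invented for the example, and the PIA itself performs this lookup in Java.

```python
PACKAGE = "org.risource.pia.agent"


def agent_class_name(agent_type):
    """Derive the class name the PIA would look for, given an agent type.

    Periods and hyphens become underscores; the first character is
    capitalized and the rest are lowercased.
    """
    mangled = agent_type.replace(".", "_").replace("-", "_")
    return mangled[:1].upper() + mangled[1:].lower()


for agent_type in ("History", "fileTools", "my-agent.v2"):
    print(agent_type, "->", PACKAGE + "." + agent_class_name(agent_type))
```

Note that the rule discards any capitals inside the name, so an agent of type fileTools would need a class called Filetools.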

It is sometimes necessary to use a different programming language from Java for part of an application. (For example, PERL is good for text manipulation.) The best technique is to put the external code into a CGI script (with a .cgi extension). PERL is a popular choice for a scripting language because it is nearly as ubiquitous as Java. Be warned, though, that not all of its libraries or extensions are available on all systems.

An alternative to CGI scripts is the Java native interface, or Java code that uses the exec method of the java.lang.Runtime class to invoke an operating-system command. These techniques are not likely to be portable. At one point, there was an element <os-command> that did this; it was removed partly for security reasons, and partly to encourage portability. CGI scripts in PERL are more likely to be portable.

Tips on Application Writing

PIA applications and active documents are so new that few conventions have become established for their use and there is considerable room for experimentation. A few rules of thumb have become clear, both for web-page appearance in general and for active-page style in particular:

Uniform Look and Feel

Use ``include'' files and application-specific entities to customize inherited pages. The standard include files currently available are:

The other files that are frequently modified are:

You can not only have many different pages within one application (of course!), but you can have many different tagsets as well. Just make sure that each file-extension type (file1.xh, file2.foo) is assigned its proper tagset.

Quick Reference Up Front

An application's home page is the easiest to access. Additionally, many users have browsers with small screens. Therefore it makes sense to put the most-commonly-used functions, and links to the most-commonly-used pages, as close to the top of an application's home page as possible.

There are two common formats for this. The first uses a single column of links near the right-hand side of the screen. The column just to the left contains a small number of labels. This format is automatically generated by the <subhead> tag; its contents can contain additional two-column table rows.

The second format is sometimes used on the home pages of applications, which typically have many functions. It consists of three columns of icons, or of mixed icons and text.

No Nasty Surprises

It is perfectly possible to put a query string into a link, and so have an ordinary-looking link or icon that is equivalent to submitting a form (our SimpleCalendar and Tutorial applications both do this). But be careful, because forms can carry very general instructions to the PIA server -- some of our prototype appliances included a form for shutting down the system!

Many users experiment, clicking links at random. It's a good idea to make any irreversible actions the result of submitting a form, rather than just following a link. This also gives you a chance to ask the user for confirmation.
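
As an illustration of both points, the Python sketch below builds a link that carries a query string, and shows one possible guard against irreversible actions arriving through a plain link. Everything specific here is an assumption made for the example: the field names, the /Admin/control path, the ``action'' parameter, and the idea that confirmation forms are submitted with POST while links use GET.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def link_with_query(path, **fields):
    """Build a link that is equivalent to submitting a form with `fields`."""
    return path + "?" + urlencode(fields)


def is_safe_to_follow(method, url, irreversible_actions):
    """Hypothetical guard: reject irreversible actions requested via GET."""
    query = parse_qs(urlsplit(url).query)
    action = query.get("action", [""])[0]
    return not (method == "GET" and action in irreversible_actions)


link = link_with_query("/~Calendar/home.xh", month="3", year="1999")
print(link)

dangerous = {"shutdown"}
print(is_safe_to_follow("GET", "/Admin/control?action=shutdown", dangerous))
print(is_safe_to_follow("POST", "/Admin/control?action=shutdown", dangerous))
```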

You may also want to use a robot to build an index of your PIA, or to identify broken links.

Eventually it will be possible to use standard HTTP authentication to keep unauthorized users away from applications or forms you do not want them to use.

Active Documentation

XHTML forms make it easy to mix controls or forms with their documentation. Thus, it is possible not only to describe a function such as a link to a useful page or a form, but to provide the thing itself and invite the reader to try it out on the spot.

Wherever possible, an application should be self-documenting, including links to whatever documents a user may need. This includes tips on customization as well as advice on how to set the options. A HEADER.html or index.html file in the application's home directory may be useful, since it can provide documentation that the user sees while browsing the application's source code directory before installation. It should include an installation form, especially if an application has many options.

Readable Source

It goes without saying that the output of an XHTML file--the user's view--should be easily read, understood, and used. But so should the input--the author's view. This is especially important in an open-source system like the PIA; you are almost guaranteed that somebody else will be reading your code at some point -- and it's likely that you will be reading theirs!

Appropriate indentation makes for easier-to-read code. Indent the content of elements such as lists and control structures. Comment the file as appropriate, using one of the two types of comment available in the PIA:

Keep maintainers informed

Keep a to-do list. Document your design decisions, including things you tried that turned out to be mistakes.

Use source control (the PIA group uses CVS, which is free, well-supported, and well-suited for projects with multiple developers).

Remember, an application's XHTML files are read not only by their author, but by any users who want to customize or extend them. If the application is complicated, consider writing an ``implementation details'' document.

The conventional names for documentation files in the PIA are:

========== this section is questionable ===============

Directory Structure

Your Information Agency makes use of two parallel directory trees:

By default, an application MyApp in the directory AllApps will read and write its data files in the PIA_ROOT/AllApps/MyApp directory.

When the PIA processor looks for documents for (say) the URL PIAhost:8888/MyApps/Foo, it will first look in PIA_ROOT/MyApps/Foo. If that search fails, the processor then checks PIA_HOME/MyApps/Foo. Thus the files of the History application are searched for in ~/.pia/Agents/History and then in /usr/local/bin/pia/Agents/History.

This makes it easy to customize applications we have prototyped; you simply put your newly-improved documents in PIA_ROOT/Agents/AGENT_NAME/FILENAME.xh. For example, a customized form, foo.xh, for the History application would go in ~/.pia/Agents/History/foo.xh, and would not risk being overwritten by subsequent History-application updates from RiSource.org.

Entities

Entity variables can be several levels deep. For example, &AGENT:employees; might return the first item in a list of employees associated with this agent. Entities in the AGENT: namespace are shared by all of that agent's documents, and can even be accessed from other agents. (The History agent does this with its toolbar segment, for example.)

Processing Documents Not Specifically Requested

Agents can be used to process documents moving through the agency. For example, the History and remoteTools agents process all proxied documents. Each agent registers a set of criteria for the documents it is interested in. (Requests for documents are considered documents in their own right.) Whenever the agency sees a document that matches an agent's registered criteria, that agent is given a chance to process the document before it is sent on to its destination.
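
The registration-and-matching idea can be modeled in a few lines of Python. This is a toy model under stated assumptions, not the PIA's implementation: the Agent class, the dispatch function, the dictionary representation of a transaction, and the particular criteria given to the two sample agents are all invented for the example.

```python
class Agent:
    """A hypothetical agent: a name, match criteria, and a handler."""

    def __init__(self, name, criteria, handler):
        self.name = name
        self.criteria = criteria
        self.handler = handler

    def matches(self, transaction):
        # Every registered feature must equal the transaction's value.
        return all(transaction.get(key) == value
                   for key, value in self.criteria.items())


def dispatch(agents, transaction):
    """Give each interested agent a chance to process the transaction."""
    for agent in agents:
        if agent.matches(transaction):
            transaction = agent.handler(transaction)
    return transaction


visited = []


def record_url(transaction):
    # Passive observer: note the URL and pass the transaction on unchanged.
    visited.append(transaction["url"])
    return transaction


def add_banner(transaction):
    # Modifier: return a changed copy of the response.
    changed = dict(transaction)
    changed["body"] = "<!-- customized -->" + transaction["body"]
    return changed


agents = [
    Agent("History", {"kind": "request"}, record_url),
    Agent("remoteTools", {"kind": "response", "type": "text/html"}, add_banner),
]

request = {"kind": "request", "url": "http://example.org/"}
response = {"kind": "response", "type": "text/html", "body": "<p>hi</p>"}
dispatch(agents, request)
result = dispatch(agents, response)
print(visited, result["body"])
```

A transaction that matches no agent's criteria simply passes through untouched.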

Debugging PIA Applications

As with any software, a PIA-based application is unlikely to work correctly the first time. Here are some techniques for understanding what is happening within your application.

The extended HTML tags needed to create applications are described in the PIA XHTML Manual.


Copyright © 1999 Ricoh Innovations, Inc. Open Source at <RiSource.org/PIA>.
$Id: author.html,v 1.20 2001-01-11 23:36:45 steve Exp $