This document provides an overview of application writing in the PIA system.
Warning! Parts of this document are still out of date. Most of those parts are indicated in red; particularly dubious sections are indented as well.
Active pages for sample applications are in the PIA/Agents/ and PIA/Samples/ directories. Feel free to use these files as a source of ideas and examples, and as a basis for creating new applications.
Browse to the subsites in the Samples/ directory. Find one upon which you might like to build, and copy that directory and its contents into the .pia/ (or $PIA_ROOT/) directory, under a new name (e.g. cp -r ~/PIA/Samples/HelloWorld/ ~/.pia/MyApps/MyFirstPage).
Empty and remove all the CVS directories and sub-directories. Now any changes you make will not be confused with files in the original source-tree, nor overwritten upon updating via CVS. And any files your active pages write or read will appear in (or relative to) this new directory.
In this new directory, change every occurrence of the original name (HelloWorld) to the new one (MyFirstPage), in both filenames and file content. This ensures that you don't have any stray references to the original application, and that the proper configurations and tagsets are used.
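The copy, CVS clean-up, and rename steps above can also be scripted. The following is a minimal sketch in Python, not part of the PIA distribution; the function name clone_sample is hypothetical, and it assumes the sample's active pages and tagsets are UTF-8 text files.

```python
import shutil
from pathlib import Path


def clone_sample(src: Path, dst: Path, old: str, new: str) -> None:
    """Copy a sample application, drop CVS metadata, and rename references.

    Hypothetical helper: it mirrors the manual quick-start steps only.
    """
    # Step 1: copy the sample directory and its contents under the new name.
    shutil.copytree(src, dst)

    # Step 2: remove every CVS bookkeeping directory from the copy.
    for cvs_dir in [p for p in dst.rglob("CVS") if p.is_dir()]:
        if cvs_dir.exists():
            shutil.rmtree(cvs_dir)

    # Step 3a: replace the old name inside file contents (text files only).
    for path in [p for p in dst.rglob("*") if p.is_file()]:
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # leave binary files such as images untouched
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")

    # Step 3b: rename files and directories, deepest entries first so that
    # renaming a parent never invalidates a path we still have to visit.
    entries = sorted(dst.rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for path in entries:
        if old in path.name:
            path.rename(path.with_name(path.name.replace(old, new)))
```

A call such as clone_sample(Path.home() / "PIA/Samples/HelloWorld", Path.home() / ".pia/MyApps/MyFirstPage", "HelloWorld", "MyFirstPage") would correspond to the example above; the original sample directory is left untouched.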
Add some text to the copied home.xh file, and aim your browser at the new page (e.g. http://myHost:8888/MyApps/MyFirstPage/home.xh); you should see the text.
Define a new tag at the very top of home.xh (as happens at the top of HelloWorld/home.xh), and use the tag in the body. Refresh the browser page (shift-Reload) to see the effects. If you expect to create many such tags, or to use them in other pages, you should move them into a tagset file (see, or copy, the TagsetDemo/ sample to see how).
Add a <show-errors /> tag to your page to indicate any errors which may occur. Try the <pretty /> tag to see the structure of your XML. Check out the hackable tutorials for more explanation of how such tags work. Play around!
PIA stands for ``Platform for Information Applications,'' so we need to start by explaining exactly what an application is and how the user can access it. The short answer is that an application is any location in the PIA's web site that performs a service for the user by means of active documents.
The first time the user visits an application's location, its local configuration file, _subsite.xcf (if it exists), is loaded into the PIA server, which gives the application a chance to perform any necessary setup. This includes making the application's ``home directory'' available at the top of the PIA's URL space as /~AppName.
It is possible to pre-load applications by accessing them in the PIA's top-level initialization file, /initialize.xh.
There are roughly two kinds of applications: passive and active. Passive applications "just sit there," doing nothing until a request for a page comes in... at which point the processor finds the right page, processes the appropriate .xh file, and sends back the processed results. This is like a traditional static web-page or CGI script.
Active applications, on the other hand, are more like "software agents" in the traditional sense: they are background processes, always on, which eavesdrop on and potentially modify all the port's traffic. For example, the History application (under /Agents/) keeps track of every URL visited, and the remoteTools application modifies incoming HTML according to the user's customization rules. This is like a traditional proxy-server.
Strictly speaking, an Agent is a piece of (usually XML) code that is run in response to some internal event, rather than in response to a direct request for a document. At the moment, agents can respond to either a web transaction (request or response) that matches some ``criteria'', or can be run at a particular time or repetition interval.
It's difficult to draw a clear distinction between agents and applications; any application may potentially start an arbitrary number of agents. In fact, an agent can exist purely to register an application's home directory.
Some history here: originally, all applications in the PIA were associated with agents, and all of their ``home directories'' were accessible at the top level. This persists in the terminology: we still tend to use ``Agent'' and ``Application'' interchangeably, and in fact the line between them is rather fuzzy -- there's no way for a casual user to determine whether or not an application contains an Agent.
The prototype applications in Samples/ are of the passive type; it is best to get accustomed to using these before tinkering with the more subtle and complex active ones. Once you are ready, look at /Agents/Proxie/History as a prototype active application.
By convention, most application names are capitalized, but this is not required. Name lookup is case-sensitive, and there may be good reasons to prefer uppercase or lowercase names in some cases. For example, fileTools and remoteTools are really ``sub-applications'' of Toolbar, and so have lowercase names.
It is also conventionally true that the ``home directory'' of an application (i.e. the name starting with /~) has the same name as the application's own home directory, but this is also not universally true. For example, /~Calendar refers to the application directory /Agents/SimpleCalendar -- the shortened form is more generic, and makes it easy to replace SimpleCalendar with something more complex without breaking any links.
See also shorthand application names.
A running PIA functions as a web server, so that it looks to the user like a web site (or a collection of web sub-sites). The user's view of the PIA is as a collection of documents, each with its own URL. URL stands for Uniform Resource Locator, so the technical term for something ``addressed'' by a URL is a Resource. The PIA follows this terminology.
URLs form a branching structure called the ``URL tree'' (when its structure is being emphasized) or ``URL space'' (to emphasize the space-like uniformity of URL's). This structure is at least partially hidden from the user by the browser, which lets the user jump around at random by clicking on links. Only if the user glances at the browser's ``location'' text box does the hierarchical nature of URL space become apparent.
The terminology for a resource that behaves like a directory, with other resources contained inside it, is not particularly well-established. Many people use ``Container,'' but ``Location'' is also popular. The PIA actually uses both terms with slightly different meanings: a ``location'' is a position in the URL tree, while a ``container'' is the resource located at that position. Location is effectively a synonym for URL, while a directory in the filesystem is a kind of container.
A resource that is not a container is called a ``document.'' A document has both a location in the URL tree, and a container in the resource tree. Every container has an associated document that is shown when the container's URL is requested. By default this is a generic listing of the container's contents; the default is usually overridden by providing a ``home document'' (called index.html in most web servers). The PIA usually uses the name home.xh for this document.
To make matters slightly more complicated, a PIA's web site is also called a ``Site'', and each container resource within it is called a ``Subsite.'' These terms refer to the PIA's particular implementation of resources, in which several directories (in the filesystem sense) may be overlaid to form a single container. (This is explained in the next subsection.)
The distinctions here are totally invisible to the user, who sends the PIA a URL (resource locator) and gets a document back in response. They are, however, important to the application author, and even more important to the programmer. The application author needs to know about subsites in order to understand the otherwise inexplicable name (_subsite.xcf) of the PIA's configuration files. The programmer will quickly discover that Resource is the name of an interface, while Subsite is the name of one of its implementations, in a language (Java) in which interfaces and implementations have to have different names.
Where is the "current working directory" for an active page? If an active page tries to read the file ../Agents/Foo, where will the processor look? If it tries to write that file, where will it write?
Every application (or "subsite") lives in a directory. In the simplest case, the directory has the same name as the application and is located under PIA_ROOT/. For instance, the example in the Quick-start section above creates the MyFirstPage application in the directory .pia/MyApps/MyFirstPage. Also, the applications we provide are in same-named directories under PIA/Agents and PIA/Samples. Various configuration files (_subsite.xcf) could tell the processor to look for the application in other directories instead.
But in all cases, the application's URL always looks as if the application resides directly under the URL's root (the "slash" after the port number). So, for example, the MyFirstPage URL would be http://piaHOST:8888/MyApps/MyFirstPage, and the HelloWorld URL would be http://piaHOST:8888/Samples/HelloWorld, with no indication that one lives under .pia/ and the other under PIA/.
The processor has internal rules for deciding where to look for files belonging to a given URL (in this example, the rule is "look in .pia/ before looking in PIA/"). But you can add more rules via the configuration files, so that an application's active pages, tagsets, subdirectories and so forth can reside anywhere; you just need to make sure that a (real) _subsite.xcf file under PIA_ROOT or PIA_HOME tells the processor where the "anywhere" is.
Even the base directories PIA_HOME and PIA_ROOT can be set on the pia command line or in the corresponding environment variables. They are normally the root of the PIA directory tree (called PIA and located wherever the PIA is installed in your system) and a directory called .pia in your home directory.
In fact, reading and writing are somewhat different. All file-writing takes place in the PIA_ROOT directory (e.g. .pia) or where its configuration files point, automatically creating the whole stack of intervening directories if necessary. And the current directory is wherever the calling page (e.g. home.xh) lives, as decided by the processor's interpretation of the various configuration files (_subsite.xcf, as outlined above).
File-reading, as mentioned before, tries at first to act like file-writing (looking under PIA_ROOT either for the file or for a configuration file pointing to it), and uses the same "current working directory" as file-writing. But (unlike in the writing case), if the processor fails to find the desired file in that fashion, it falls back to looking under PIA_HOME, and searches there for the file or for configuration pointers.
This process can sometimes lead to confusion; be aware that files may not be written or read where you first expect. The least confusing approach is to create and modify all files directly under PIA_ROOT; then everything will be read and written in the same place, and path names will be transparent.
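The read and write rules described above can be modelled in a few lines. This is a simplified sketch in Python rather than the PIA's actual (Java) lookup code: the function names are hypothetical, and it deliberately ignores any redirection that a _subsite.xcf file might configure.

```python
from pathlib import Path
from typing import Optional


def resolve_for_write(pia_root: Path, url_path: str) -> Path:
    """Writes always land under PIA_ROOT; missing directories are created."""
    target = pia_root / url_path.lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


def resolve_for_read(pia_root: Path, pia_home: Path,
                     url_path: str) -> Optional[Path]:
    """Reads try PIA_ROOT first and fall back to PIA_HOME."""
    relative = url_path.lstrip("/")
    for base in (pia_root, pia_home):
        candidate = base / relative
        if candidate.exists():
            return candidate
    return None  # not found in either tree
```

Under this model, a page that exists only in PIA_HOME is read from there, but as soon as a customized copy is written it appears under PIA_ROOT and shadows the original on later reads.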
This section describes the conventions for naming files, and for specifying files inside of active tags.
Unlike most web servers, the PIA does not require filename extensions (also called ``suffixes'' or, in Windows, ``filetypes''). This makes it possible to refer to a document without revealing to the user whether it is an ordinary HTML file or is created on-the-fly from an active document. It also makes URL's shorter (and hence easier to remember).
Documents in application directories may be looked up with a different (larger) set of suffixes from files in a data directory; this reflects the fact that active documents are not permitted in the data directory, for security reasons.
A file's extension, as usual, determines its MIME type -- the information the browser needs in order to display the file correctly. But because the PIA processes active documents at the server level, it not only needs to know what is in the file, but also which rules (contained in a so-called ``tagset'' file) to apply in processing that file type. (You can have more than one tagset in an application's directory, but each file-extension is assigned at most one tagset.)
The mapping from extensions to MIME types and tagsets is usually specified in a file called extensions.xci; however the default mapping can be overridden or extended by any application for its own purposes (see the Samples/ExtensionDemo application, which assigns the local tagset to any file with the .foo suffix).
Ext. | MIME type | tagset | meaning |
---|---|---|---|
.xh | text/html | xhtml | Extended HTML |
.xx | text/xml | xxml | Extended XML |
.html | text/html | HTML | Ordinary HTML |
.htm | text/html | HTML | Ordinary HTML |
.xml | text/xml | none | Ordinary XML |
(used internally) | | | |
.xcf | hidden | pia-config | XML Configuration File |
.xci | hidden | pia-config | included in a .xcf file. |
.inc | hidden | current | ``include'' files |
.ts | hidden | tagset | tagset files |
.xml | hidden | various | Ordinary XML |
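The extension mapping in the table above behaves like a lookup table with per-application overrides. The sketch below models that idea in Python; it is an illustration only, not the format of extensions.xci. The names FileType and lookup are hypothetical, only the publicly served rows are included, and the MIME type and tagset name given for .foo in the usage note are assumptions.

```python
import os
from typing import Dict, NamedTuple, Optional


class FileType(NamedTuple):
    mime_type: str
    tagset: Optional[str]  # None means the file is served without processing


# Default mapping, copied from the first part of the table above.
DEFAULT_TYPES: Dict[str, FileType] = {
    ".xh": FileType("text/html", "xhtml"),
    ".xx": FileType("text/xml", "xxml"),
    ".html": FileType("text/html", "HTML"),
    ".htm": FileType("text/html", "HTML"),
    ".xml": FileType("text/xml", None),
}


def lookup(filename: str,
           overrides: Optional[Dict[str, FileType]] = None) -> Optional[FileType]:
    """Return the type entry for a file; application overrides win."""
    table = {**DEFAULT_TYPES, **(overrides or {})}
    suffix = os.path.splitext(filename)[1]
    return table.get(suffix)
```

An application that wants its own tagset applied to .foo files, as Samples/ExtensionDemo does, would correspond to a call such as lookup("page.foo", {".foo": FileType("text/html", "local")}). Each extension maps to exactly one entry, which matches the rule that an extension is assigned at most one tagset.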
The PIA uses a fairly small number of standard files to determine its configuration.
File | Location | Usage |
---|---|---|
user-specified.xcf | command line | top-level configuration file. This overrides any _subsite.xcf file in the top-level directory. |
initialize.xh | top level | PIA initialization. |
_subsite.xcf | any directory | per-directory configuration information and metadata. |
appname-xhtml.ts | any application | conventional name for an application's local tagset file. |
A PIA application consists of little more than a collection of extended HTML (XHTML) files organized in a directory. Some of these files have conventional names, to make it easier for a user to navigate the resource tree.
File | Status | Usage |
---|---|---|
home.xh | required | The ``home page'' for the application. This is usually the first page seen by a user of the application, so it should include links to at least the most commonly used other pages. |
DATA/ | recommended | The conventional name for the sub-directory where the application stores its data. You can, however, use any name you like. |
help.xh | optional | Where the user looks for help. It is possible to configure a virtual link to a default version of help.xh. |
about.html | optional | A good name for a file that gives background information on the application: its goals, philosophy, and so on. Often includes documentation of major design decisions. |
to-do.html | optional | Application-maintainer's ``to-do'' list. |
done.html | optional | Items moved from the ``to-do'' list after being completed. |

Include files | | |
---|---|---|
HEADER.html | recommended | Appears at the top of a directory listing. |
about.inc | optional | often included by home.xh, especially if an application has multiple entry points. |
help.inc | optional | included by the default version of help.xh. |
.xml | depends on usage. | Ordinary XML |
Locations passed to the PIA in URL's are always looked up in the site resource tree. Relative filenames always have the current base (pathname) prepended by the browser.
Inside the PIA things are somewhat more complex: an application may need to refer to things elsewhere in the file system, including files that aren't part of the PIA's resource tree at all. In particular, this happens in the attributes that designate system resources: ``src'' in the <include>, <connect>, and <status> elements, ``file'' and ``virtual'' attributes in subsite configuration files, and the ``system names'' of entities. The following conventions are used:
file:'') are treated as URL's, exactly the way the browser would treat them. In other words, they refer to exactly the same resources that would be returned to the user if that URL was requested. Locations without a leading ``/'' (slash) character are looked up relative to the location of the referring document.
=== THIS TABLE IS A TOTAL LIE!===
... but it represents the way I intend to make it work.
Prefix | Meaning |
---|---|
file: | A file, using normal filesystem conventions. Relative paths are relative to the directory in which the pia command was given. |
path: | A file with its path specified in ``URL format'', with forward slashes. Paths starting with ~/ are relative to the user's home directory. |
pia: | A path, in URL format (with forward slashes), relative to the PIA's ``home'' directory $PIA_HOME. |
real: | A path, in URL format, relative to the ``real root'' directory $PIA_ROOT. |
virtual: | A path, in URL format, relative to the ``virtual root'' directory if there is one. |
pia command line or in environment variables are in system format, i.e. treated as if they had an implicit ``file:'' prefix.
When the file: prefix is not present, paths are in ``URL format'': slash characters are converted, if necessary, to the operating system's file separator. So ../MyApp/home.xh will be converted (on DOS/Windows) to ..\MyApp\home.xh.
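The conversion from URL format to system format amounts to a separator substitution. Here is a minimal sketch in Python, not the PIA's own code; the function name to_system_path is hypothetical, and the separator is passed explicitly so that the DOS/Windows case can be shown on any platform.

```python
import os


def to_system_path(name: str, separator: str = os.sep) -> str:
    """Convert a path in URL format to the operating system's format.

    Names with an explicit "file:" prefix are already in system format,
    so only the prefix is stripped and the rest is returned unchanged.
    """
    if name.startswith("file:"):
        return name[len("file:"):]
    return name.replace("/", separator)


# The example from the text, converted for DOS/Windows:
windows_path = to_system_path("../MyApp/home.xh", separator="\\")
```

On a Unix system the default separator is already the forward slash, so URL-format paths pass through unchanged.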
Non-programmers can safely skip this section.
Every installed agent has an associated software object which contains its options (stored as entities in the AGENT namespace) and the criteria that match features of the transactions in which the agent has registered interest.
This object is normally an instance of the class GenericAgent. If a subclass of this class is defined in the package org.risource.pia.agent, and its name matches the agent's type (with the first character capitalized, the rest in lowercase, and all period (.) and hyphen (-) characters converted to underscore (_) characters), it is loaded automatically when the agent is installed.
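The name-matching rule above is easy to get wrong by hand, so it may help to see it spelled out. The following Python sketch only models the transformation described in the text; the function names are hypothetical, and the PIA's real class loading is done in Java.

```python
AGENT_PACKAGE = "org.risource.pia.agent"  # package named in the text above


def agent_class_name(agent_type: str) -> str:
    """Derive the expected class name from an agent's type.

    Periods and hyphens become underscores, the first character is
    capitalized, and every other character is lowercased.
    """
    cleaned = agent_type.replace(".", "_").replace("-", "_")
    return cleaned[:1].upper() + cleaned[1:].lower()


def qualified_class_name(agent_type: str) -> str:
    """Prefix the derived class name with the agent package."""
    return AGENT_PACKAGE + "." + agent_class_name(agent_type)
```

Note that interior capitals are lost: under this rule an agent type of remoteTools would look for a class named Remotetools, so a subclass called RemoteTools would not match.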
It is sometimes necessary to use a different programming language from Java for part of an application. (For example, PERL is good for text manipulation). The best technique is to put the external code into a CGI script (with a .cgi extension). PERL is a popular choice for a scripting language because it is nearly as ubiquitous as Java. Be warned, though, that not all of its libraries or extensions are available on all systems.
An alternative to CGI scripts is the Java native interface, or Java code that uses the exec method of the java.lang.Runtime class to invoke an operating-system command. These techniques are not likely to be portable. At one point, there was an element <os-command> that did this; it was removed partly for security reasons, and partly to encourage portability. CGI scripts in PERL are more likely to be portable.
PIA applications and active documents are so new that few conventions have become established for their use and there is considerable room for experimentation. A few rules of thumb have become clear, both for web-page appearance in general and for active-page style in particular:
Use ``include'' files and application-specific entities to customize inherited pages. The standard include files currently available are:
about.inc
home.xh file just underneath the quick reference generated by the <subhead> element.
insert.inc
The other files that are frequently modified are:
home.xh
<subhead> element, for example.
Application-xhtml.ts
<header> and <subhead> to be defined (redefining <subhead> is not uncommon), as well as tags that are unique to the application.
file1.xh, file2.foo) is assigned its proper tagset.
An application's home page is the easiest to access. Additionally, many users have browsers with small screens. Therefore it makes sense to put the most-commonly-used functions, and links to the most-commonly-used pages, as close to the top of an application's home page as possible.
There are two common formats for this. The first uses a single column of links near the right-hand side of the screen. The column just to the left contains a small number of labels. This format is automatically generated by the <subhead> tag; its contents can contain additional two-column table rows.
The second format is sometimes used on the home pages of applications, which typically have many functions. It consists of three columns of icons, or of mixed icons and text.
It is perfectly possible to put a query string into a link, and so have an ordinary-looking link or icon that is equivalent to submitting a form (our SimpleCalendar and Tutorial applications both do this). But be careful, because forms can carry very general instructions to the PIA server-- some of our prototype appliances included a form for shutting down the system!
Many users experiment, clicking links at random. It's a good idea to make any irreversible actions the result of submitting a form, rather than just following a link. This also gives you a chance to ask the user for confirmation.
You may also want to use a robot to build an index of your PIA, or to identify broken links.
Eventually it will be possible to use standard HTTP authentication to keep unauthorized users away from applications or forms you do not want them to use.
XHTML forms make it easy to mix controls or forms with their documentation. Thus, it is possible not only to describe a function such as a link to a useful page or a form, but to provide the thing itself and invite the reader to try it out on the spot.
Wherever possible, an application should be self-documenting, including links to whatever documents a user may need. This includes tips on customization as well as advice on how to set the options. A HEADER.html or index.html file in the application's home directory may be useful, since it can provide documentation that the user sees while browsing the application's source code directory before installation. It should include an installation form, especially if an application has many options.
It goes without saying that the output of an XHTML file--the user's view--should be easily read, understood, and used. But so should the input--the author's view. This is especially important in an open-source system like the PIA; you are almost guaranteed that somebody else will be reading your code at some point -- and it's likely that you will be reading theirs!
Appropriate indentation makes for easier-to-read code. Indent the content of elements such as lists and control structures. Comment the file as appropriate, using one of the two types of comment available in the PIA:
The ordinary HTML comment <!-- like this --> is passed on by the PIA processor to the browser, so it can appear in the browser's "Page Source" window. Unfortunately, this kind of comment can affect the processing of active tags (because it still exists, even if invisibly, during processing), and can lead to very puzzling errors (for example, you may be testing whether a list is empty, and finding that it is not empty because it contains such a comment).
The other type of comment is more like a traditional "programming comment", and is ideal for describing what happens in an active page. It looks <?-- like this --?>; any such comments are removed at the very first PIA processing step, and thus have no side effects at all. We encourage copious use of this style comment to clarify your active pages, just as programmers comment their code.
It is important to note that there must be a space after the ``?--''! Otherwise the first word of the comment will be taken as part of the name of an XML processing instruction, which will not be recognized as a comment and will be passed on to the browser, which in turn will display it to the user, usually in a particularly ugly format.
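Because a missing space after the comment opener is easy to overlook, a small check can be run over active pages before they are installed. This is a hypothetical helper written in Python, not a PIA tool; it simply reports the lines on which the opener is followed immediately by a non-whitespace character.

```python
import re
from typing import List

# "<?--" followed directly by a non-whitespace character is the mistake
# described above: the first word would become part of the PI name.
UNSPACED_COMMENT = re.compile(r"<\?--(?=\S)")


def find_unspaced_comments(text: str) -> List[int]:
    """Return the 1-based numbers of lines containing an unspaced comment."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if UNSPACED_COMMENT.search(line)
    ]


sample_page = "<?-- fine --?>\n<?--broken --?>\n<?--\n  also fine --?>\n"
problem_lines = find_unspaced_comments(sample_page)
```

A comment opener at the very end of a line is not reported, since the line break that follows counts as the required whitespace.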
Keep a to-do list. Document your design decisions, including things you tried that turned out to be mistakes.
Use source control (the PIA group uses CVS, which is free, well-supported, and well-suited for projects with multiple developers).
Remember, an application's XHTML files are read not only by their author, but by any users who want to customize or extend them. If the application is complicated, consider writing an ``implementation details'' document.
The conventional names for documentation files in the PIA are:
to-do.html
done.html
about.html
========== this section is questionable ===============
Directory Structure
Your Information Agency makes use of two parallel directory trees:
- PIA_HOME contains the code, documentation, applications and other files released by the PIA group. This directory can be overwritten by cvs update, and may be shared by many users on a common filesystem (e.g. in /usr/local/).
- PIA_ROOT (typically ~/.pia on Unix) stores application data and customized active documents. This directory is always the first place the PIA processor looks for files; it only checks PIA_HOME if it can't find them here.
By default, an application MyApp in the directory AllApps will read and write its data files in the PIA_ROOT/AllApps/MyApp directory.
When the PIA processor looks for documents for (say) the URL PIAhost:8888/MyApps/Foo, it will first look in PIA_ROOT/MyApps/Foo. If that search fails, the application then checks PIA_HOME/MyApps/Foo. Thus the History application searches for its files in ~/.pia/Agents/History and then in /usr/local/bin/pia/Agents/History.
This makes it easy to customize applications we have prototyped; you simply put your newly-improved documents in PIA_ROOT/Agents/AGENT_NAME/FILENAME.xh. For example, a customized form, foo.xh, for the History application would go in ~/.pia/Agents/History/foo.xh, and would not risk being overwritten by subsequent History-application updates from RiSource.org.
Entities
Entity variables can be several levels deep. For example, &AGENT:employees; might return the first item in a list of employees associated with this agent. Entities in the AGENT: namespace are shared by all of that agent's documents, and can even be accessed from other agents. (The History agent does this with its toolbar segment, for example.)
Processing Documents Not Specifically Requested
Agents can be used to process documents moving through the agency. For example, the History and remoteTools agents process all proxied documents. Each agent registers a set of criteria for the documents it is interested in. Whenever the agency sees a document that matches an agent's registered criteria (requests for documents are considered documents in their own right), that agent is given a chance to process the document before sending it on to its destination.
As with any software, a PIA-based application is unlikely to work correctly the first time. Here are some techniques for understanding what is happening within your application.
&urlQuery; appears in the home.xh page for the Samples/Form application, so you can see something like the following after submitting the form: data=hi+there.
PIA/bin/pia, and where you subsequently re-run it whenever you change configurations). Sometimes useful messages appear there.
<user-message>&myValues;</user-message>
myValues into the command console running the PIA; this allows you to view all kinds of intermediate variables without having to change the expansion/output properties of the tags or active sections you are investigating.
-d (debug) option (e.g. PIA/bin/pia -d). This will produce voluminous output about what the PIA thinks it's doing. Adding the -v (verbose) flag will produce even more output, tracing the progress of the document processing system.
The extended HTML tags needed to create applications are described in the PIA XHTML Manual.