Release Notes for PIA Release 2.0

Release 2.0 is the first external open source release of the PIA. It is intended for developers working on the PIA internals. This is NOT a stable release: the system does work, but there are still bugs, broken links, and other anomalies. Release 2.1 will be the first stable release.

We suggest that people who wish to use the PIA to develop applications, but are not interested in being on the cutting edge, wait for 2.1. That said, we believe that there are unlikely to be any major changes between the current release (2.0.9) and Release 2.1.

Subscribe to PIA-announce@RiSource.org to be notified of new releases.

Release Announcements

Release 2.0.9

Major Issues

As a consequence of the changes listed below, there are some compatibility issues in this release:

The default for string matching and attribute lookup is to be case-sensitive. Tags that do string matching must be passed a false case attribute in order to be case-insensitive.
We are compiling with JDK 1.2.2 these days, so the documentation in /Doc/API/javadoc has a different organization.

It is expected that no further user-visible changes will be made between now and Release 2.1, which is expected by March 1. A few items may, time permitting, be added.

Changes since 2.0.8

The default for string matching and attribute lookup is to be case-sensitive. Tags that do string matching must be passed a false case attribute in order to be case-insensitive. Suitable false values are: "", "0", "no", "false", and "insensitive". Actually, any value starting with "i" will work.
At this point, all tags that take an attribute that defines the operation to be performed (e.g. <numeric add="add">) now take an op="operation" attribute as a preferred alternative. It is considered unlikely that the old form will go away, however, because it's a better fit for embedding in legacy HTML files. We debated calling the operation attribute of <test> ``test'', but in the end decided on op for consistency.
Similar ``multiple-choice'' attributes have been provided for other functions, all with an eye toward making XML code more readable. Examples include syntax in <define>.
The tsdoc.ts and slides.ts tagsets have been defined as providing the default processing for files with .ts and .slides extensions, respectively. This means that tagset documentation can be viewed directly, and slide previewing becomes simpler.
The slides.ts tagset has been debugged and updated. When viewing slides through the PIA it is possible to pass a query string with slide=starting-slide and n=number-of-slides. Links are handled correctly, and processing is surprisingly fast.
Values of attributes and entities are now stored in the child nodes, as required by the current version of the DOM.
The Emacs extensions, in /Contrib/rsv.ricoh.com/html-helper-mode/, have been updated from the latest available version, and hacked for XHTML syntax. Support for tables and PIA tags has also been added.
The Ticker sample application has been moved from /Agents to /Samples.
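The case rule described above can be sketched in Java as follows. This is a minimal illustration, not the PIA's actual code: the class and method names (CaseOption, isCaseSensitive, matches) are hypothetical, and the sketch assumes that an absent case attribute means the case-sensitive default, while "", "0", "no", "false", or any value starting with "i" selects case-insensitive matching.

```java
/**
 * Hypothetical helper, not the PIA's actual code: decides whether string
 * matching is case-sensitive from the value of a tag's "case" attribute.
 */
final class CaseOption {
    private CaseOption() {}

    /** Returns true unless the attribute carries one of the "false" values. */
    static boolean isCaseSensitive(String caseAttr) {
        if (caseAttr == null) {
            return true; // assumed: no case attribute means the case-sensitive default
        }
        // "", "0", "no", "false"; "insensitive" is covered by the "i" prefix rule
        boolean falseValue = caseAttr.isEmpty() || caseAttr.equals("0")
                || caseAttr.equals("no") || caseAttr.equals("false")
                || caseAttr.startsWith("i");
        return !falseValue;
    }

    /** Compares two strings, honoring the case attribute. */
    static boolean matches(String a, String b, String caseAttr) {
        return isCaseSensitive(caseAttr) ? a.equals(b) : a.equalsIgnoreCase(b);
    }
}
```

For example, matches("Foo", "foo", "insensitive") is true, while matches("Foo", "foo", null) is false because the default is case-sensitive.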

Release 2.0.8

As promised, this release includes a list of things that ought to be fixed before Release 2.1; it can be found in r2.1.to-do.html.

Major Issues

As a consequence of the changes listed below, there are some compatibility issues in this release:

Most importantly, all references to the tagset pia-xhtml must be changed to refer to /Tagsets/pia-xhtml -- otherwise the tagset won't be found. If you have a tagset called pia-xhtml in your PIA user directory (.pia on Linux) that extends the system one, you will get an "out of memory" exception if you don't update its parent attribute.
If you're not using one of the wrapper scripts (PIA/bin/pia or PIA/bin/pia.bat) you should use the -home option to pass the PIA's home directory on the command line. You may also need to put PIA/lib into your CLASSPATH.

Changes since 2.0.7

The major change since 2.0.7 is a new location for tagset definitions, or rather a new pair of locations. Tagsets that are essential for the functioning of the PIA or the process command are now in the PIA/lib/ directory. Tagsets that are only used in the default configuration of the PIA server, and that are expected to be customized for new applications, are in PIA/Tagsets/.
The site package now supports ``prefixes'' (for example ``file:'', ``root:'' and ``pia:'') on pathnames.
The XHTML files in the PIA are now proper parsed external entities; the incorrect DTD's they once contained have been replaced by <make> elements that output a DTD. Note that XHTML files are not complete XML documents. This is a deliberate design choice: it means that any appropriate DTD can be provided separately.
A new agent/application has been added: /Agents/Proxie/buster/. This handy application ``busts'' selected HTTP requests by redirecting them to someplace innocuous. It is disabled by default, because it can also do odd things on pages that use JavaScript to process information from their advertisers.

Release 2.0.7 (Storm Warnings!)

This release incorporates the changes that were foreshadowed in 2.0.6. If you have been developing agents, they will need to be fixed!

We expect to be moving very quickly toward a 2.1 release after this; if you know of any show-stoppers, this would be a good time to mention them. The plan from here:

2.0.8 (in a week or two): expect a prioritized list of things to be fixed before 2.1.
2.0.9 (feature freeze): only (major) bug fixes between here and 2.1.

Changes since 2.0.6

The PIA has gone from having essentially two major components (the pia server engine and the dps Document Processing System) to three, adding the site Site Resource Package. You can get a summary and overview by reading:
1. About the PIA Core Engine in the org.risource.pia package.
2. About the Site Resource Package in the org.risource.site package.
3. Notes on Naming in Steve's notes
The main change, and it's a big one, is that Agents no longer play a role in mapping URL's onto files. The Site Resource Package does that now. Although each Agent still has a ``home directory'', accessible as ~name, this is the only directory associated with the agent. Its data directory is simply a subdirectory of its home directory.
The old hack whereby a URL like /Agent got you a home page while /Agent/ (with a trailing slash) got you a generic index is gone. The generic index is still around, but it has its own name now. The trailing slash on directory names is back to being optional, and just as with most servers you will save yourself a redirection if you use it.
The DOFS agent is gone. In its place is a much simpler, but far more flexible, technique that allows symbolic links (aliases) to be positioned anywhere in the URL tree. These links make it much easier to incorporate pages, and agents, from elsewhere in the filesystem into a working PIA. There is no longer any need to develop agents or applications in your $HOME/.pia directory and later move them to PIA/Agents.
The other agents that were nothing but placeholders in URL space are also gone, since there is no need for a top-level directory to correspond to an agent.
The default top-level directory for a PIA-based server is the PIA directory itself, which eliminates a lot of aliasing and other problems. Of course, nothing prevents a PIA-based application from having a totally different root directory; that's also a lot simpler to do than it used to be. There is a mechanism that allows the entire PIA directory to remain accessible to XHTML pages while hiding as much of it as necessary from the browser.
The old Agents directory is still around; it really ought to be called ``Agents and Other Applications'', but the old name just sounds better, and besides the line between agents and other applications is pretty fuzzy. We could have renamed it ``Apps'' but that would have broken many things.
The user root directory ($HOME/.pia on a Unix system) is no longer called ``/~/'', but it's still around. It contains a sort of ``shadow tree'' that parallels the PIA's main directory tree rooted at PIA. Shadow directories and files are created on demand; all writes take place in the user directory tree. This provides an explicit mechanism that supports putting the PIA distribution tree on a read-only medium such as a CD-ROM.
The old way of referring to the History agent's data directory was /~/History; it was located in .pia/History. User-modified forms for the History agent were located, confusingly, in .pia/Agents/Proxie/History. Now, the History agent's data directory is referred to as /~History/DATA and is located in .pia/Agents/Proxie/History/DATA, where one would expect to find it.
The following shell commands can be used to move your history on a Unix machine:
```
      cd ~/.pia
      mkdir -p Agents/Proxie/History
      mv History Agents/Proxie/History/DATA
```
A more elaborate script can be found in PIA/src/app/tools/move-history-data in case you start using the new PIA before realizing that your history data ought to have been moved first.
An arbitrary amount of XML metadata can be associated with resources (files or directories), using per-directory XML configuration files with the default name of _subsite.xcf. This will eventually be used to support WebDAV ``properties'' and similar things. Naturally, agents are already included in this metadata.
The entire PIA configuration can be specified in a single ``site configuration file'', which can be located anywhere. Multiple views of the same directory tree are possible, and several examples are provided in the Config/Site directory.
An Agent is no longer a Namespace -- the implementation is much simpler now. The old &AGENT:; construct is still around, but refers to the agent's attribute list. Most things that used to be entities in the agent's namespace are now files (still capable of being referred to as entities) in the agent's home directory.
The PIA's command line has been substantially overhauled; it now takes a pathname, which can be either the root directory or the top-level configuration file. Everything else can be configured in the top-level configuration file, although it is still possible to override the more common items (e.g. port) on the command line.
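The shadow-tree behavior described above (shadow files created on demand, all writes in the user tree) can be sketched as a simple path resolver. This is a hypothetical illustration, not the org.risource.site implementation: the ShadowTree class and its methods are invented for the example, and it assumes that reads prefer an existing user copy and otherwise fall back to the distribution tree.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch of a "shadow tree": reads prefer the user's copy of a
 * file and fall back to the (possibly read-only) distribution tree; writes
 * always go to the user tree, creating shadow directories on demand.
 */
final class ShadowTree {
    private final Path piaRoot;   // the PIA distribution directory
    private final Path userRoot;  // the user directory, e.g. $HOME/.pia

    ShadowTree(Path piaRoot, Path userRoot) {
        this.piaRoot = piaRoot;
        this.userRoot = userRoot;
    }

    /** Resolves a relative path for reading: user copy first, then the distribution. */
    Path forReading(String relative) {
        Path shadow = userRoot.resolve(relative);
        return Files.exists(shadow) ? shadow : piaRoot.resolve(relative);
    }

    /** Resolves a relative path for writing, creating parent directories in the user tree. */
    Path forWriting(String relative) throws IOException {
        Path shadow = userRoot.resolve(relative);
        Path parent = shadow.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // shadow directories created on demand
        }
        return shadow;
    }
}
```

Because nothing is ever written under piaRoot, the distribution tree can sit on read-only media while user changes accumulate in the shadow tree.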

The net effect is that the PIA's core engine has undergone a tectonic shift from being a collection of agents that somehow managed to function as a web server, to being a web server that does a pretty good job of playing host to a collection of agents. Removing the tight coupling between URL-space naming and agents has improved both the server and the agents.

Known Problems

Because of the extent of the changes in this release, many things are still broken. If you have been relying on a PIA application to get your other work done, please save a copy of your current working directory and put it someplace safe!

Several old agents have not even been looked at, especially BugReport, View, and Proxie/fileTools. The non-Agent applications Tutorial and Demo are working, blissfully unaware of the changes that whirl around them.
Lots of pages in agents that are working (after a fashion) are still full of broken links and obsolete code; these include Admin, Proxie, and Proxie's sub-agents.
The ``help'' pages are almost totally broken at this point.
Much of the documentation needs to be rewritten, including large parts of
This is in progress.
There is no good way to list or access files in directories under a specific (real or virtual) root; they are all merged at the moment.
Agent initialization is somewhat broken -- it happens when you first enter a directory. It really needs to be done recursively up front.

Release 2.0.6

This release includes a slightly-revised language that is more readable and totally XML-compliant. In addition, several of the interfaces have been cleaned up, and extended to the point where the DPS can be dropped into any SAX or DOM application as a document-processing extension.

A significant upheaval in the way URL's are mapped onto files, and a total rethinking of the role of agents, is in progress. Preliminary notes and code exist (see item 1 below) but have not yet been integrated with the PIA as a whole.

Changes since 2.0.5:

Added new notes on naming, including design notes toward a radical change in the way URL's, files, and agents interrelate. Preliminary source code can be found in the org.risource.site package.
Added a sizeable number of new demonstrations, in the Tutorial agent.
Added methods to the org.risource.dps.Output interface that don't start with a DOM node (one is constructed if necessary). This makes it easy to drive an Output from, e.g., a SAX driver.
Added the org.risource.dps.output.ToProcessor class, an Output that drives a Processor. This makes it easy to drive the DPS from any event-driven parser, e.g. SAX, by writing a trivial adapter class.
Eliminated the need for entities as variables, clearing the way for use with SGML and XML parsers, most of which substitute entities ``up front''. Entities defined in a document's environment, e.g. PIA: and the tagset, will still work and will behave like ``traditional'' entities. For the moment, it also still works to access variables using entities.
Central to eliminating entities is a new tag, <element> (or, equivalently, <E>), for constructing an element and then processing it.
Most tags have been extended to take XML-style attributes as well as HTML-style ones (for example, <numeric op="add"> instead of <numeric add="add">).
The DPS (Document Processing System) can now be used with any DOM or SAX implementation.
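The ``trivial adapter class'' mentioned above might look roughly like this. The sketch uses the standard SAX API from the JDK, but the EventSink interface is a hypothetical, simplified stand-in for the real org.risource.dps.Output (whose methods are not reproduced here), and SaxToSink is an invented name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical, simplified event-driven output; NOT the real org.risource.dps.Output. */
interface EventSink {
    void startElement(String tag, Attributes attrs);
    void endElement(String tag);
    void text(String chars);
}

/** Adapter: forwards each SAX parser event to an EventSink. */
final class SaxToSink extends DefaultHandler {
    private final EventSink sink;

    SaxToSink(EventSink sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        sink.startElement(qName, attrs);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        sink.endElement(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        sink.text(new String(ch, start, length));
    }

    /** Parses an XML string, driving the sink with the resulting events. */
    static void parse(String xml, EventSink sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new SaxToSink(sink));
    }
}
```

The parser never needs to build a DOM tree here; the sink sees the document as a stream of start, text, and end events, which is the property that lets an event-driven parser drive a document processor.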

Release 2.0.5

This release includes the ability to save and restore Agent state in XML files. At this point it is probably stable enough for someone willing to put up with the still-sketchy documentation to start building an application.

Changes since 2.0.4:

Release 2.0.5a includes more complete javadoc coverage and fewer javadoc errors.
The <AGENT> tag has been implemented in the PIA, allowing entire agents, and even groups of agents, to be saved in XML files and read back. This has the side-effects of both simplifying and speeding up start-up. It also ensures that agents are installed in a user-defined sequence.
The old-style agent checkpoint files, and the default Admin/START-UP.html, have been eliminated. If you have a customized START-UP.html file, however, it will run as usual. Browse to your new or customized agents' options pages, and use the Save Options form to write the XML data into files called AGENT.xml in the appropriate user directories. Then use the Save agent file list button on either /Admin/control or /Admin/load-agent to add the agent to the list of known agents. At some point there will be an option to save the file list automatically when appropriate.
There were two reasons for dropping the old checkpoint files, which were implemented as serialized Java objects. One is that they're not robust across changes in the classes. The other is that it turns out that reading and writing XML files is faster. Serialized tagsets (.tso files) have been dropped for the same reason.
The options form has been significantly enhanced, as has the /Admin/configure form. A new form, /Admin/control, has been added to perform immediate configuration changes.
The <namespace> and <bind> tags have been added; these allow arbitrary collections of name-value pairs to be saved in and restored from files.
The parser has been sped up by roughly a factor of two, mainly by going to a BufferedReader for input. Combined with XML agent loading and the ``stripped'' version of basic.ts (which is now built by default, and gives an additional factor of two), this makes initialization significantly faster.
The ubiquitous PIA logo in the page heading is now linked to "/" (your PIA's home page) instead of to www.RiSource.org as it used to be. Explicit links to www.RiSource.org are now provided in the heading and footer. Entities are used to ensure consistent use of fonts and colors for the logo.
Bugs in <logical and>, <text trim>, and <extract> have been fixed.
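The BufferedReader change amounts to wrapping the parser's input, so that each single-character read() is served from an in-memory buffer rather than from the underlying stream. A minimal illustration follows; BufferedInput and countTagOpens are hypothetical names, and the character count merely stands in for a real parser's scanning loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical illustration, not the PIA parser: buffered character-at-a-time input. */
final class BufferedInput {
    private BufferedInput() {}

    /** Counts '<' characters one read() at a time, as a character-level scanner might. */
    static int countTagOpens(Reader raw) throws IOException {
        int count = 0;
        // Wrapping the raw Reader means most read() calls hit the buffer, not the stream.
        try (BufferedReader in = new BufferedReader(raw)) {
            int c;
            while ((c = in.read()) != -1) {
                if (c == '<') {
                    count++;
                }
            }
        }
        return count;
    }
}
```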

Release 2.0.4

This release marks some major improvements in DOM compatibility, plus a number of bug fixes.

Changes since 2.0.3:

The internal representation for parse trees in org.risource.dps is now fully compliant with the interfaces (though not yet all of the behavior) of the current version of the W3C's Document Object Model, as specified in REC-DOM-Level-1-19981001.
The local extensions to the DOM interfaces can be found in the org.risource.dps.active package; the implementation classes are in org.risource.dps.tree. A local copy of the DOM Java bindings can be found in org.w3c.dom; the old version of the DOM interfaces has been removed.
A number of bugs introduced in 2.0.3 by the new agent naming scheme have been fixed. Most of these had been manifested as broken links in agent code. Item 4 in the Release 2.0.3 notes (below) proved remarkably prescient.
A few minor improvements to the tag language have been made, most notably the <insert> sub-element of <extract>.
A bug in setting the content length header for proxied pages has been patched. Using the PIA as a proxy with the toolbars turned on should no longer cause problems with pages getting truncated.
The Unix scripts in PIA/bin (pia, process, pia_wrapper) have been modified to work properly with or without a classpath setting. (Without a classpath, they are best run with PIA/src/java as your current directory.) The Windows batch file PIA/bin/pia.bat (piajdk.bat is no longer needed) no longer touches the classpath, but changes directory to PIA/src/java and runs from there. (It may still be necessary to change the properties of your DOS box to enable more environment variables.)

Release 2.0.3

This is the second patch to Release 2.0; things are beginning to stabilize.

Changes since 2.0.2:

It is now possible to ``mount'' an Agent anywhere in the URL hierarchy.
An Agent's home, user, and data directories can be specified separately. Directories can be specified relative to any of several different roots.
The output of the DPS is now almost totally XML-compliant. The process -n command can be used to bring random files into compliance; this has been done with all .xh and .ts files in the PIA.
Most (we dare not hope for ``all'') of the broken links have been fixed.
Obsolete test code has been removed from org/risource/dps/test, org/risource/test/dom, ds, and pia subdirectories.
Pages have been added to the Demo agent to illustrate use of the debug and output handlers.
Precompiled zip and jar files have been removed from $PIA_HOME/lib. All classes are now accessed through the $PIA_HOME/src/java path. Also, due to Makefile changes, it is no longer necessary to add this path to your CLASSPATH.
A list of known bugs has been added to www.risource.org. It can be reached from the ``known bugs'' link on the PIA page.
The PIA namespace has been made available so that the PIA properties listed in the pia.props file are now accessible to agents.

Release 2.0.2

This is the first patch to the first external open source release of the PIA. It is still intended primarily for developers. Several directories have been moved, and a few obsolete ones have been removed. This trend can be expected to continue for a while.

Major changes since 2.0.1A:

The Java directory structure has been revised to conform to current naming conventions. What was once src/java/crc is now src/java/org/risource. The RegEx package and the subset of the Jigsaw classes that we are actually using have been brought up to date and moved to their proper places in the package hierarchy.
CVS ID's had lost their all-important $Id:...$ due to an ill-advised ``export''. They're back.
HTML document types now refer to the correct public identifier, "-//W3C//DTD HTML 4.0 Transitional//EN". It is possible (likely?) that some documents do not validate yet. The document types in .xh files are passed through the document processor unmodified, so they refer to the DTD of the output, not of the input file. This will change so that the document types are properly set by the processor (.xh files will have XML DTD's corresponding to the "tagsets" that they use. Output files will have their document types set appropriately -- usually HTML.) Also, we recommend that any .xh files you create adhere to the XML spec (quoted attribute values, include all end tags, etc.) Even though our processor will accept a wider range of (SGML) document syntax, using well formed XML will make it easier for you to process your .xh files with other tools should you choose to do so. We are in the process of converting all of our examples to strictly conform to the spec.
The Toolbar, History, and Cache agents, all of which operate by proxying through the PIA, have been gathered underneath the (new) ``Proxie'' agent. The agent namespace still isn't right, but this is at least usable. If you made yourself a custom START-UP.html file, make a new one. Proxie is poorly-documented at the moment. Your browsing history (in /~History) will be unaffected by these and future changes.
The obsolete directories Agents/Logo and Doc/Graphics have been removed.
Property names (stored in Config/pia.props) now start with pia. instead of crc.pia.
The interface org.risource.Version has been added to contain version-number constants. Running pia -version prints out the version string.
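If you have an old pia.props file, the key rename can be done with a short program like the following. This is a hypothetical helper, not something shipped with the PIA; it assumes the file is in standard java.util.Properties format and that every key needing migration begins with crc.pia.

```java
import java.util.Properties;

/**
 * Hypothetical migration helper (not part of the PIA): copies a property set,
 * renaming keys that start with "crc.pia." to the "pia." prefix and leaving
 * all other keys unchanged.
 */
final class PropsMigration {
    private static final String OLD_PREFIX = "crc.pia.";
    private static final String NEW_PREFIX = "pia.";

    private PropsMigration() {}

    static Properties migrate(Properties old) {
        Properties updated = new Properties();
        for (String key : old.stringPropertyNames()) {
            String newKey = key.startsWith(OLD_PREFIX)
                    ? NEW_PREFIX + key.substring(OLD_PREFIX.length())
                    : key;
            updated.setProperty(newKey, old.getProperty(key));
        }
        return updated;
    }
}
```

To apply it, load the old file with Properties.load, pass the result to migrate, and write the returned set back with Properties.store.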

Release 2.0.1A

This is the first external open source release of the PIA. It is intended for developers working on the PIA internals. This is NOT a stable release -- the system does work, but many links are broken due to code changes made in preparing for the external release. Release 2.1 will be the first stable release. We suggest that persons who wish to use the PIA to develop applications but are not interested in debugging/programming the core engine wait for 2.1. Subscribe to PIA-announce@RiSource.org to be notified of new releases.

We recommend starting by installing and running your own copy of the PIA. (See the Installation manual for instructions.) Once the agency is running, you can view further documentation and try customizing the demonstration agents.

New Features

Document Processing System

The Document Processing System is a complete rewrite of the PIA's extension language for active HTML and XML documents. The parser can handle both HTML and XML, and the internal data structures are compliant with the World Wide Web Consortium's Document Object Model standard.

Details can be found in the Internals Manual, and of course in the source code.

New Tag Language

The extension programming language used to write active documents (Interforms) has been almost completely revised, and considerably simplified. Details can be found in the Pia Tagsets Manual.

Note: As of this release, some of the new tags are still unimplemented. A list of unimplemented tags, along with some known bugs and desirable enhancements, can be found on the to-do list.

You will notice that our class and package names are prefixed with crc. This stands for California Research Center. We are considering changing the prefix to org.risource.

Miscellaneous items

In preparing the code for external release, several obsolete packages were removed. Some of these packages are referenced in the test code and may lead to error messages when doing a full make. You can safely ignore these messages.

Small pieces of a few external packages are used in places by the PIA software (Jigsaw, Jcrypt, regexp). Thanks to Anselm, John Dumans, Shugo Maeda, and the others who have contributed their software to the community. These are provided as libraries in PIA/lib/java. The source for the regexp package can be retrieved from ftp://ftp.risource.org/pub/regex-0_11.tgz

The current version of the DOM that we use is not the latest (we are hoping to upgrade soon). The interface classes provided by the W3C for the version we are using are in PIA/src/java/w3c. The latest version (which is incompatible with our current implementation) is available from the W3C.

Additional Information

The latest information regarding PIA technology can be found on the PIA web site (http://www.RiSource.org/PIA). All feedback is greatly appreciated.

Thanks for your support,
The PIA Group (Greg, Steve, Marko, Pam)