About the Site Resource Package

Introduction

The Site Resource Package implements the mapping from URLs passed to the PIA server to the documents returned to the browser. This breaks down into the following steps:

Mapping from the file part of the URL to the corresponding document resource in the server's filesystem.
Mapping the resource into a content type, processing method (none, DPS, or CGI), and (in the case of DPS processing) tagset.
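
As a rough illustration of the second step, the sketch below classifies a resource by its file extension. Only the three processing methods (none, DPS, CGI) come from the text; the class name, the extension table, and the tagset name ``xhtml'' are assumptions made for this example, and the real package presumably takes this information from the site's configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not part of org.risource.site.
class ResourceMapping {
    // The three processing methods named in the text.
    enum Processing { NONE, DPS, CGI }

    // Result of the second step: content type, processing method, and tagset.
    static final class Mapped {
        final String contentType;
        final Processing processing;
        final String tagset;   // only meaningful when processing == DPS

        Mapped(String contentType, Processing processing, String tagset) {
            this.contentType = contentType;
            this.processing = processing;
            this.tagset = tagset;
        }
    }

    // Assumed extension table, for illustration only.
    private static final Map<String, Mapped> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("html", new Mapped("text/html", Processing.NONE, null));
        BY_EXTENSION.put("xh", new Mapped("text/html", Processing.DPS, "xhtml"));
        BY_EXTENSION.put("cgi", new Mapped("text/html", Processing.CGI, null));
    }

    // Classify a file name by its extension; unknown files are served unprocessed.
    static Mapped classify(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase();
        Mapped m = BY_EXTENSION.get(ext);
        return m != null ? m
            : new Mapped("application/octet-stream", Processing.NONE, null);
    }
}
```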

Additionally, the Site Resource Package is able to manage an arbitrary amount of additional XML metadata associated with documents, including WebDAV properties, DPS entities, PIA agents, and so on.

The information that defines the structure (configuration) of a site is also expressed in XML, but configuration information is kept (at least conceptually) separate from other metadata in order to simplify the implementation and increase versatility. The XML used for configuration files is similar to both the W3C's Resource Description Framework (RDF) and the IETF's WebDAV properties, and is perhaps a little closer to the former.

Class Structure

There are four main interfaces in the org.risource.site package:

Resource: the generic interface for any kind of resource on a site.
Document: the interface for document (leaf) resources.
Root: the interface for the root of a site.
Realizable: the interface for resources that can be copied from a ``virtual'' location to a ``real'' one.

The interfaces follow the Composite pattern as described in the Gang-of-Four book; in this pattern the main interface (Resource) includes all methods needed for accessing sub-Resources. This imposes no burden on implementations of Document, which are free to return null results.

However, since every container (directory) Resource has an associated Document (for example, home.xh or index.html) that may be accessible on its own, it makes sense for Resource and Document to be different. A method, getDocument, gets the Document associated with a Resource (even if it is the same object).
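
The relationship between the two interfaces can be sketched as follows. These are simplified stand-ins, not the actual org.risource.site interfaces: only getDocument is named in the text, while the ``Sketch'' type names, getName, getChild, and the rule for picking a container's document are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Composite pattern: the main interface carries the child-access methods.
interface SketchResource {
    String getName();
    SketchResource getChild(String name);   // leaves simply return null
    SketchDocument getDocument();           // the Document associated with this Resource
}

// Assumed here: a Document is itself a Resource.
interface SketchDocument extends SketchResource {}

class SketchLeaf implements SketchDocument {
    private final String name;
    SketchLeaf(String name) { this.name = name; }
    public String getName() { return name; }
    public SketchResource getChild(String childName) { return null; }
    public SketchDocument getDocument() { return this; }   // same object
}

class SketchContainer implements SketchResource {
    private final String name;
    private final Map<String, SketchResource> children = new LinkedHashMap<>();
    SketchContainer(String name) { this.name = name; }
    void add(SketchResource child) { children.put(child.getName(), child); }
    public String getName() { return name; }
    public SketchResource getChild(String childName) { return children.get(childName); }

    // Assumed rule: prefer home.xh, then index.html (the two examples in the text).
    public SketchDocument getDocument() {
        for (String candidate : new String[] {"home.xh", "index.html"}) {
            SketchResource r = children.get(candidate);
            if (r instanceof SketchDocument) return (SketchDocument) r;
        }
        return null;
    }
}
```

Client code can then call getDocument on any Resource without first checking whether it is a container or a leaf.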

There are two parallel sets of implementation classes:

	AbstractResource
	ConfiguredResource
Resource	Subsite (container)	FileResource
Document	SiteDocument	FileDocument
Root	Site	(FileRoot)

The ``File'' classes are (comparatively) lightweight objects that contain no configuration information or associated XML metadata -- everything is derived from the underlying file or directory. (FileRoot is shown in parentheses because it is not presently implemented, but we will need it eventually.)

One may well ask why the interface is called Resource and the implementation is called Subsite rather than the other way around, or perhaps something like BasicResource. The main reason is that ``site.Resource'' simply sounds better than ``site.Subsite''. Also, ``resource'' (and to a lesser extent, ``document'') matches the terminology used in, for example, WebDAV and most other web-related specifications. (URL, after all, stands for ``Uniform Resource Locator''.)

Subsite caches a large amount of information: virtual search path for defaults, which virtual directory each child is in, timestamps, tagsets, configuration information for child documents, and so on. In fact, it is possible to build an entire virtual Site out of nothing but a configuration file.

For this reason, Subsite objects are normally kept in memory as a tree. FileResource objects are not, since they are easy to reconstruct from the available filesystem information. Similarly, FileDocument objects are easily reconstructed from a combination of filesystem information and the configuration information cached by their parent Subsite.

Location

A Resource normally has a ``real'' location in the filesystem, which is a direct descendant of the directory that corresponds to the Root resource. A container resource may also have a ``virtual search path'' of directories in which to look for default children. All writing is done in the real location.

In most cases the real location of a resource will not exist at first; in that case the resource has to be ``realized'' in order to write the resource.

Typically the virtual search path of a resource has only one or two elements: a ``prototype'' directory under the source-controlled PIA directory, and possibly a ``defaults'' directory that provides a fallback for documents like home.xh which most directories are expected to have. The prototype directory corresponds roughly to PIA/Agents, and most or all actual agents will have their prototype directories in PIA/Agents. The prototype for the standard, out-of-the-box configuration is PIA itself.
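
The lookup and realization rules described in this section can be sketched with plain filesystem operations. This is a minimal illustration under stated assumptions, not the package's implementation: the class and method names are hypothetical, and ``realizing'' is modeled as nothing more than copying the first match on the search path into the real directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper; names are not taken from org.risource.site.
class VirtualLookup {
    // Reading: prefer the real location, then each search-path directory in order.
    static Path locate(Path realDir, List<Path> searchPath, String child) {
        Path real = realDir.resolve(child);
        if (Files.exists(real)) return real;
        for (Path dir : searchPath) {
            Path candidate = dir.resolve(child);
            if (Files.exists(candidate)) return candidate;
        }
        return null;   // not found anywhere
    }

    // Writing: make sure the child exists in the real location first.
    static Path realize(Path realDir, List<Path> searchPath, String child)
            throws IOException {
        Path real = realDir.resolve(child);
        if (Files.exists(real)) return real;        // already realized
        Files.createDirectories(realDir);           // the real directory may not exist yet
        Path source = locate(realDir, searchPath, child);
        if (source != null) {
            Files.copy(source, real);               // copy the default into place
        } else {
            Files.createFile(real);                 // nothing to copy; start empty
        }
        return real;
    }
}
```

After realize returns, locate finds the real copy first, so later changes to the prototype no longer affect that child.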

The real location of the PIA's root corresponds rather closely to the current .pia directory. It is created in the first place by specifying a ``configuration document'' for the Site (see below) and then ``realizing'' it. A command-line utility will be provided for this purpose.

There will be multiple sample configuration files in the standard distribution, corresponding to, e.g., an appliance server, a personal proxy, and so on. A distribution of the PIA could ship with a real, non-CVS-controlled Site directory created by realizing a default configuration as part of the release process. It might be best if this were a sibling of the PIA directory rather than a child; another possibility is to create it on installation (which would allow the user to select their preferred configuration). Most Unix users will, of course, want to use ~/.pia as the real location of their personal PIA.

Configuration

A Resource's ``configuration'' is specified using an XML element with node type ``Resource''. Attributes specify all of the String, boolean, and integer fields of the underlying object (of class Subsite or Document). XML metadata is contained in namespace elements in the content.

The configuration of a Container resource may also contain Resource sub-elements in its content, corresponding to documents and virtual containers that have no corresponding configuration file. A Subsite will normally have its configuration loaded from a file called, by default, ``_subsite.xcf''.
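
To make the nesting concrete, the sketch below parses a small configuration fragment with the standard Java DOM parser and lists the Resource sub-elements of a container. The element name ``Resource'' comes from the text; the attribute ``name'', the sample XML in the usage note, and the helper class are assumptions for illustration and may not match the actual _subsite.xcf schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of org.risource.site.
class ConfigSketch {
    // Parse an XML string and return its root element.
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
    }

    // Collect the (assumed) name attribute of each direct Resource sub-element.
    static List<String> childResourceNames(Element container) {
        List<String> names = new ArrayList<>();
        NodeList kids = container.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n instanceof Element && "Resource".equals(n.getNodeName())) {
                names.add(((Element) n).getAttribute("name"));
            }
        }
        return names;
    }
}
```

For example, parsing `<Resource name="docs"><Resource name="home.xh"/><Resource name="notes"/></Resource>` and passing the root to childResourceNames yields the two child names in document order.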

The configuration file of the Root resource may be specified separately; if such a configuration file is provided the _subsite.xcf file in the Root directory is ignored. Alternative configuration files for the PIA are provided in the PIA/Config/Site directory.

Note: Although hooks are in place for modifying and saving XML configuration information, this feature is presently untested.
Implementation note: It is possible that the right way to save configuration information is to save, not the _subsite.xcf file itself, but a copy of the ``property Namespace'' that is derived from it. On initialization we first load the _subsite.xcf file, then override properties from the property file to restore any changes.

Agents

One objective of the site package is to provide the machinery necessary to support ``agents,'' but without placing any constraints on their implementation. All that the Root needs to do is to map names that start with a ``~'' (tilde) character into the ``home Resource'' (typically a container) for the named agent. It is then up to the documents in that Resource to provide the agent's user interface.
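
A minimal sketch of the tilde mapping, assuming a simple registry from agent names to the paths of their home Resources. The class, its methods, and the agent name and paths in the usage note are hypothetical; the text specifies only that names starting with ``~'' map to the named agent's home.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry; not part of org.risource.site.
class AgentHomes {
    private final Map<String, String> homes = new HashMap<>();

    // Register an agent that needs a web-accessible user interface.
    void register(String agentName, String homePath) {
        homes.put(agentName, homePath);
    }

    // Rewrite "/~name/rest" to "<home>/rest". Paths that do not start with
    // "/~" are returned unchanged; unknown agents yield null.
    String resolve(String path) {
        if (!path.startsWith("/~")) return path;
        int slash = path.indexOf('/', 2);
        String name = slash < 0 ? path.substring(2) : path.substring(2, slash);
        String rest = slash < 0 ? "" : path.substring(slash);
        String home = homes.get(name);
        return home == null ? null : home + rest;
    }
}
```

With an agent registered as register("Admin", "/Agents/Admin"), resolve("/~Admin/home.xh") gives "/Agents/Admin/home.xh"; everything after that point is ordinary document lookup in the home Resource.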

Note that not all agents need to be registered in this way, only the ones that need web-accessible user interfaces. Similarly, nothing prevents a Resource from being the home of several agents, as long as some mechanism exists for sorting them out. One way of doing this might be to make an Agent's ``home Resource'' a document rather than a container, but this may complicate things unnecessarily. For the moment we can ignore the problem, and simply make sure that every registered agent has its own home.

In the new PIA, then, agents will be considerably simpler than in the old scheme, because they will no longer have anything to do with interpreting URLs or processing documents. Essentially, an agent will be nothing but an XML Element with an action sub-element in its content that provides the hook. In most cases an agent's definition is simply a sub-element of its home Subsite's configuration.

Note that in this scheme, an agent no longer needs a state document! All of its state is contained in ordinary documents in its home Subsite, which can be accessed in the usual way via entities or <include> tags.

Index Files and Directory Listings

Normally the document associated with a container resource is its home file (with any of several extensions taken from a standard list). If no home file exists, an index is searched for. Finally, if none of those exist, a standardized listing is created (using the Listing class, which implements the Document interface).
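
The fallback order can be sketched as a simple search over a container's child names. The class name and the extension list below are assumptions (the text says only that extensions come from ``a standard list''), and the generated listing is represented here by the name ``.''.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of org.risource.site.
class ContainerDocumentChooser {
    // Assumed extension list, in order of preference.
    static final List<String> EXTENSIONS = Arrays.asList("xh", "html");

    // Home file first, then index file, then the generated listing.
    static String choose(Set<String> childNames) {
        for (String base : Arrays.asList("home", "index")) {
            for (String ext : EXTENSIONS) {
                String candidate = base + "." + ext;
                if (childNames.contains(candidate)) return candidate;
            }
        }
        return ".";   // no home or index file: fall back to the listing
    }
}
```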

The standardized listing is always available (unless hidden) under the name ``.'', which is the Unix shorthand for the current directory. The period is only recognized as a listing file when it is the last filename in a path; otherwise it simply refers to the current Container, so that ``/./'' is equivalent to ``/'', as Unix users expect. (This feature can be very useful when constructing paths automatically, for example in a Makefile.)
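
The treatment of ``.'' can be sketched as a small path normalizer: a ``.'' segment is dropped unless it is the final segment, in which case it names the listing. The class and method names are hypothetical, and this covers only the ``.'' rule, not full URL normalization.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of org.risource.site.
class DotPaths {
    // Remove every "." segment except a trailing one.
    static String normalize(String path) {
        String[] parts = path.split("/", -1);   // -1 keeps trailing empty segments
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            boolean last = (i == parts.length - 1);
            if (parts[i].equals(".") && !last) continue;   // just "the current container"
            kept.add(parts[i]);
        }
        return String.join("/", kept);
    }

    // A path requests the listing only when its last segment is ".".
    static boolean isListing(String path) {
        return path.equals(".") || path.endsWith("/.");
    }
}
```

Under these rules ``/./'' normalizes to ``/'' and ``/a/./b'' to ``/a/b'', while ``/a/.'' is left alone and requests the listing for ``/a''.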

This convention replaces the PIA's previous mechanism, which distinguished between paths that end with ``/'' and paths that do not. The idea came from the original CERN httpd, but since no other servers picked it up, it proved quite confusing for users.