The Naming of Names (Again)

Yet Another Great Renaming

For the previous (January-July 1999) version of this file, see Archive/naming.html. It's worth keeping around for the benefit of software paleontologists, and may be useful for identifying any fossilized remnants left in the documentation and code.

Parts of the PIA

That Was The Web That Was

Up until now we have been regarding the PIA as a two-part system: a specialized web server based on ``agents'', and a document processing engine used to implement active documents. Agents, in turn, also had two aspects: the ability to ``act on'' selected server transactions, and the ability to ``satisfy'' or ``handle'' transactions to particular URL's.

All that is changed now. A new millennium is upon us.

And what rough beast, its hour come round at last,
Slouches towards Bethlehem to be born?

The Second Coming by W. B. Yeats

The New World Order

The PIA should now be regarded as a three-part system.

  1. As before, the mapping from SGML tags to corresponding actions is done by the document-processing engine, the DPS.
  2. Also as before, the mapping from HTTP transactions to Agents that act on them is done by the Resolver. The difference is that Agents no longer have anything to do with the mapping from URL's to documents or from (active) documents to tagsets.
  3. The third component of the PIA is the Site Resource Package, which performs the mapping from URL's to documents. Since agents are no longer associated with this process, we were able to significantly simplify the Agent interface.

The PIA Site and its Sub-Sites

A PIA server appears to the browser as a directory tree. The top level of this tree is called the PIA's ``site'', and any subtree of a site is called a ``subsite''.

The top level directory corresponding to a PIA site is called the site's ``root''. To a first approximation, every top-level subsite of the PIA corresponds to a subdirectory of the root, and so on down the tree. Each subsite directory may contain a _subsite.cfg file that describes its configuration.

Note:
We may want to reconsider the name _subsite.cfg. It will, of course, be a configurable parameter, but we ought to make a good choice for the default. An initial choice of subsite.config was rejected in part because Windows systems have trouble with extensions longer than three characters. The initial underscore was chosen both for DOS compatibility and to make the name sort early in a listing. Hyphen or dollar sign might be better.

In addition, the topmost _subsite.cfg file may be superseded by a Site Configuration File, which may come from anywhere. This allows the same directory tree to be viewed as any of several different sites.

Many subsites in a working PIA are associated with agents; whether one refers to the History agent or to the /History subsite depends largely on which aspect one wants to talk about.

Similarly, many subsites contain both active documents (e.g. the forms used to configure and control an agent or group of agents) and data files. The convention is for the data files to be contained in a sub-subsite called .../DATA/.

Here is where the first-approximation equivalence between subsites and subdirectories breaks down: it's possible for a subsite or a document to be ``virtual'' -- located in another place in the filesystem. For example, one might want to make a photo album accessible by anonymous FTP, in which case one might make /home/ftp/pub appear in the site hierarchy as a virtual subsite of /Photo-album with the URL /Photo-album/DATA or /Photo-album/Photos.
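
The lookup described above can be sketched roughly as follows. This is only an illustration, not the PIA's actual code: the mount table VIRTUAL_SUBSITES, the function name resolve, and the root /srv/pia used in the usage note are all made up for the example.

```python
from pathlib import PurePosixPath

# Hypothetical mount table: URL prefix of a virtual subsite -> the directory
# elsewhere in the filesystem that it stands for.
VIRTUAL_SUBSITES = {
    "/Photo-album/Photos": "/home/ftp/pub",
}


def resolve(url_path, root, mounts=VIRTUAL_SUBSITES):
    """Map a URL path to a filesystem path, honoring virtual subsites."""
    url = PurePosixPath(url_path)
    # Try the longest virtual prefix first so nested mounts take precedence.
    for prefix in sorted(mounts, key=len, reverse=True):
        try:
            rest = url.relative_to(prefix)
        except ValueError:
            continue  # this prefix does not contain the URL
        return str(PurePosixPath(mounts[prefix]) / rest)
    # First approximation: every subsite is a subdirectory of the root.
    return str(PurePosixPath(root) / url.relative_to("/"))
```

With a site rooted at /srv/pia, resolve("/Photo-album/Photos/beach.jpg", "/srv/pia") lands in /home/ftp/pub, while /History/index.html stays under the root.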

PIA Startup

The PIA starts up and looks for a suitable configuration file, using the following search list:

  1. The command line, which may specify either the pathname of the configuration file itself, or the site root directory and other information that might be in the configuration file.
  2. A file called pia-site.xcf or .pia-site.xcf in the user's home directory.
  3. A file called _subsite.xcf in either $PIA_ROOT or $PIA_HOME. The latter is guaranteed to exist.
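
A minimal sketch of this search list, under some assumptions of ours: the function name find_site_config is hypothetical, an explicit command-line pathname is trusted without checking that it exists, and only the "pathname of the configuration file" form of the command line is handled.

```python
import os
from pathlib import Path


def find_site_config(cli_config=None, env=None, home=None):
    """Return the first configuration file found on the search list, or None."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # 1. The command line wins outright (assumption: it is not checked here).
    if cli_config:
        return Path(cli_config)

    # 2. pia-site.xcf or .pia-site.xcf in the user's home directory.
    candidates = [home / "pia-site.xcf", home / ".pia-site.xcf"]

    # 3. _subsite.xcf in $PIA_ROOT, then in $PIA_HOME.
    for var in ("PIA_ROOT", "PIA_HOME"):
        if env.get(var):
            candidates.append(Path(env[var]) / "_subsite.xcf")

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Passing env and home explicitly, rather than reading them inside the function, keeps the search order easy to exercise without touching the real environment.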

The default setup makes $PIA_HOME the site's virtual root, and $PIA_ROOT the site's real root. Several alternative configuration files will be provided in the PIA/Config/Site directory.

Once the PIA's root directory has been located, all of its active subsites that contain agents can be initialized.

At some point we may want to ship a pre-configured real root, e.g. PIA/MySite, for the user to explore. This would cause trouble in a public installation, but may be simpler and easier to understand in the more common case of a single-user installation.

It is possible that PIA/MySite might not exist, either because it hasn't been installed (it is generated, after all) or because it has been removed. It would be possible in that case to construct a virtual root that points to the default directory; this effectively duplicates the process by which MySite is created. However, it will almost certainly be better to treat this as an error, just as in any other case where the specified root does not exist.

Site Creation

One way to create a site, of course, would be to blindly copy PIA/MySite out of the installation. This sidesteps the question of how MySite gets created in the first place.

The way we create any subsite directory is to start with a virtual subsite (or, equivalently, a reference to a subsite directory) and realize it. This process consists of the following steps:

  1. Create the real directory
  2. Create a modified _subsite.cfg file that designates the new real directory as its location.
  3. Create a new README.html file containing the date, user's name, and the pathname of the original directory.

One can go on to recursively realize more documents and subsites at this point.
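
The three steps might look something like the sketch below. The function name realize and the contents written to the two files are placeholders of ours; in particular the <subsite location="..."/> element is a hypothetical stand-in, since the real _subsite.cfg format is not specified here.

```python
import getpass
from datetime import date
from html import escape
from pathlib import Path


def realize(virtual_dir, real_dir, user=None):
    """Turn a virtual subsite into a real directory (non-recursive)."""
    virtual_dir, real_dir = Path(virtual_dir), Path(real_dir)
    user = user or getpass.getuser()

    # Step 1: create the real directory.
    real_dir.mkdir(parents=True, exist_ok=True)

    # Step 2: write a config that designates the new directory as its location.
    # (Hypothetical element and attribute names.)
    config = f'<subsite location="{escape(str(real_dir))}"/>\n'
    (real_dir / "_subsite.cfg").write_text(config)

    # Step 3: record the date, the user's name, and the original pathname.
    readme = (
        "<html><body>\n"
        f"<p>Realized {date.today().isoformat()} by {escape(user)} "
        f"from {escape(str(virtual_dir))}</p>\n"
        "</body></html>\n"
    )
    (real_dir / "README.html").write_text(readme)
    return real_dir
```

Recursive realization would call the same function on each document and subsite below the one just created.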


Under construction past this point.

Documents and Sheaves

The set of documents and subsites bound together at a subsite is called a ``sheaf'' of documents. A sheaf is distinguished from a directory in that it may contain virtual and imaginary entries as well as real ones. Note that a symbolic link doesn't quite capture the idea of a virtual document.

We need to take more than a cursory look at WebDAV, since the idea of XML-based metadata and file manipulation is very much the same idea as what we're considering here. WebDAV refers to a resource that contains the URI's of ``member'' resources as a ``collection'', and the metadata associated with a resource as ``properties''.

A directory's _subsite.cfg file contains the following:

  1. Normal variables. These correspond pretty much to the current Agent state, and are inherited by (i.e. accessible to) all active documents underneath the subsite. These include things like the extension mapping. Normal namespace scoping provides inheritance; constructs like ..:foo can be used to navigate the tree.
  2. Document descriptors. Strictly speaking these are contained in a local variable (perhaps called SHEAF); they describe any virtual or imaginary documents and subsites. Note that real documents and subsites do not need to be described explicitly unless they have non-default attributes.
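
To make the inheritance rule concrete, here is a small sketch of scoped lookup. The Subsite class is hypothetical, and the reading of ..:foo as "skip the local scope and start looking in the parent" is our assumption about what such a construct would mean.

```python
class Subsite:
    """A subsite with local variables and a link to its parent."""

    def __init__(self, name, parent=None, **variables):
        self.name = name
        self.parent = parent
        self.variables = dict(variables)

    def lookup(self, name):
        scope = self
        # Each leading "..:" moves the starting scope up one level.
        while name.startswith("..:"):
            if scope.parent is None:
                raise KeyError(name)
            scope, name = scope.parent, name[3:]
        # Normal scoping: the nearest enclosing binding wins.
        while scope is not None:
            if name in scope.variables:
                return scope.variables[name]
            scope = scope.parent
        raise KeyError(name)
```

For example, if both the root and /History bind title, a document under /History sees its own binding for title and the root's for ..:title.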

It's important to note that if any of the variables associated with a virtual subsite is changed, the subsite (and all of its parents) must be realized at that point in order to have a suitable real directory in which to store the modified _subsite.cfg file.

Note that information about a subsite is actually contained in three places that have to be merged:

  1. its _subsite.cfg file
  2. its parent's _subsite.cfg file
  3. its parent's directory.
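
One possible way to do the merge is sketched below. The precedence shown (the subsite's own file overrides its parent's descriptor) is an assumption, as are the function name subsite_info, the dictionary representation of the config files, and the use of a SHEAF entry to hold the descriptors.

```python
def subsite_info(name, own_cfg, parent_cfg, parent_dir_entries):
    """Merge what is known about a subsite from its three sources."""
    # 3. The parent's directory tells us whether the subsite is real.
    info = {"name": name, "real": name in parent_dir_entries}
    # 2. The parent's config may describe it (virtual or imaginary entries,
    #    or real ones with non-default attributes).
    info.update(parent_cfg.get("SHEAF", {}).get(name, {}))
    # 1. The subsite's own config has the last word (assumed precedence).
    info.update(own_cfg)
    return info
```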

There are some problems with setting variables: it's not clear exactly when, or even whether, to synchronize an internal cache with the _subsite.cfg file that contains a subsite's permanent bindings. Obviously it would be a lot less confusing for users if this happened invisibly. It may be useful to designate some variables as volatile.

Subsite and Document Types

Subsites and documents are distinguished along three axes:

  1. Reality -- the extent to which they correspond to real subdirectories or real files descended from the root.
  2. Activity -- the extent to which they are processed by tagsets, and the extent to which these tagsets can influence the state and behavior of the PIA.
  3. Visibility -- the extent to which they are visible outside the PIA.

Real, Virtual, and Imaginary

There are three different kinds of subsite:

  1. Real subsites.
    A real subsite corresponds to an actual subdirectory descending from the site's root directory. A real subsite may contain a _subsite.cfg file, an XML file that describes the contents of the subsite and how they are to be processed. In particular, it defines the mapping between filenames and tagsets.
  2. Virtual subsites.
    A virtual subsite is something like a symbolic link: it consists of a real directory, containing a _subsite.cfg file, but it is not a descendant of the root. Instead, it is ``bound in'' or ``mounted'' by means of an entry in its parent's _subsite.cfg file. A virtual subsite can be effectively ``realized'' by simply copying it into its real parent.
  3. Imaginary subsites.
    An imaginary subsite is completely ``unreal'' -- it does not correspond to a real directory anywhere, but is constructed by a program out of thin air. This is somewhat like the "/proc" filesystem on Linux.

Similarly, a single subsite may consist of a combination of real, virtual, and imaginary documents. Note that even a real subsite may contain virtual files, for example files taken from a directory full of defaults. Such virtual documents and subsites can be specified either individually by name, or collectively by means of a search path.

For example, the listing generated for a directory with no index.html file is an imaginary file, as is the error document returned by a web server when a non-existent file is requested.
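
As a rough illustration of how the three kinds of document could be told apart when virtual documents come from a search path, consider the sketch below. The function name locate and its return convention are ours; a result of "imaginary" simply means that nothing exists on disk and some program would have to generate the document.

```python
from pathlib import Path


def locate(name, subsite_dir, search_path=()):
    """Classify a document as real, virtual, or imaginary."""
    real = Path(subsite_dir) / name
    if real.exists():
        return "real", real
    # Virtual: found in one of the directories of defaults, in order.
    for directory in search_path:
        candidate = Path(directory) / name
        if candidate.exists():
            return "virtual", candidate
    # Nothing on disk: the document would have to be generated.
    return "imaginary", None
```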

Active and Passive

Another way of distinguishing subsites, completely orthogonal to the reality axis, is the activity axis.

  1. Active subsites
    May contain their own _subsite.cfg file, as may any active descendant.
  2. Passive subsites
    are ``locked'' -- any _subsite.cfg file they or any of their descendants may contain is ignored, and only ``safe'' tagsets may be used to process files in them. Data directories, for example, are almost invariably passive; otherwise someone who knew the PIA's naming conventions might be able to deposit an XML virus in a cache (for example) and have it executed.
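
The locking rule can be sketched as below. The representation (a list of "active"/"passive" flags from the root down to the subsite in question) and both function names are assumptions made for the example.

```python
def effective_activity(chain):
    """Return 'passive' if the subsite or any of its ancestors is passive.

    chain lists the declared activity of each subsite from the root down.
    """
    return "passive" if "passive" in chain else "active"


def usable_tagsets(chain, tagsets, safe):
    """Filter the tagsets that may process files in the subsite."""
    if effective_activity(chain) == "passive":
        # Locked: only safe tagsets survive, whatever a config file asks for.
        return [t for t in tagsets if t in safe]
    return list(tagsets)
```

Because a passive ancestor locks everything beneath it, a _subsite.cfg dropped into a cache directory cannot switch the unsafe tagsets back on.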

(At some point we may want to distinguish two levels of passivity: processed only by safe tagsets, and totally unprocessed. We may also end up with two levels of activity: requiring agent initialization, and normal.)

Documents are also distinguished by activity:

Visible and Invisible

Finally, documents and even subsites may be distinguished by whether they are visible or invisible from a browser. Note that it is perfectly meaningful for a subsite (even a virtual one) to be invisible -- this just means that it can be accessed from an active document (e.g. as an external entity or through an <include> tag) but not by the user.

Agents

The relationship between agents and directories will become simpler but possibly less direct. Every agent will have a single ``home directory'', but it may be possible for a directory to serve as the home of more than one agent. Agents, in this case, would look like files under a directory, rather than like directories. For example, the history agent might be something like /History/History.agent; the toolbar agents might be *.agent under /Toolbar.

Moreover, it is no longer necessary for an agent to be associated with the Resolver; an agent is simply a piece of XML code that is executed in response to something other than a direct request from the server. In particular, Cron can be an agent. With this approach, agents become what everyone else thinks of as agents: autonomous pieces of code.

It will be possible to access agents by name by way of a namespace; e.g. AGENTS:History. Note, however, that individual agents will not be mounted in URL space the way they are now -- there will just be a collection of pages that ``know about'' a particular agent or group of agents; the agents, in turn, will ``know about'' their home directories.

Agents will correspond rather directly to the XML (.agent) documents that spawn them: it will be possible (easy) for an agent to synchronize its state with its home file. Similarly, inside an agent's code, presumably ~ will refer to the agent's home subsite (directory).

In most cases it will be useful to map the URL /~foo into the home subsite of agent foo; this will be an option, however. Note that the mapping will be to the agent's home, not the agent's internal namespace -- it will be up to the pages in the directory to know about their corresponding agent(s).
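
A sketch of the optional /~foo mapping follows. The table AGENT_HOMES and the function expand_agent_url are hypothetical, and leaving unknown names untouched is our choice rather than a stated requirement.

```python
# Hypothetical table: agent name -> URL of the agent's home subsite.
AGENT_HOMES = {
    "History": "/History",
    "remote": "/Toolbar",
}


def expand_agent_url(url, homes=AGENT_HOMES):
    """Rewrite /~agent/... to the corresponding path in the agent's home."""
    if not url.startswith("/~"):
        return url
    name, sep, rest = url[2:].partition("/")
    home = homes.get(name)
    if home is None:
        return url  # unknown agent: leave the URL alone
    return home + sep + rest
```

Note that the result is a path in the agent's home subsite, not in its internal namespace; the pages found there still have to know which agent they belong to.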

In any case the whole idea of a DOFS agent simply goes away -- many of the entries at the top level of the URL tree will just be passive virtual subsites (for example, /Doc, /Icon, and /PIA).

Naming Alternatives

There are actually three alternatives for associating agents with subsites:

  1. There really is a subsite called, e.g., /Toolbars/~remote; it takes its default forms from, e.g., /Toolbars/AgentDefaults. The ``history agent'' would correspond directly to a subsite called ~History.
  2. ~remote is just a shorthand notation for the home subsite of the remote agent, and might correspond to, for example, /Toolbar. Forms in Toolbar would have to look at the AGENT: namespace to see how they were invoked. /History and /~History would be synonyms.
  3. ~remote is shorthand for the home subsite of the remote agent, but we require a subsite to have at most a single visible agent associated with it.

The first scheme might have problems in some operating systems, if tildes are treated specially. It also makes a firm -- perhaps too firm -- distinction between agent subsites and ordinary subsites, and makes it impossible (by construction) to support multiple agents out of a single subsite (though they can be accommodated by means of virtual subsites).

On the other hand the second scheme, while reminiscent of Apache's virtual directory scheme for users, may prove to be too confusing or complex for application developers. It also means that forms in subsites that correspond to single agents either behave differently (by knowing their agent) from forms in subsites with multiple agents, or else behave differently depending on how we name them.

The third scheme sounds (superficially) like it may be a good compromise. It avoids both the naming ugliness of the first scheme and the ambiguities of the second. It still permits invisible ``helper'' agents, though these may be required to have virtual subsites as their homes.

A possibly-significant advantage of the second scheme is that it makes it easy to restrict agents' write access to their own home subsites, without relying on complicated systems of virtual links or more complex permission schemes.


Copyright © 1999 Ricoh Innovations, Inc.
$Id: naming.html,v 1.18 2001-01-11 23:36:50 steve Exp $
Stephen R. Savitzky <steve@rii.ricoh.com>