For the previous (January-July 1999) version of this file, see Archive/naming.html. It's worth keeping around for the benefit of software paleontologists, and may be useful for identifying any fossilized remnants left in the documentation and code.
Up until now we have been regarding the PIA as a two-part system: a specialized web server based on ``agents'', and a document processing engine used to implement active documents. Agents, in turn, also had two aspects: the ability to ``act on'' selected server transactions, and the ability to ``satisfy'' or ``handle'' transactions to particular URL's.
All that is changed now. A new millennium is upon us.
And what rough beast, its hour come 'round at last,
Slouches toward Bethlehem to be born?
The Second Coming by W. B. Yeats
The PIA should now be regarded as a three-part system.
A PIA server appears to the browser as a directory tree. The top level of this tree is called the PIA's ``site'', and any subtree of a site is called a ``subsite''.
The top level directory corresponding to a PIA site is called the site's ``root''. To a first approximation, every top-level subsite of the PIA corresponds to a subdirectory of the root, and so on down the tree. Each subsite directory may contain a _subsite.cfg file that describes its configuration.
_subsite.cfg
.
The file name will, of course, be a configurable parameter, but we ought to make a good choice for the default. An initial choice of subsite.config was rejected in part because Windows systems have trouble with extensions longer than three characters. The initial underscore was chosen both for DOS compatibility and to make the name sort early in a listing. Hyphen or dollar sign might be better.
In addition, the topmost _subsite.cfg file may be superseded by a Site Configuration File, which may come from anywhere. This allows the same directory tree to be viewed as any of several different sites.
Many subsites in a working PIA are associated with agents; whether one chooses to refer to the History agent or the /History subsite depends largely on which aspect one wants to talk about.
Similarly, many subsites contain both active documents (e.g. the forms used to configure and control an agent or group of agents) and data files. The convention is for the data files to be contained in a sub-subsite called .../DATA/.
Here is where the first-approximation equivalence between subsites and subdirectories breaks down: it's possible for a subsite or a document to be ``virtual'' -- located in another place in the filesystem. For example, one might want to make a photo album accessible by anonymous FTP, in which case one might make /home/ftp/pub appear in the site hierarchy as a virtual subsite of /Photo-album with the URL /Photo-album/DATA or /Photo-album/Photos.
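The photo-album example can be sketched as a lookup from URL paths to filesystem paths. This is only an illustration in Python, not the PIA's implementation: the mount table, the function name resolve, and the real root /srv/pia-root are all assumptions made for the example, and URL paths are assumed to be absolute and already normalized.

```python
from pathlib import PurePosixPath

# Hypothetical mount table: URL of a virtual subsite -> its real location.
VIRTUAL_SUBSITES = {
    "/Photo-album/Photos": "/home/ftp/pub",
}

# Hypothetical real root of the site; not a path taken from the PIA.
SITE_ROOT = "/srv/pia-root"


def resolve(url_path, root=SITE_ROOT, mounts=VIRTUAL_SUBSITES):
    """Map an absolute URL path to a filesystem path.

    Real subsites live under the root; a virtual subsite redirects
    everything below its URL to another place in the filesystem.
    """
    parts = PurePosixPath(url_path).parts  # e.g. ('/', 'Photo-album', 'Photos')
    # Try the longest prefix first, so a nested virtual subsite wins.
    for n in range(len(parts), 0, -1):
        prefix = str(PurePosixPath(*parts[:n]))
        if prefix in mounts:
            return str(PurePosixPath(mounts[prefix], *parts[n:]))
    # No virtual subsite applies: fall back to the real root.
    return str(PurePosixPath(root, *parts[1:]))


print(resolve("/Photo-album/Photos/beach.jpg"))  # /home/ftp/pub/beach.jpg
print(resolve("/History/index.html"))            # /srv/pia-root/History/index.html
```

Checking the longest prefix first means a virtual subsite mounted inside another virtual subsite would take precedence over its parent.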
The PIA starts up and looks for a suitable configuration file, using the following search list:
pia-site.xcf or .pia-site.xcf in the user's home directory.
_subsite.xcf in either $PIA_ROOT or $PIA_HOME. The latter is guaranteed to exist.
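A rough Python sketch of the search over the entries listed above follows. The function names and the exists parameter are invented for illustration, and the ordering (home directory first, then $PIA_ROOT, then $PIA_HOME) is an assumption read off the order in which the entries appear.

```python
import os


def candidate_configs(env, home):
    """Return the candidate configuration files in search order."""
    candidates = [
        os.path.join(home, "pia-site.xcf"),
        os.path.join(home, ".pia-site.xcf"),
    ]
    # $PIA_ROOT may be unset; $PIA_HOME is said to be guaranteed to exist.
    for var in ("PIA_ROOT", "PIA_HOME"):
        directory = env.get(var)
        if directory:
            candidates.append(os.path.join(directory, "_subsite.xcf"))
    return candidates


def find_site_config(env=None, home=None, exists=os.path.isfile):
    """Return the first candidate that exists, or None if none does."""
    env = os.environ if env is None else env
    home = os.path.expanduser("~") if home is None else home
    for path in candidate_configs(env, home):
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(find_site_config())
```

Passing exists as a parameter is only a convenience for trying the search order without touching the filesystem.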
The default setup makes $PIA_HOME the site's virtual root, and $PIA_ROOT the site's real root. Several alternative configuration files will be provided in the PIA/Config/Site directory.
Once the PIA's root directory has been located, all of its active subsites that contain agents can be initialized.
At some point we may want to ship a pre-configured real root, e.g. PIA/MySite, for the user to explore. It causes trouble in a public installation, but may be simpler and easier to understand in the more common case of a single-user installation.
It is possible that PIA/MySite might not exist, either because it hasn't been installed (it is generated, after all) or because it has been removed. It would be possible in that case to construct a virtual root that points to the default directory; this effectively duplicates the process by which MySite is created. However, it will almost certainly be better to treat this as an error, just as in any other case where the specified root does not exist.
One way to create a site, of course, would be to blindly copy PIA/MySite out of the installation. This sidesteps the question of how MySite gets created in the first place.
The way we create any subsite directory is to start with a virtual subsite (or, equivalently, a reference to a subsite directory) and realize it. This process consists of the following steps:
_subsite.cfg file that designates the new real directory as its location.
README.html file containing the date, user's name, and the pathname of the original directory.
One can go on to recursively realize more documents and subsites at this point.
The set of documents and subsites bound together at a subsite is called a ``sheaf'' of documents. A sheaf is distinguished from a directory in that it may contain virtual and imaginary entries as well as real ones. Note that a symbolic link doesn't quite capture the idea of a virtual document.
We need to take more than a cursory look at WebDAV, since the idea of XML-based metadata and file manipulation is very much the same idea as what we're considering here. WebDAV refers to a resource that contains the URI's of ``member'' resources as a ``collection'', and the metadata associated with a resource as ``properties''.
A directory's _subsite.cfg file contains the following:
..:foo can be used to navigate the tree.
It's important to note that if any of the variables associated with a virtual subsite is changed, the subsite (and all of its parents) must be realized at that point in order to have a suitable real directory in which to store the modified _subsite.cfg file.
Note that information about a subsite is actually contained in three places that have to be merged:
_subsite.cfg file
_subsite.cfg file
There are some problems with setting variables: it's not clear exactly when, or even whether, to synchronize an internal cache with the _subsite.cfg file that contains a subsite's permanent bindings. Obviously it would be a lot less confusing for users if this happened invisibly. It may be useful to designate some variables as volatile.
Subsites and documents are distinguished along three axes:
There are three different kinds of subsite:
_subsite.cfg file, an XML file that describes the contents of the subsite and how they are to be processed. In particular, it defines the mapping between filenames and tagsets.
_subsite.cfg file, but it is not a descendent of the root. Instead, it is ``bound in'' or ``mounted'' by means of an entry in its parent's _subsite.cfg file. A Virtual subsite can be effectively ``realized'' by simply copying it into its real parent.
Similarly, a single subsite may consist of a combination of real, virtual, and imaginary documents. Note that even a real subsite may contain virtual files, for example files taken from a directory full of defaults. Such virtual documents and subsites can be specified either individually by name, or collectively by means of a search path.
For example, the directory listing of a directory with no index.html file is an imaginary file, as is the error document returned by a web server when a non-existent file is requested.
Another way of distinguishing subsites, completely orthogonal to the reality axis, is the activity axis.
_subsite.cfg file, as may any active descendent.
_subsite.cfg file they or any of their descendents may contain is ignored, and only ``safe'' tagsets may be used to process files in them. Data directories, for example, are almost invariably passive; otherwise someone who knew the PIA's naming conventions might be able to deposit an XML virus in a cache (for example) and have it executed.
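The way passivity propagates down the tree can be sketched as follows. This is a minimal Python illustration under stated assumptions: the set of passive subsites, the function names, and the tagset names are all invented for the example and do not come from the PIA.

```python
def is_passive(url_path, passive_subsites):
    """True if the subsite, or any ancestor of it, is marked passive."""
    parts = [p for p in url_path.split("/") if p]
    for n in range(len(parts) + 1):
        ancestor = "/" + "/".join(parts[:n])
        if ancestor in passive_subsites:
            return True
    return False


def config_is_honored(url_path, passive_subsites):
    """A _subsite.cfg file found at or below a passive subsite is ignored."""
    return not is_passive(url_path, passive_subsites)


def allowed_tagsets(url_path, passive_subsites, requested, safe_tagsets):
    """Restrict processing in passive subsites to the ``safe'' tagsets."""
    if is_passive(url_path, passive_subsites):
        return [t for t in requested if t in safe_tagsets]
    return list(requested)


# Hypothetical configuration: the History data directory is passive.
passive = {"/History/DATA"}
print(config_is_honored("/History", passive))             # True
print(config_is_honored("/History/DATA/cache", passive))  # False
print(allowed_tagsets("/History/DATA/cache", passive,
                      ["agent-control", "plain"], {"plain"}))  # ['plain']
```

Because the check walks every ancestor, a configuration file dropped into a cache directory cannot switch processing back on.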
(At some point we may want to distinguish two levels of passivity: processed only by safe tagsets, and totally unprocessed. We may also end up with two levels of activity: requiring agent initialization, and normal.)
Documents are also distinguished by activity:
Finally, documents and even subsites may be distinguished by whether they are visible or invisible from a browser. Note that it is perfectly meaningful for a subsite (even a virtual one) to be invisible -- this just means that it can be accessed from an active document (e.g. as an external entity or through an <include> tag) but not by the user.
The relationship between agents and directories will become simpler but possibly less direct. Every agent will have a single ``home directory'', but it may be possible for a directory to serve as the home of more than one agent. Agents, in this case, would look like files under a directory, rather than like directories. For example, the history agent might be something like /History/History.agent; the toolbar agents might be *.agent under /Toolbar.
Moreover, it is no longer necessary for an agent to be associated with the Resolver; an agent is simply a piece of XML code that is executed in response to something other than a direct request from the server. In particular, Cron can be an agent. With this approach, agents become what everyone else thinks of as agents: autonomous pieces of code.
It will be possible to access agents by name by way of a namespace; e.g. AGENTS:History. Note, however, that individual agents will not be mounted in URL space the way they are now -- there will just be a collection of pages that ``know about'' a particular agent or group of agents; the agents, in turn, will ``know about'' their home directories.
Agents will correspond rather directly to the XML (.agent) documents that spawn them: it will be possible (easy) for an agent to synchronize its state with its home file.
Similarly, inside an agent's code, presumably ~ will refer to the agent's home subsite (directory).
In most cases it will be useful to map the URL /~foo into the home subsite of agent foo; this will be an option, however. Note that the mapping will be to the agent's home, not the agent's internal namespace -- it will be up to the pages in the directory to know about their corresponding agent(s).
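The optional /~foo mapping might look something like the following Python sketch. The table of agent homes, the function name expand_tilde, and the enabled flag are assumptions made for illustration; only the idea of rewriting /~foo to the home subsite of agent foo comes from the text above.

```python
# Hypothetical registry: agent name -> URL of the agent's home subsite.
AGENT_HOMES = {
    "History": "/History",
    "remote": "/Toolbar",
}


def expand_tilde(url_path, agent_homes=AGENT_HOMES, enabled=True):
    """Rewrite /~name/... to the home subsite of the agent called name.

    The mapping is optional, so it can be switched off; URLs naming an
    unknown agent are returned unchanged.
    """
    if not enabled or not url_path.startswith("/~"):
        return url_path
    name, sep, rest = url_path[2:].partition("/")
    home = agent_homes.get(name)
    if home is None:
        return url_path
    return home + sep + rest


print(expand_tilde("/~History/index.html"))  # /History/index.html
print(expand_tilde("/~remote"))              # /Toolbar
```

The rewrite only changes where a page is found; as noted above, it is still up to the pages in that subsite to know about their agent.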
In any case the whole idea of a DOFS agent simply goes away -- many of the entries at the top level of the URL tree will just be passive virtual subsites (for example, /Doc, /Icon, and /PIA).
There are actually three alternatives for associating agents with subsites:
/Toolbars/~remote; it takes its default forms from, e.g., /Toolbars/AgentDefaults. The ``history agent'' would correspond directly to a subsite called ~History.
~remote is just a shorthand notation for the home subsite of the remote agent, and might correspond to, for example, /Toolbar. Forms in Toolbar would have to look at the AGENT: namespace to see how they were invoked. /History and /~History would be synonyms.
~remote is shorthand for the home subsite of the remote agent, but we require a subsite to have at most a single visible agent associated with it.
The first scheme might have problems in some operating systems, if tildes are treated specially. It also makes a firm -- perhaps too firm -- distinction between agent subsites and ordinary subsites, and makes it impossible (by construction) to support multiple agents out of a single subsite (though they can be accommodated by means of virtual subsites).
On the other hand the second scheme, while reminiscent of Apache's virtual directory scheme for users, may prove to be too confusing or complex for application developers. It also means that forms in subsites that correspond to single agents either behave differently (by knowing their agent) from forms in subsites with multiple agents, or else behave differently depending on how we name them.
The third scheme sounds (superficially) like it may be a good compromise. It avoids both the naming ugliness of the first scheme and the ambiguities of the second. It still permits invisible ``helper'' agents, though these may be required to have virtual subsites as their homes.
A possibly-significant advantage of the second scheme is that it makes it easy to restrict agents' write access to their own home subsites, without relying on complicated systems of virtual links or more complex permission schemes.