This document introduces the basic tagset, the most fundamental of several predefined tagsets available to users and developers working with the PIA. basic defines all of the elements that make documents in the PIA "active."
xxml
xhtml
pia-xhtml
Some additional tagsets are defined for special-purpose applications. For example, the tsdoc tagset is used to automatically generate documentation files from tagset files. The basic constructs are all that the average author will need in most cases.
Currently the best source of examples of tagset use is the demo agent. If you wish to view these examples without using the PIA, look at the listing of demo files.
This document includes a quick reference table whose entries link to the automatically generated documentation in tsdoc/basic.html.
If you are unfamiliar with the documentation, you should first look at How to Read Tagset Descriptions.
The basic tagset consists of primitive tags that are predefined. These tags provide generally useful functions and can be used as is. Primitive tags can also be used as components of new tags that you create yourself.
Extending the primitive tagset requires the use of a special tag, <define>, whose purpose is to construct other tags. The <define> tag is used to specify the tag name and all other necessary information about the tag, including its attributes.
Tag is used here in much the same way that element might be used in an XML or SGML DTD. Defining a tag requires the same information one might supply for a DTD element. This includes the following:
XHTML tags are active tags in that they constitute not merely markup but a directive to take some action, and they can require some additional information not required of XML/SGML element definitions. This information reflects the special uses to which these tags are put. It includes the tag's handler, a specification of the Java class that is associated with actions to be carried out when this element is present. The keyword handler can be followed by the name of the class to be used. If no class is named, the handler is assumed to have the same name as the tag.
When viewed as active tags, XHTML tags are much like Java or C++ methods. They have a name, a documentation field, optional arguments, and they perform some sort of action. A combination of HTML, XML, or other XHTML tags can be used in the action clause. The last item evaluated in the action clause is returned, so long as it has a return value.
Viewed in this light, a tag's attributes are analogous to the arguments that may be passed into a function.
A simple tag definition for the user-defined tag my_tag is presented here:
<define element=my_tag>
  <doc>Given a string attribute, prints that string in a bold font.</doc>
  <define attribute=my_attr required></define>
  <action>
    <b>&attributes:my_attr;</b>
  </action>
</define>
The <doc> element serves to document the tag's actions. The <action> element specifies the action taken when this tag is evaluated. In this case it prints the tag's attribute using a bold font. The following example shows how, once defined, this tag might be used in an active document.
<my_tag my_attr="Hello World"></my_tag>
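To make the function analogy concrete, the my_tag definition above can be read roughly as the Python function below. This is only an analogy, not how the PIA evaluates tags. Raising an error for a missing attribute is an assumption of the sketch, since the text does not say how a missing required attribute is reported.

```python
def my_tag(my_attr=None):
    """Python analogy for the my_tag definition: attributes act as arguments."""
    if my_attr is None:
        # Mirrors the `required` modifier on my_attr. Raising here is an
        # assumption; the document does not describe the actual behavior.
        raise ValueError("my_attr is required")
    # Like the <action> clause, the last item evaluated is what is returned.
    return f"<b>{my_attr}</b>"


# Analogous to <my_tag my_attr="Hello World"></my_tag>
print(my_tag(my_attr="Hello World"))
```

The attribute plays the role of a named argument, and the markup produced by the action plays the role of the return value.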
<code> tags with a language attribute.
<!doctype...> declaration must be the highest level (outermost) element in the document.

The tagset categories and their elements described in this document are listed in the quick reference table that follows. Each element and attribute name in the table is linked to a tagset definition file for that element. If you are unfamiliar with these files, read the following section before linking to a definition.
A tagset definition file consists of a sequence of <define> elements. Some of these statements are nested inside others. The outermost <define> elements define SGML elements and entities. Nested inside each element definition are the definitions of its attributes. Any definition can also contain documentation.
When converted to HTML for documentation purposes, the nested attribute definitions are indented. Documentation elements are nested one more level and typeset in italics. The </define>, <doc>, and </doc> tags are omitted.
Certain elements are only meaningful inside of other elements. For example, <then> elements only occur inside <if> elements. By convention, the definitions of these elements follow that of their parent element.
The construct specification elements are used to create a tagset. They include the <define> element and its subelements.
Subelements of <define> |
---|
<value> |
<action> |
The <define> element can be used to specify any of the following tag types:
element
attribute
entity
word
The <define> element must be predefined for bootstrapping, but it is not in the tagset unless placed there.
The tagset is not recursive. For that reason, tags cannot be used as actions.
The <define> element can occur outside of a <namespace> or <tagset> element because there is always a "current" namespace and tagset in effect.
A <define> element that contains neither a <value> nor an <action> subelement defines only syntax. The defined construct is simply passed through to the output by the processor, with its contents and attributes, if any, processed in turn.
A <define> element can contain anything at all. All content, with the exception of the <value>, <action>, and possibly <doc> elements, is discarded. This means that a definition can contain arbitrary decorative markup, and that arbitrary computation can be done in the course of processing a definition.
A construct can be "defined" more than once. In such cases, the attributes are effectively merged. The associated value and/or action are replaced. This technique is used to associate a new value with a construct, and to associate an action with a construct that has already been defined.
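The redefinition rule can be modeled in a few lines of Python. This is a sketch of the described behavior, not PIA code: the Construct record and the define function are hypothetical names. It also assumes that a redefinition which supplies no value or action leaves the previous one in place, which the text implies but does not state outright.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Construct:
    """Hypothetical record for one defined construct (not a PIA class)."""
    name: str
    attributes: dict = field(default_factory=dict)
    value: Optional[str] = None
    action: Optional[str] = None


def define(table, name, attributes=None, value=None, action=None):
    """Define or redefine `name` in `table` following the rule in the text."""
    existing = table.get(name)
    if existing is None:
        table[name] = Construct(name, dict(attributes or {}), value, action)
        return table[name]
    # Attributes from the new definition are merged into the old ones.
    existing.attributes.update(attributes or {})
    # A newly supplied value or action replaces the old one. Keeping the old
    # one when nothing new is supplied is an assumption of this sketch.
    if value is not None:
        existing.value = value
    if action is not None:
        existing.action = action
    return existing


table = {}
define(table, "my_tag", {"my_attr": "required"})
define(table, "my_tag", {"size": "optional"}, action="<b>...</b>")
print(table["my_tag"])
```

After the second call the construct carries both attributes and the newly associated action, which is the "add an action to an already defined construct" use mentioned above.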
The following attribute of the <define> element is used to specify the type of construct being defined and whether it is required or optional. The name of the element defined is expressed as the value of the element attribute.
The attributes for <define element> are of the form

<define attribute='construct_type' optional>

They are summarized here and described in more detail below:
Construct Type |
---|
element |
attribute |
entity |
notation |
The following sections describe the available modifiers for the structure constructor elements.
In working with these modifiers consider the following:
Some attributes of the <define> element are meaningful only when defining an element. It is impossible to represent this constraint in SGML.

parent= in subelements of <define> specifies that these elements only occur inside the given parent element; in this case, <define>. The value of the parent attribute is a list which is appended to with each use, allowing the DTD to be incrementally extended.

parent greatly simplifies the construction of content models and the parser. An element with a parent implicitly terminates any unclosed elements between it and its innermost parent.

The <tagset> and <namespace> elements provide the context in which <define> operates, i.e., in which elements, entities, and so on are defined.
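The implicit-termination rule for elements with a parent, described above, can be pictured with a stack of open elements. The sketch below is a model in Python, not the PIA parser. The function name open_element is hypothetical, and leaving the stack untouched when no listed parent is open is an assumption.

```python
def open_element(stack, tag, parents=()):
    """Open `tag` on `stack`; return the elements that were implicitly closed.

    `stack` lists the currently open elements, outermost first.
    `parents` models the value of the element's parent attribute.
    """
    closed = []
    if parents:
        # Index of the innermost open element that is an allowed parent.
        idx = max((i for i, t in enumerate(stack) if t in parents), default=None)
        if idx is not None:
            # Terminate every unclosed element between the parent and the new tag.
            while len(stack) > idx + 1:
                closed.append(stack.pop())
        # If no allowed parent is open, nothing is closed (an assumption).
    stack.append(tag)
    return closed


stack = ["if", "then", "b"]
print(open_element(stack, "else", parents=("if",)))
print(stack)
```

Here an else element whose parent is if closes the still-open b and then elements before it is opened directly inside if.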
It is, however, meaningful for <define> to occur outside of a <namespace> or <tagset> element because there is always a "current" namespace and tagset in effect.
The <namespace> element provides the context in which <define entity>, <get>, <set>, <let>, and <bind> operate, i.e., in which names are associated with values. It's best to think of a Namespace as a collection of what most programming languages call "variables".
It is, however, meaningful for <define>, etc. to occur outside of a <namespace> element because there is always a "current" namespace in effect. The outermost namespace in a document is called (has the prefix) "VAR:", because it contains the document processor's variables.
Namespaces are "nested": if a name is not defined in the current namespace, <get> will look "up the stack" to find a namespace that does contain it. Inner namespaces always have the name of the element (tag) that defines them; the following elements define namespaces:
<namespace>
<extract>
<repeat>
<define>
The difference between <let> and <set> is that if a variable already has a value, <set> will simply change its value no matter which namespace it's defined in. If no such variable already exists, <set> will create one in the outermost namespace, VAR:. On the other hand, <let> will always set a variable in the innermost namespace, and will create a new one there if necessary.
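The lookup rules for <get>, <set>, and <let> can be modeled with a small stack of dictionaries. This is a sketch in Python, not PIA code, and the NamespaceStack class is a hypothetical name. It assumes that when the same name is bound in more than one namespace, <set> changes the nearest binding, the same one <get> would find.

```python
class NamespaceStack:
    """Toy model of nested namespaces; the outermost one is named VAR."""

    def __init__(self):
        self.frames = [("VAR", {})]

    def push(self, element_name):
        # Inner namespaces take the name of the element that defines them.
        self.frames.append((element_name, {}))

    def pop(self):
        if len(self.frames) == 1:
            raise RuntimeError("cannot pop the outermost namespace")
        self.frames.pop()

    def get(self, name):
        # Look "up the stack", starting from the innermost namespace.
        for _, bindings in reversed(self.frames):
            if name in bindings:
                return bindings[name]
        return None

    def set(self, name, value):
        # Change an existing binding wherever it lives (nearest first, an
        # assumption); otherwise create the variable in the outermost namespace.
        for _, bindings in reversed(self.frames):
            if name in bindings:
                bindings[name] = value
                return
        self.frames[0][1][name] = value

    def let(self, name, value):
        # Always bind in the innermost namespace.
        self.frames[-1][1][name] = value


ns = NamespaceStack()
ns.set("x", 1)        # no x anywhere, so it is created in VAR
ns.push("repeat")
ns.let("x", 2)        # shadows x inside the repeat namespace
ns.set("y", 9)        # no y anywhere, so it is created in VAR
ns.pop()
print(ns.get("x"), ns.get("y"))
```

Once the inner namespace is popped, the shadowing binding made by let is gone, while the variable created by set remains visible in VAR.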
The <bind> element is used almost exclusively for initialization: its contents are not expanded and the name cannot contain a namespace prefix. It always defines its variable in the innermost namespace. Because it makes no attempt to expand its contents, <bind> is significantly more efficient when it can be used. You will usually see it in XML code resulting from the output of a namespace; this allows namespaces and things that resemble namespaces (e.g., Agents) to be read in efficiently.
The elements <doc> and <note> are subelements of <tagset> and <namespace>. They are processed by the tsdoc tagset to automatically construct the text portion of tagset documentation files.
Control structure elements modify the control flow of an expansion by selectively including, skipping, or repeating some content. The control structure elements are <if> and <repeat>.

The control structure elements are summarized here:
<if> and its Components

Subelement | Parent | Handler |
---|---|---|
<then> | <if> <else-if> <elif> <elsf> | quoted |
<else> | <if> | quoted |
<else-if> | <if> | elsf |
<elsf> | <if> | |
<elif> | <if> | elsf |
<repeat> and its Components

The contents of a <repeat> are repeatedly expanded. All of the following subelements effectively iterate in parallel, which makes it easy to go through multiple lists and number the corresponding elements.
Subelement | Attribute |
---|---|
<foreach> | entity |
<for> | entity, start, stop, step |
<start> | None |
<stop> | None |
<step> | None |
<while> | None |
<until> | None |
<first> | None |
<finally> | None |
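The parallel iteration described for <repeat> behaves much like zipping a counter together with one or more lists. The Python sketch below is an analogy only. The function repeat_parallel is a hypothetical name, and stopping when the shortest list runs out is an assumption, since the text does not say how lists of unequal length are handled.

```python
from itertools import count


def repeat_parallel(*lists, start=1, step=1):
    """Pair a running number with corresponding items from each list.

    The counter stands in for a numeric iterator such as <for>, and each
    list stands in for the items walked by a <foreach>.
    """
    # zip stops at the shortest input; that stopping rule is an assumption.
    return list(zip(count(start, step), *lists))


for number, name, color in repeat_parallel(["apple", "plum"], ["red", "purple"]):
    print(number, name, color)
```

Each pass through the loop sees one number and the corresponding element of every list, which is the "number the corresponding elements" use mentioned above.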
The logical elements are <logical> and <test>.

Element | Attribute |
---|---|
<logical> | op, and, or |
<test> | text, not, zero, positive, negative, numeric, match, exact, case, null |
Document structure elements extract nodes or sets of nodes from a parse tree, and perform structural modifications on trees. The tree being operated on need not be part of the document being processed. It might be a namespace or the value of an entity.
These elements consist of the <extract> element and its subelements.
Element | Attribute |
---|---|
<extract> | sep, all |
Subelement Type | Subelement | Attributes |
---|---|---|
Starting Point | <from> | None |
 | <in> | None |
 | <id> | case, recursive, all |
Extraction | <name> | case, recursive, all |
 | <key> | sep, recursive, all |
Replacement | <replace> | name, case |
 | <append> | None |
 | <remove> | None |
 | <unique> | None |
The subelements of <extract> fall into three groups:

<extract>: Starting Points

<extract>: Extraction

Text can occur inside an <extract> element. Text is split on whitespace and interpreted as follows:

# ), it is matched as a node type. The list of node types is defined in the XPointer specification, plus locally-defined types. In addition, #all is defined, matching any node. Type matching is case-insensitive.

... li -1 extracts the last <li> element in the current set.

<extract>: Replacement

Expansion control elements modify the processing of their contents, but are not conditional in the same way that control-structure operations are. No tests are performed.
Element | Attribute |
---|---|
<expand> | hide |
<protect> | result, markup |
<hide> | text |
<debug> | None |
<show-errors> | None |
<pretty> | hide-above-depth, hide-below-depth, hide-below-tag, white-tag, yellow-tag |
Data manipulation elements perform operations on data, typically text, that depend on some non-structural features of its content (e.g. its value as a number).
Element | Attribute |
---|---|
<numeric> | sum, difference, product, quotient, remainder, power, sort, reverse, pairs, sep, digits, integer, extended, modulus |
<text> | pad, trim, width, align, sort, reverse, pairs, sep, split, join, encode, decode |
<subst> | match, result |
External "resources" include both documents local to the system on which the document processor resides (i.e. files), and remote resources (specified with complete URLs).
Element | Attribute |
---|---|
<include> | src, tagset, entity, quoted |
<output> | dst, append, directory |
<connect> | method, src, mode, tagset, entity, result |
<status> | src, entity, item |
Data structure elements perform no operations. They represent common forms of complex structured data. Strictly speaking, <namespace> and <tagset> are data structure elements. Often a data structure element has a representation that is a subclass of the representation of an ordinary element. (Currently org.risource.dps.active.ParseTreeElement).
Element | Subelements | Attribute |
---|---|---|
<DOCUMENT> | <protocol>, <version>, <code>, <message> | None |
<HEADERS> | None | element, name |
<Query> | <Query> | element |
<URL> | None | protocol, host, port, path, reference, query |