This document contains notes on maintaining a website using the PIA (and especially the process command) to process documents offline.
There are few, if any, ISPs that will let you run a PIA on their servers. And even if you have a direct connection to the Internet with a static IP address and a fast pipe, it's probably a bad idea to run the PIA as your primary web server. (When we get a version of the PIA that runs efficiently as an Apache module, this will change...)
So, for a variety of reasons, you are probably going to want to run the PIA's document processor (the process command) ``offline'' to generate ordinary, static HTML files which you can then upload to your external web site. This gives you all the versatility of the PIA with the security and efficiency of a ``traditional'' web server. This document explains how to do this in two sections: the first covers processing documents offline, and the second covers uploading the results to your website.
It is also possible to run a PIA in parallel with a traditional web server, letting the traditional server serve static pages, while the PIA serves the dynamic ones. This can be done in Apache, for example, by means of the ProxyPass option. The following line added to Apache's configuration file is all it takes:
ProxyPass /PIA http://localhost:8888/
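If the PIA ever issues redirects, you will probably also want Apache's matching ProxyPassReverse directive, which rewrites redirect headers that point at localhost:8888 so they make sense to the outside world. A sketch, using the same /PIA prefix as above:

ProxyPass        /PIA http://localhost:8888/
ProxyPassReverse /PIA http://localhost:8888/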
By far the simplest way to do offline document processing is to use the make utility to process only the documents that you have changed. Assuming that you have a single directory full of .xh files and want to make .html files out of them, the following is a simple Makefile that will do the job.
### Files:
###	These macros can be used in conjunction with the
###	.xh.html rule to generate HTML files offline.

XH_FILES = $(wildcard *.xh)
XH_HTML  = $(XH_FILES:.xh=.html)

### Commands:

ifndef PIA_HOME
PROCESS = process
else
PROCESS = $(PIA_HOME)/bin/process
endif

### Tagsets:

ifndef TAGSET
TAGSET = xhtml
endif

### Rules:

.SUFFIXES: .html .xh

.xh.html:
	$(PROCESS) -t $(TAGSET) $< > $@
	{ grep -s $@ .cvsignore; } || echo $@ >> .cvsignore

### Targets:

all:: $(XH_HTML)
Now, the command ``make all'' will run the process command on all files that have changed since the last time you ran make.
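Because the Makefile only sets TAGSET when it isn't already defined, you can override it from the environment or the make command line without editing anything; a sketch, with mytagset standing in for whatever tagset you actually use:

make all TAGSET=mytagset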
With the PIA, it's possible to process a document through two different tagsets to get different results. I recently used this technique on a personal web site after a major re-organization: the <header> tag in the old directories created a ``moved'' sign in each page, with a link to the new location.
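For example, here is a sketch of running the same source file through two tagsets; the tagset names oldsite and newsite, and the output directories, are hypothetical:

process -t oldsite index.xh > old/index.html
process -t newsite index.xh > new/index.html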
If you don't want separate .html and .xh files, it is possible to process HTML files ``in place.'' The best way is with a sequence of commands like:
mv foo.html foo.html.bak
process -t mytagset foo.html.bak > foo.html
This has the benefit of leaving you with a backup file in case something goes wrong. If you expect to do this more than once, it's worthwhile wrapping the sections you plan on updating with a <div> or <span> tag with an appropriate class or id attribute.
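If you have a whole directory of files to update in place, a simple shell loop does the job; a sketch, again assuming a tagset called mytagset:

for f in *.html; do
    mv $f $f.bak
    process -t mytagset $f.bak > $f
done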
There are several different ways of uploading documents to a website. Unless your connection is very fast or your website is very small, you will want to upload only those documents which have changed since the last upload. There are three main ways of doing this:
1. Mirror the whole tree using the rsync program. You will still probably want to combine this with make for offline processing and other preparation.

2. Keep the site under CVS control, and use cvs to update a working directory on the remote host (described below).

3. Use make to identify the changed files, then upload them using ftp, rsync, scp, or some other program. Since you're probably going to use make anyway for offline document processing, this technique is particularly simple. It also gives you the widest choice of uploading utilities, and the finest-grained control over what gets uploaded.
rsync
This is easy -- rsync is basically an improved version of the rcp remote copy program. You can mirror an entire directory tree with a command like:
rsync -a --numeric-ids --delete -v PIA/ cvs.risource.org:/home/cvsroot/PIA
(You may have guessed that we use this technique to upload the PIA's CVS repository -- this guarantees that the outside tree is an exact copy of the inside one used by the developers.)
The --delete option means to delete files on the remote system that have been deleted on the local system; the trailing slash on the source directory transfers its entire contents, recursively. It's a little tricky: the following command:
rsync -a --numeric-ids --delete -v PIA cvs.risource.org:/home/cvsroot
transfers the PIA subdirectory to the destination directory, and deletes everything else there! Usually not a good idea.
The -C option to rsync ignores the same files that CVS ignores for checkin; this is not usually what you want if your source files are under CVS control but the ones you want to upload are built by make.
You'll probably find it most convenient to put your rsync commands into your top-level Makefile -- that's what we do for the PIA and the RiSource.org website.
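For example, here is a sketch of such a target; HOST and RMTDIR are hypothetical macros standing for your server and remote directory, and the --exclude options (rather than -C) skip exactly the files you name:

HOST   = www.example.com
RMTDIR = /home/www/mysite

upload::
	rsync -a --delete -v --exclude '*.xh' --exclude Makefile ./ $(HOST):$(RMTDIR)/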
cvs
This leads inevitably to the next variation, uploading using cvs. This works very well if the following two conditions are true:

1. your entire website is under CVS control, and
2. you have shell access to the remote host, so that you can run cvs and make there.
In this case it's easy: use CVS to update the working directory on your remote host, and do a make to build everything that needs building.
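A sketch of what this might look like from the local side; the hostname and path are hypothetical, and rsh would do as well as ssh:

ssh www.example.com 'cd /home/www/mysite && cvs update -d && make all'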
We use a somewhat perverse variation on this technique for PIA releases: we actually synchronize the CVS trees using rsync, then do a complete cvs checkout into a brand-new directory, and re-build everything from scratch. Then we roll the whole thing up into a tar file and upload that to RiSource.org. You can see all of the gory details in the PIA's top-level Makefile.
make
I've been using this technique for years on my public websites. It relies on the following fragment of Makefile code:
put:: $(FILES)
	@echo cd $(RMTDIR) > put
	@echo binary >> put
	for f in $? ; do echo put $$f >> put ; done
	@echo bye >> put
	ftp -i $(HOST) < put > /dev/null
This is, of course, designed to use FTP, and it relies on having a .netrc file in your home directory with your password in it so that ftp doesn't have to ask. If you have to recursively descend into subdirectories you'll need to cd to the parent directory in order to do a mkdir on the (possibly new) child.
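A .netrc entry has the following form (the host and account names are placeholders, of course); remember to make the file readable only by you, since the password is stored in the clear:

machine www.example.com
login mylogin
password mypassword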
It's even easier with something like rcp or rsync:
put:: $(FILES)
	-rsh $(HOST) mkdir $(RMTDIR)
	rsync $? $(HOST):$(RMTDIR)
	touch put
Also note the use of rsh and mkdir to make the remote directory if it's not there already; the leading ``-'' tells make to ignore the error if the directory already exists.
The basic idea is always the same, though: use a file (called put in this case) as a ``timestamp'' and upload everything that has changed since the last time.
If you have subdirectories (as most of us do), you'll want to make your Makefile recursive:
put::
	for p in $(SUBDIRS); do test -d $$p && \
	  ( cd $$p; if test -f Makefile; \
	    then $(MAKE) put; fi ); \
	done
The various tests are there to ensure that you won't get an infinite recursion if one of the SUBDIRS or its Makefile is missing.
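SUBDIRS is assumed to be defined near the top of the Makefile; for example (the directory names here are hypothetical):

SUBDIRS = papers notes images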