On Maintaining a Website with the PIA

This document contains notes on maintaining a website using the PIA (and especially the process command) to process documents offline.

There are few, if any, ISPs that will let you run a PIA on their servers. And even if you have a direct connection to the Internet with a static IP address and a fast pipe, it's probably a bad idea to run the PIA as your primary web server. (When we get a version of the PIA that runs efficiently as an Apache module, this will change...)

So, for a variety of reasons, you are probably going to want to run the PIA's document processor (the process command) ``offline'' to generate ordinary, static HTML files which you can then upload to your external web site. This gives you all the versatility of the PIA with the security and efficiency of a ``traditional'' web server.

This document explains how to do this, in two sections:

  1. Offline Document Processing
  2. Uploading Documents

It is also possible to run a PIA in parallel with a traditional web server, letting the traditional server serve the static pages while the PIA serves the dynamic ones. This can be done in Apache, for example, by means of the ProxyPass directive. The following line added to Apache's configuration file is all it takes:

ProxyPass /PIA http://localhost:8888/
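
A slightly fuller httpd.conf fragment might look like the following. This is only a sketch: mod_proxy has to be compiled in or loaded in your Apache, the /PIA prefix and the port number are whatever your installation uses, and the ProxyPassReverse line is an optional extra that fixes up redirects coming back from the PIA.

# Pass everything under /PIA through to the PIA listening on port 8888.
# (Requires mod_proxy; adjust the prefix and port to your setup.)
ProxyPass        /PIA http://localhost:8888/
ProxyPassReverse /PIA http://localhost:8888/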

Offline Document Processing

The Basic Idea

By far the simplest way to do offline document processing is to use the make utility to process only the documents that you have changed. Assuming that you have a single directory full of .xh files and want to make .html files out of them, the following is a simple Makefile that will do the job.

### Files:
###	These macros can be used in conjunction with the 
###	.xh.html rule to generate HTML files offline.

XH_FILES= $(wildcard *.xh)
XH_HTML= $(XH_FILES:.xh=.html)

### Commands:

ifndef PIA_HOME
  PROCESS = process
else
  PROCESS = $(PIA_HOME)/bin/process
endif

### Tagsets:

ifndef TAGSET
  TAGSET=xhtml
endif

### Rules:

.SUFFIXES: .html .xh
.xh.html:
	$(PROCESS) -t $(TAGSET) $< > $@
	grep -qs $@ .cvsignore || echo $@ >> .cvsignore

### Targets:

all:: $(XH_HTML)

Now the command ``make all'' will run the process command on every .xh file that has changed since the last time you ran make, rebuilding the corresponding .html file.
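
For example, if only index.xh and news.xh have changed since the last run, ``make all'' will print (and run) something like the following; the file names here are purely illustrative:

  process -t xhtml index.xh > index.html
  grep -qs index.html .cvsignore || echo index.html >> .cvsignore
  process -t xhtml news.xh > news.html
  grep -qs news.html .cvsignore || echo news.html >> .cvsignore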

Variations

With the PIA, it's possible to process a document through two different tagsets to get different results. I recently used this technique on a personal web site after a major re-organization: the <header> tag in the old directories created a ``moved'' sign in each page, with a link to the new location.
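
For instance, the same source file can be run through two tagsets to produce two different pages. In this sketch the ``moved'' tagset name and the old/ and new/ directories are purely illustrative -- substitute whatever tagset redefines your <header> tag:

  process -t xhtml foo.xh > new/foo.html
  process -t moved foo.xh > old/foo.html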

If you don't want separate .html and .xh files, it is possible to process HTML files ``in place.'' The best way is with a sequence of commands like:

  mv foo.html foo.html.bak
  process -t mytagset foo.html.bak > foo.html

This has the benefit of leaving you with a backup file in case something goes wrong. If you expect to do this more than once, it's worthwhile wrapping the sections you plan on updating with a <div> or <span> tag with an appropriate class or id attribute.
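
To do a whole directory at once, a small shell loop does the same thing and leaves a .bak backup of every page. This is only a sketch; ``mytagset'' stands for whatever tagset you're actually using:

  for f in *.html; do
    mv "$f" "$f.bak"
    process -t mytagset "$f.bak" > "$f"
  done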

Uploading Documents

There are several different ways of uploading documents to a website. Unless your connection is very fast or your website is very small, you will want to upload only those documents which have changed since the last upload. There are three main ways of doing this:

  1. Use a utility that does the whole job. This technique is often called ``mirroring,'' since it produces a mirror image of your local working directory on the remote machine. If you have Linux or Unix on both the internal and external sites, by far the easiest method to use is the rsync program. You will still probably want to combine this with make for offline processing and other preparation.
  2. Use the CVS version-control system to check out the files on the remote machine, and do any necessary builds there. This only works, of course, if you have a shell account on the remote machine.
  3. Use make to identify the changed files, then upload them using ftp, rsync, scp, or some other program. Since you're probably going to use make anyway for offline document processing, this technique is particularly simple. It also gives you the widest choice of uploading utilities, and the finest-grained control over what gets uploaded.

Uploading with rsync

This is easy -- rsync is basically an improved version of the rcp remote copy program. You can mirror an entire directory tree with a command like:

rsync -a --numeric-ids --delete -v PIA/ cvs.risource.org:/home/cvsroot/PIA

(You may have guessed that we use this technique to upload the PIA's CVS repository -- this guarantees that the outside tree is an exact copy of the inside one used by the developers.)

The --delete option tells rsync to delete files on the remote system that have been deleted on the local system; the trailing slash on the source directory means ``transfer its entire contents, recursively.'' This is where it gets a little tricky: the following command (note where the trailing slash goes):

rsync -a --numeric-ids --delete -v PIA/ cvs.risource.org:/home/cvsroot

dumps the contents of PIA directly into /home/cvsroot, and deletes everything else there! Usually not a good idea.

The -C option to rsync ignores the same files that CVS ignores for checkin; this is not usually what you want if your source files are under CVS control but the files you actually upload are built by make (and hence listed in .cvsignore).

You'll probably find it most convenient to put your rsync commands into your top-level Makefile -- that's what we do in the PIA and on the RiSource.org website. Such a target might look like the sketch below.
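
The target name ``mirror'' here is arbitrary, and the host and directories are just the ones from the example above; substitute your own:

### Mirror the local tree onto the remote server.
mirror::
	rsync -a --numeric-ids --delete -v PIA/ cvs.risource.org:/home/cvsroot/PIA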

Uploading with cvs

This leads inevitably to the next variation, uploading using cvs. This works very well if the following two conditions are true:

  1. You have a complete development environment on the host you're uploading to.
  2. You don't mind making your source files public.

In this case it's easy: use CVS to update the working directory on your remote host, and do a make to build everything that needs building.
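
In practice that boils down to something like the following one-liner run from the internal machine. The host name and directory are hypothetical, and ssh works just as well as rsh:

  rsh www.example.com 'cd /home/www/mysite && cvs -q update -d && make all'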

We use a somewhat perverse variation on this technique for PIA releases: we actually synchronize the CVS trees using rsync, then do a complete cvs checkout into a brand-new directory, and re-build everything from scratch. Then we roll the whole thing up into a tar file and upload that to RiSource.org. You can see all of the gory details in the PIA's top-level Makefile.
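
Stripped down to a sketch, that kind of sequence looks roughly like the commands below; the directory, module, and file names are made up, and the real recipe is in the Makefile:

  cvs -d /home/cvsroot checkout -d PIA-new PIA
  (cd PIA-new && make all)
  tar czf PIA-release.tgz PIA-new
  scp PIA-release.tgz www.example.com:/some/where/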

Uploading with make

I've been using this technique for years on my public websites. It relies on the following fragment of Makefile code:

put:: $(FILES)
	@echo cd $(RMTDIR) 		 > put
	@echo binary 			>> put
	for f in $? ; do echo put $$f 	>> put ; done
	@echo bye 			>> put
	ftp -i $(HOST) < put > /dev/null

This is, of course, designed to use FTP, and it relies on having a .netrc file in your home directory with your password in it, so that ftp doesn't have to ask. If you have to descend recursively into subdirectories, you'll need to cd to the parent directory first in order to do a mkdir on the (possibly new) child.
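
A minimal .netrc entry looks like this; the host, login, and password are placeholders, and the file should be readable by you alone (e.g. chmod 600 ~/.netrc):

machine www.example.com
  login myuser
  password mypassword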

It's even easier with something like rcp or rsync:

put:: $(FILES)
	-rsh $(HOST) mkdir $(RMTDIR)
	rsync $? $(HOST):$(RMTDIR)
	touch put

Also note the use of rsh and mkdir to make the remote directory if it's not there already; the leading ``-'' tells make to ignore the error if the directory already exists.

The basic idea is always the same, though: use a file (called put in this case) as a ``timestamp'' and upload everything that has changed since the last time.

If you have subdirectories (as most of us do), you'll want to make your Makefile recursive:

put::
	for p in $(SUBDIRS); do \
	    test -d $$p && \
	    ( cd $$p; if test -f Makefile; then $(MAKE) put; fi ); \
	done

The various tests are there to ensure that you won't get an infinite recursion if one of the SUBDIRS or its Makefile is missing: without them, a failed cd would leave make re-running the same Makefile's put target forever.


Copyright © 1997-1999 Ricoh Innovations, Inc.
$Id: websites.html,v 1.4 2001-01-11 23:36:52 steve Exp $
Stephen R. Savitzky <steve@rii.ricoh.com>