Iusmentis site management tool in PHP

In this document

The basics
The content
Useful functionality
Management scripts
Database design

This article explains the site management tool I use for the iusmentis.com website. It's completely written in PHP, uses the DOM XML API to parse documents, and relies on a MySQL database to manage metadata. A variety of helper functions is available to assist in rendering content, creating links and so on.

Download the source.

The basics

Strategy

The basic strategy is to convert every resource into one or more HTML documents. These are placed in a local directory, from which they can be mirrored to the actual website (using e.g. rsync, mirror or unison). Every directory holds exactly one HTML file (typically index.html or default.html), so that URLs don't get ugly extensions.

Files and directories

I've set up a home directory for the site, in which there are subdirectories content (for the source XML documents), www (for the local copy of the website) and common (for include files like the header and footer). The tool uses define() to set the destination directory and so on.
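
As a minimal sketch of such a setup, assuming a home directory of /home/user/iusmentis: the constant names and the output_file() helper below are hypothetical, chosen only to illustrate the layout described above.

```php
<?php
// Hypothetical constant names and paths; the tool itself may use others.
define('SITE_HOME', '/home/user/iusmentis');     // assumed home directory
define('CONTENT_DIR', SITE_HOME . '/content');   // source XML documents
define('WWW_DIR', SITE_HOME . '/www');           // local copy of the website
define('COMMON_DIR', SITE_HOME . '/common');     // header, footer, other includes

// Every resource becomes the single HTML file in its own directory,
// so the public URL is just "/label/" without an extension.
function output_file(string $label): string
{
    return WWW_DIR . '/' . trim($label, '/') . '/index.html';
}
```

With these definitions, output_file('patents') points at www/patents/index.html under the home directory.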

The content

Content for the site is stored in XML documents. They are then parsed into DOM objects and converted. Every XML document must have an ID and a LANG attribute on its root element, allowing the tool to retrieve the metadata from the database.
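
The lookup key can be read from the root element roughly as follows. This is a sketch, not the tool's actual code: it uses PHP's current DOM extension (DOMDocument) rather than the older DOM XML API, and the function name read_root_identity() is made up for the example.

```php
<?php
// Sketch only: read_root_identity() is a hypothetical helper.
function read_root_identity(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        throw new RuntimeException('Document is not well-formed XML');
    }

    $root = $doc->documentElement;
    // XML attribute names are case-sensitive, so accept both spellings.
    $id   = $root->getAttribute('id')   ?: $root->getAttribute('ID');
    $lang = $root->getAttribute('lang') ?: $root->getAttribute('LANG');
    if ($id === '' || $lang === '') {
        throw new RuntimeException('Root element needs both an ID and a LANG attribute');
    }

    // Root element name, resourceid and language identify the document.
    return ['type' => $root->nodeName, 'id' => $id, 'lang' => $lang];
}
```

The returned id and lang together form the key under which the metadata would be looked up.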

The three currently supported DocBook types are article, book and faq.

My installation of DOM XML expects input to be UTF-8, and objects to raw entities like &copy; in the source.

Articles

Articles are converted into one-part documents, with a standard header and footer. Articles should have abstracts at the beginning, so that readers can easily find out what the articles are about.

If a section has an ID attribute, the value is transformed to an HTML fragment anchor (HREF="#idvalue") so that you can directly link to it. However, articles do not get a table of contents (you could manually add links in the abstract, though).

Books

Books are converted into multiple documents. Every chapter gets its own subdirectory under the book's directory. The chapter is then converted just like an article, but also gets a table of contents listing all the chapters at the bottom. A book must have an abstract. This is output in the book's directory just before the table of contents.

The chapters are added to the database, so you can link to them directly. The book's resourceid is prepended to the chapter's resourceid, with an '@' in between.
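
The naming rule is simple enough to show in a few lines. The function name and the example IDs are hypothetical.

```php
<?php
// Hypothetical helper: builds the resourceid under which a chapter is stored.
function chapter_resource_id(string $bookId, string $chapterId): string
{
    // Book ID first, then '@', then the chapter's own ID.
    return $bookId . '@' . $chapterId;
}
```

A chapter "claims" in a book "patentbook" would then be linkable as "patentbook@claims".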

Front matter, back matter and the other wrapper elements that can be used in a DocBook book currently get no special treatment. Their contents will be rendered, however.

FAQs

Lists of Frequently Asked Questions (FAQs) are the third type of content that can be handled by the tool. An FAQ can be either one-part or multi-part. In the latter case, just like with books, every part gets its own subdirectory, with a list of questions at the top of the FAQ section. The table of contents is printed in the top directory for the FAQ, together with a full question list.

FAQ sections of a multi-part FAQ are indicated using the qandadiv element, which must have an ID attribute. Questions may have an ID attribute; those that don't will get one assigned that consists of the letters and digits of the question with all other characters removed.
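
Generating an ID for a question without one could look like this. It is a sketch that assumes only ASCII letters and digits count; how the tool treats accented characters is not stated here, and question_id() is a made-up name.

```php
<?php
// Hypothetical helper: keep letters and digits, drop everything else.
// Assumption: only ASCII letters and digits are kept.
function question_id(string $question): string
{
    return preg_replace('/[^A-Za-z0-9]/', '', $question);
}
```

Note that two questions differing only in punctuation or spacing would get the same ID under this rule, so an explicit ID attribute is the safer choice for near-duplicates.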

FAQ sections are added to the database, just like chapters. Because every question has an ID attribute (either assigned or present in the XML source), it should also be possible to link directly to a question. This isn't actually possible yet, but it should be trivial to add in the next version.

The script expects the root element to be called faq, even though DocBook calls it article with a class attribute of value FAQ.

Useful functionality

Linking to other resources

Using the ulink element you can insert standard links. The attribute url holds the URL to which a link should be added. Future versions will support indirect linking (via the database), so that all links are only stored once and can be checked more easily.

The link element is available to include more robust links. It currently works only for local resources, but future versions will also allow links to external resources.

The linkend attribute is set to the resourceid of the linked resource. The tool resolves the link, and substitutes the correct URL. The title of the linked resource is added as the TITLE attribute (HTML 3.2).

Most resources are available in multiple languages. This means that you can link to e.g. the ID "patents" in a Dutch or an English document without having to worry about what language you link to. The tool will determine the current document's language, and link to "/patents/" or "/octrooien/" as appropriate. However, if no same-language version is available, a language-specific warning is added after the link (something like "(in Dutch)").
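
The language fallback can be sketched as follows. Everything here is an assumption made for illustration: the in-memory array stands in for the resources table, the "crashcourse" entry and both function names are invented, URLs are built from the label alone (ignoring any parent), and the warning is always worded in English.

```php
<?php
// Hypothetical stand-in for the resources table: resourceid => lang => row.
$resources = [
    'patents' => [
        'en' => ['label' => 'patents',   'title' => 'Patents'],
        'nl' => ['label' => 'octrooien', 'title' => 'Octrooien'],
    ],
    'crashcourse' => [
        'nl' => ['label' => 'spoedcursus', 'title' => 'Spoedcursus'],
    ],
];
// English names only; a real warning would be worded per language.
$languageNames = ['en' => 'English', 'nl' => 'Dutch'];

function resolve_link(array $resources, array $languageNames, string $linkend, string $lang): ?array
{
    if (!isset($resources[$linkend])) {
        return null; // unknown resourceid
    }
    $versions = $resources[$linkend];
    $note = '';
    if (!isset($versions[$lang])) {
        // No same-language version: fall back to the first one available
        // and remember to warn the reader.
        $lang = array_key_first($versions);
        $note = '(in ' . ($languageNames[$lang] ?? $lang) . ')';
    }
    return [
        'url'   => '/' . $versions[$lang]['label'] . '/',
        'title' => $versions[$lang]['title'],
        'note'  => $note,
    ];
}

function link_html(array $link, string $text): string
{
    $html = sprintf(
        '<a href="%s" title="%s">%s</a>',
        htmlspecialchars($link['url']),
        htmlspecialchars($link['title']),
        htmlspecialchars($text)
    );
    // The language warning, if any, goes after the link.
    return $link['note'] === '' ? $html : $html . ' ' . $link['note'];
}
```

Resolving "patents" from a Dutch document yields "/octrooien/" without a note, while resolving the Dutch-only "crashcourse" from an English document yields its Dutch URL followed by "(in Dutch)".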

DocBook also uses the endterm attribute. If set to "title", the title of the linked resource is substituted for the contents of the link element. This is also fully supported.

Patent references

Since iusmentis is a site about intellectual property, it should come as no surprise that there are many patent references. In the future, it will be possible to use the link element to link to patent databases given a patent number.

Management scripts

docbook.php

The main driver is docbook.php, to be called with a single argument with the full path of the XML document to be published. Note that PHP does a chdir() to the directory of the PHP file, causing relative paths to break (sigh).

The driver parses the document into a DOM object, reads out the metadata from the database (and prompts you if the resource isn't in the database yet), and calls the right document type-specific driver.
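
The last step, picking the type-specific driver, amounts to a lookup on the root element's name. A sketch, with driver_for() as a hypothetical function name; the script names are the ones listed in this section.

```php
<?php
// Hypothetical helper: map the root element name to the script that handles it.
function driver_for(string $rootName): string
{
    $drivers = [
        'article' => 'dbkarticle.php',
        'book'    => 'dbkbook.php',
        'faq'     => 'dbkfaq.php',
    ];
    if (!isset($drivers[$rootName])) {
        throw new InvalidArgumentException("Unsupported document type: $rootName");
    }
    return $drivers[$rootName];
}
```

Rejecting unknown root elements early keeps a typo in the source document from silently producing no output.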

dbkarticle.php

Converts a DocBook article to an HTML document.

dbkbook.php

Converts a DocBook book to a set of HTML documents.

dbkfaq.php

Converts a DocBook FAQ to one or more HTML documents.

database.php

Functions that retrieve, write and format metadata in the database.

sql.php

Wrappers for access to the MySQL database (connecting to the database, running queries).

reporting.php

Printing messages to stderr (which by default isn't open in PHP - yikes) at varying degrees of verbosity.
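
A sketch of verbosity-filtered reporting, assuming three levels. The constant names and the report() signature are made up for this example; the stream is passed in so that stderr can be opened explicitly via the php://stderr wrapper.

```php
<?php
// Hypothetical verbosity levels: higher means chattier.
const REPORT_ERROR = 0;
const REPORT_INFO  = 1;
const REPORT_DEBUG = 2;

// Writes $message to $stream when its level is within the chosen verbosity.
// Returns true if the message was written, false if it was suppressed.
function report($stream, int $verbosity, int $level, string $message): bool
{
    if ($level > $verbosity) {
        return false;
    }
    fwrite($stream, $message . "\n");
    return true;
}
```

Opening the stream once with fopen('php://stderr', 'w') and passing it to every report() call keeps the messages out of the generated HTML on stdout.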

sitemap.php

Builds a sitemap with all the resources in the database, in all the languages in which the root element is present.

xwc.php

Counts the number of words in one or more XML documents.
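
One way to count words while ignoring the markup is to walk the text nodes. This is a sketch using DOMDocument and DOMXPath, not the actual xwc.php; xml_word_count() is a hypothetical name, and a "word" is assumed to be any run of non-whitespace characters.

```php
<?php
// Hypothetical helper: counts whitespace-separated words in all text nodes.
function xml_word_count(string $xml): int
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Document is not well-formed XML');
    }

    $count = 0;
    $xpath = new DOMXPath($doc);
    // Counting per text node keeps "<title>a b</title><para>c</para>"
    // from gluing "b" and "c" into a single word.
    foreach ($xpath->query('//text()') as $node) {
        $words = preg_split('/\s+/u', $node->nodeValue, -1, PREG_SPLIT_NO_EMPTY);
        $count += count($words);
    }
    return $count;
}
```

The per-node approach has a known cost: a word interrupted by inline markup (part of it inside an emphasis element, say) is counted as two.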

Database design

The tool expects a MySQL database (well, an SQL database that supports SELECT, INSERT and UPDATE) with the following tables:

resources: metadata for all the resources
authors: metadata for authors who write resources
crossreferences: references from one resource to another
blurbs: introduction texts for hubs

Resources

resourceid: a 40-char unique identifier for the resource
lang: a 2-char language identifier (primary key together w/ resourceid)
title: title of the resource
shorttitle: short title of the resource (usually same as title)
authorid: link to authors table
copyrightyear: year in which resource was created
lastbuilt: timestamp that gets updated whenever resource is rebuilt
lastmod: timestamp that reflects last mod of source XML document
label: label to be used in URL to HTML version
parentid: reference to parent of resource
resourcetype: indicates type ('root','hub','content','sitemap','country')
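
For orientation, the columns above could be declared along these lines. Only the column names, the 40-char and 2-char lengths, the composite key and the list of resource types come from the description; every other type and length is a guess, and the variable name is hypothetical.

```php
<?php
// Sketch of a table definition; types other than the 40-char resourceid
// and 2-char lang are assumptions.
$resourcesDdl = <<<'SQL'
CREATE TABLE resources (
    resourceid    VARCHAR(40)  NOT NULL,
    lang          CHAR(2)      NOT NULL,
    title         VARCHAR(255) NOT NULL,
    shorttitle    VARCHAR(255),
    authorid      INT,
    copyrightyear SMALLINT,
    lastbuilt     DATETIME,
    lastmod       DATETIME,
    label         VARCHAR(255),
    parentid      VARCHAR(40),
    resourcetype  ENUM('root','hub','content','sitemap','country'),
    PRIMARY KEY (resourceid, lang)
)
SQL;
```

The composite primary key is what lets one resourceid exist once per language.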

Authors

The author's name, e-mail address and homepage are stored in the authors table. When retrieving resource information, the authorid attribute is cross-referenced against this table so that the page can embed author name and homepage in the footer.

Crossreferences

Indicates links from one resourceid to another, to be used when a hub is generated.

Blurbs

Contains DocBook abstracts for resourceids that are of type 'hub', to be used when a hub is generated.