Iusmentis site management tool in PHP
This article explains the site management tool I use for the iusmentis.com website. It's completely written in PHP, uses the DOM XML API to parse documents, and relies on a MySQL database to manage metadata. A variety of helper functions is available to assist in rendering content, creating links and so on.
The basic strategy is to convert every resource into one or more HTML documents. These are placed in a local directory, from which they can be mirrored to the actual website (using e.g. rsync, mirror or unison). Every directory holds exactly one HTML file (typically index.html or default.html), so that URLs don't get ugly extensions.
Files and directories
I've set up a home directory for the site, in which there are three subdirectories: content (for the source XML documents), www (for the local copy of the website) and common (for include files like the header and footer). The tool uses [...] to set the destination directory and so on.
Content for the site is stored in XML documents. They are then parsed into DOM objects and converted. Every XML document must have an ID and a LANG attribute on its root element, allowing the tool to retrieve the metadata from the database.
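A minimal sketch of that lookup step, written against PHP's current DOM extension (DOMDocument) rather than the older DOM XML API the tool itself uses. The function name readResourceKey, the sample document and the lowercase attribute names id and lang are assumptions made for illustration.

```php
<?php
// Hypothetical helper: parse an XML source document and return the
// ID/LANG pair found on its root element, plus the root element's name.
function readResourceKey(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Document is not well-formed XML');
    }
    $root = $doc->documentElement;
    $id   = $root->getAttribute('id');   // returns '' when the attribute is absent
    $lang = $root->getAttribute('lang');
    if ($id === '' || $lang === '') {
        throw new RuntimeException('Root element needs both id and lang');
    }
    return ['type' => $root->nodeName, 'resourceid' => $id, 'lang' => $lang];
}

$key = readResourceKey('<article id="patents" lang="en"><title>Patents</title></article>');
echo $key['type'], ' ', $key['resourceid'], ' ', $key['lang'], "\n";
```

The returned pair is what a metadata query would use as its key, since resourceid and lang together identify a resource in the database.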
The three currently supported DocBook types are article, book and faq.
My installation of DOM XML expects input to be UTF-8, and objects to raw entities like &copy; in the source.
Articles are converted into one-part documents, with a standard header and footer. Articles should have abstracts at the beginning, so that readers can easily find out what the articles are about.
If a section has an ID attribute, the value is transformed to an HTML fragment anchor (HREF="#idvalue") so that you can directly link to it. However, articles do not get a table of contents (you could manually add links in the abstract, though).
Books are converted into multiple documents. Every chapter gets its own subdirectory under the book's directory. The chapter is then converted just like an article, but also gets a table of contents listing all the chapters at the bottom. A book must have an abstract. This is output in the book's directory just before the table of contents.
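The resulting directory layout can be sketched as follows. This is an illustration only: the function name bookOutputPaths, the sample labels and the choice of index.html as the file name are assumptions, as is the idea that directory names come straight from the labels.

```php
<?php
// Hypothetical helper: list the HTML files a book would produce, one per
// directory: the book's own page first, then one page per chapter.
function bookOutputPaths(string $wwwDir, string $bookLabel, array $chapterLabels): array
{
    $bookDir = rtrim($wwwDir, '/') . '/' . $bookLabel;
    $paths   = [$bookDir . '/index.html'];           // abstract plus table of contents
    foreach ($chapterLabels as $label) {
        $paths[] = $bookDir . '/' . $label . '/index.html';  // one subdirectory per chapter
    }
    return $paths;
}

$paths = bookOutputPaths('www', 'swpat', ['history', 'law']);
print_r($paths);
```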
The chapters are added to the database, so you can link to them directly. The book's resourceid is prepended to the chapter's resource ID with a '@' in between.
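As a small illustration of that naming rule (the function name and the sample IDs are hypothetical):

```php
<?php
// Hypothetical helper: build the database key for a chapter by putting
// the book's resourceid first, then '@', then the chapter's own ID.
function chapterResourceId(string $bookId, string $chapterId): string
{
    return $bookId . '@' . $chapterId;
}

$chapterKey = chapterResourceId('swpat', 'history');
echo $chapterKey, "\n";
```

A link element would then use the combined value as its linkend to point straight at the chapter.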
Front matter, back matter and the other structural elements that can be used in a DocBook book currently get no special treatment. Their contents will be rendered, however.
Lists of Frequently Asked Questions (FAQs) are the third type of content that can be handled by the tool. An FAQ can be either one-part or multi-part. In the latter case, just like with books, every part gets its own subdirectory, with a list of questions at the top of the FAQ section. The table of contents is printed in the top directory for the FAQ, together with a full question list.
FAQ sections of a multi-part FAQ are indicated using the qandadiv element, which must have an ID attribute. Questions may have an ID attribute; those that don't will get one assigned that consists of the letters and digits of the question with all other characters removed.
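The ID assignment for questions could look like the sketch below. The function name is hypothetical, and it assumes that "letters and digits" means ASCII letters and digits only; the tool may treat accented characters differently.

```php
<?php
// Hypothetical helper: derive an ID from the question text by keeping
// ASCII letters and digits and dropping every other character.
function questionId(string $question): string
{
    return preg_replace('/[^A-Za-z0-9]/', '', $question);
}

$qid = questionId('What is a patent?');
echo $qid, "\n";
```

Note that two questions differing only in punctuation or spacing would end up with the same ID under this scheme.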
FAQ sections are added to the database, just like chapters. However, because every question has an ID attribute (either assigned or present in the XML source) it is also possible to link directly to a question (hm, actually this isn't possible yet, but should be trivial - next version).
The script expects the root element to be called faq, even though DocBook calls it article with a class attribute of value FAQ.
Linking to other resources
Using the ulink element you can insert standard links. The attribute url holds the URL to which a link should be added. Future versions will support indirect linking (via the database), so that all links are only stored once and can be checked more easily.
The link element is available for more robust links. Currently it works only for local resources, but future versions will also allow links to external resources.
The linkend attribute is set to the resourceid of the linked resource. The tool resolves the link, and substitutes the correct URL. The title of the linked resource is added as TITLE attribute (HTML 3.2).
Most resources are available in multiple languages. This means that you can link to e.g. the ID "patents" in a Dutch or an English document without having to worry about what language you link to. The tool will determine the current document's language, and link to "/patents/" or "/octrooien/" as appropriate. However, if no same-language version is available, a language-specific warning is added after the link (something like "(in Dutch)").
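The language fallback can be sketched like this. Everything here is illustrative: an in-memory array stands in for the resources table, the names resolveLink, $resources and $languageNotes are hypothetical, the crashcourse entry is made up, and the URL is assumed to be just the label between slashes. It needs PHP 7.3 or later for array_key_first.

```php
<?php
// Hypothetical stand-in for the resources table: resourceid => lang => metadata.
$resources = [
    'patents' => [
        'en' => ['label' => 'patents',   'title' => 'Patents'],
        'nl' => ['label' => 'octrooien', 'title' => 'Octrooien'],
    ],
    'crashcourse' => [
        'nl' => ['label' => 'spoedcursus', 'title' => 'Spoedcursus'],
    ],
];
// Warning texts, keyed by the document's language, then the target's language.
$languageNotes = ['en' => ['nl' => '(in Dutch)'], 'nl' => ['en' => '(in het Engels)']];

// Resolve a linkend to a URL, a title and an optional language warning.
function resolveLink(array $resources, array $notes, string $id, string $docLang): ?array
{
    if (!isset($resources[$id])) {
        return null; // unknown resourceid
    }
    $versions = $resources[$id];
    // Prefer the current document's language, else take the first version on record.
    $lang = isset($versions[$docLang]) ? $docLang : array_key_first($versions);
    $note = ($lang === $docLang) ? '' : ($notes[$docLang][$lang] ?? "($lang)");
    return [
        'url'   => '/' . $versions[$lang]['label'] . '/',
        'title' => $versions[$lang]['title'],
        'note'  => $note,
    ];
}

$same     = resolveLink($resources, $languageNotes, 'patents', 'nl');
$fallback = resolveLink($resources, $languageNotes, 'crashcourse', 'en');
echo $same['url'], "\n";
echo $fallback['url'], ' ', $fallback['note'], "\n";
```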
DocBook also uses the endterm attribute. If set to "title", the title of the linked resource is substituted for the contents of the link element. This is also fully supported.
Since iusmentis is a site about intellectual property, it should come as no surprise that there are many patent references. In the future, it will be possible to use the link element to link to patent databases given a patent number.
The main driver is docbook.php, to be called with a single argument with the full path of the XML document to be published. Note that PHP does a chdir() to the directory of the PHP file, causing relative paths to break (sigh).
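One way to guard against that is to reject anything that is not a full path before doing any work. The sketch below is not taken from docbook.php: the function name requireFullPath is hypothetical, and it assumes Unix-style paths that start with a slash.

```php
<?php
// Hypothetical guard: accept only a full path to an existing file, so the
// result does not depend on whatever the working directory happens to be.
function requireFullPath(string $path): string
{
    if ($path === '' || $path[0] !== '/') {
        throw new InvalidArgumentException("Need a full path, got: $path");
    }
    $real = realpath($path);   // false when the path does not exist
    if ($real === false || !is_file($real)) {
        throw new InvalidArgumentException("No such file: $path");
    }
    return $real;
}

// Demonstration with a temporary file and with a relative path.
$tmp      = tempnam(sys_get_temp_dir(), 'doc');
$accepted = requireFullPath($tmp);
$rejected = false;
try {
    requireFullPath('content/patents.xml');
} catch (InvalidArgumentException $e) {
    $rejected = true;
    echo $e->getMessage(), "\n";
}
unlink($tmp);
```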
The driver parses the document into a DOM object, reads out the metadata from the database (and prompts you if the resource isn't in the database yet), and calls the right document type-specific driver.
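The final dispatch step might be sketched as below, using PHP's current DOM extension. The converter functions are stubs with hypothetical names; the real document type-specific drivers do the actual conversion.

```php
<?php
// Hypothetical stubs standing in for the three type-specific drivers.
function convertArticle(DOMDocument $doc): string { return 'article driver'; }
function convertBook(DOMDocument $doc): string    { return 'book driver'; }
function convertFaq(DOMDocument $doc): string     { return 'faq driver'; }

// Pick the driver from the name of the root element.
function dispatch(DOMDocument $doc): string
{
    $handlers = [
        'article' => 'convertArticle',
        'book'    => 'convertBook',
        'faq'     => 'convertFaq',
    ];
    $type = $doc->documentElement->nodeName;
    if (!isset($handlers[$type])) {
        throw new RuntimeException("Unsupported document type: $type");
    }
    return $handlers[$type]($doc);
}

$doc = new DOMDocument();
$doc->loadXML('<faq id="swpatfaq" lang="en"/>');
$used = dispatch($doc);
echo $used, "\n";
```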
Converts a DocBook article to an HTML document.
Converts a DocBook book to a set of HTML documents.
Converts a DocBook FAQ to one or more HTML documents.
Functions that retrieve, write and format metadata in the database.
Wrappers for access to MySQL database (connecting to database, queries).
Printing messages to stderr (which by default isn't open in PHP - yikes) at varying degrees of verbosity.
Builds a sitemap with all the resources in the database, in all the languages in which the root element is present.
Counts number of words in one or more XML documents.
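The message helper mentioned in the list above can be approximated as follows. This is a sketch under assumptions: the constant names, the function name message and the three verbosity levels are hypothetical. Opening php://stderr explicitly works whether or not the environment has already opened stderr.

```php
<?php
// Hypothetical verbosity levels; higher means chattier.
const MSG_ERROR = 0;
const MSG_INFO  = 1;
const MSG_DEBUG = 2;

// Write $text to $stream only if its level is within the chosen verbosity.
// Returns true when the message was written, false when it was suppressed.
function message($stream, int $verbosity, int $level, string $text): bool
{
    if ($level > $verbosity) {
        return false;
    }
    fwrite($stream, $text . "\n");
    return true;
}

$stderr = fopen('php://stderr', 'w');   // open stderr ourselves
message($stderr, MSG_INFO, MSG_ERROR, 'cannot parse document');   // written
message($stderr, MSG_INFO, MSG_DEBUG, 'visiting section node');   // suppressed
```

Passing the stream as a parameter keeps the helper easy to redirect, for instance to a log file.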
The tool expects a MySQL database (well, an SQL database that supports SELECT, INSERT and UPDATE) with the following tables:
- resources: metadata for all the resources
- authors: metadata for authors who write resources
- crossreferences: references from one resource to another
- blurbs: introduction texts for hubs
The resources table has the following columns:
- resourceid: a 40-char unique identifier for the resource
- lang: a 2-char language identifier (primary key together w/ resourceid)
- title: title of the resource
- shorttitle: short title of the resource (usually same as title)
- authorid: link to authors table
- copyrightyear: year in which resource was created
- lastbuilt: timestamp that gets updated whenever resource is rebuilt
- lastmod: timestamp that reflects last mod of source XML document
- label: label to be used in URL to HTML version
- parentid: reference to parent of resource
- resourcetype: indicates type ('root','hub','content','sitemap','country')
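For illustration, the columns above could be declared as in the following statement, held in a PHP string so it can be passed to whatever query wrapper is in use. The column types, lengths and NOT NULL choices are assumptions; only the column names, the 40-character and 2-character sizes, the composite primary key and the list of resource types come from the description above.

```php
<?php
// Sketch of a MySQL table definition for the resources table.
// All types are guesses made for illustration.
$createResources = <<<SQL
CREATE TABLE resources (
    resourceid    VARCHAR(40)  NOT NULL,
    lang          CHAR(2)      NOT NULL,
    title         VARCHAR(255) NOT NULL,
    shorttitle    VARCHAR(255),
    authorid      INT,
    copyrightyear SMALLINT,
    lastbuilt     DATETIME,
    lastmod       DATETIME,
    label         VARCHAR(255),
    parentid      VARCHAR(40),
    resourcetype  ENUM('root','hub','content','sitemap','country'),
    PRIMARY KEY (resourceid, lang)
)
SQL;

echo $createResources, "\n";
```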
The author's name, e-mail address and homepage are stored in the authors table. When retrieving resource information, the authorid attribute is cross-referenced against this table so that the page can embed author name and homepage in the footer.
The crossreferences table indicates links from one resourceid to another, to be used when a hub is generated.
The blurbs table contains DocBook abstracts for resourceids that are of type 'hub', to be used when a hub is generated.