30 April 2007

Topic Maps for PHP5

From time to time people write to me about a post I made some time ago on Topic Maps and PHP5. Most of them ask for the ZIP file that I linked to. Unfortunately that ZIP file has been lost to various hard-disk crashes (both my own and my ISP's [!!]), so most of the time I give some small advice and tell them the files are gone.

Just last week I had two more such requests, and decided to do something about it: I'm doing it again, this time a bit more formally and a bit better informed. Basically, I'm restarting a Topic Maps for PHP5 project, and this post will serve as a first design draft. Some of it is based on code I've already written, and the rest I will have to take in stride over the next little while, as I'm in a deadline zone right now.

Topic Maps

There are a few things to say about Topic Maps in itself. The first is the choice between the Topic Maps Data Model (TMDM) and the Topic Maps Reference Model (TMRM). Actually, it's an easy choice; I made the switch to the TMRM some time ago, and haven't looked back since. The TMRM is an abstract layer above the TMDM, and in fact you can define the TMDM in it, so it's a good candidate for doing serious work, although it's not as widely supported as the TMDM. In this case that shouldn't matter too much, as I'll be using it to define a version of the TMDM to stay compatible with XTM (1.0 and 2.0).

Features and requirements

The framework / toolkit will do the following:
  • Out-of-the-box visualisation and browsing of Topic Maps
  • Provide a simple API for working with Topic Maps (but not TMAPI)
  • Input and output XTM 1.0 and 2.0, input LTM and input CSXTM
It will require PHP5 with XSLT enabled. It may be PHP4 compatible, but I doubt it, as I want to use the better OO structures and support in 5.x; I'm sure a PHP4 port could be a fun project for someone, though. I may also use the Zend::Cache classes from the Zend Framework if they prove to be good, but I have some caching classes I've developed recently that do some funky stuff that I may want instead, so we'll see.

Design

A bit of controversy about this one, perhaps, but the internal representation of the Topic Maps data / reference model will be done in XML. Yup, you heard that right, but there are a few reasons why I'm going down this path:
  • Cross-technology; most technology platforms have support for XML, with APIs or tools to work with it.
  • XML technologies are mature and getting very fast indeed; I don't want to impose too much OO mockery if I can avoid it for such a simple tool.
  • I love XSLT and XPath; no way around this one, and XPath is a fantastic way to handle a lot of complex XML querying, given that the XML is well-written for the task (this is why we're not using XTM but something else internally)
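To give a feel for the kind of XPath-style querying meant here, a minimal sketch follows. It is written in Python rather than PHP purely to keep it short and standard-library only, and the flat `<map>` / `<topic>` layout with its `type` and `name` attributes is invented for illustration; it is not the project's actual internal format.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal format: one flat element per topic (an assumption,
# not the real schema), so that a type lookup is a single path expression.
INTERNAL = """
<map>
  <topic id="puccini" type="composer" name="Giacomo Puccini"/>
  <topic id="verdi" type="composer" name="Giuseppe Verdi"/>
  <topic id="tosca" type="opera" name="Tosca"/>
</map>
"""

def names_of_type(root, type_id):
    """Return the names of all topics of the given type, in document order."""
    return [t.get("name") for t in root.findall(f"./topic[@type='{type_id}']")]

root = ET.fromstring(INTERNAL)
composers = names_of_type(root, "composer")
```

The same lookup against raw XTM would have to walk nested, namespaced elements and resolve link references, which is the argument for a query-friendly internal format.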
The framework works as follows: take the input Topic Maps format, convert it to our internal TMRM XML format, cache the result, and work from that. We need converters for XTM 1.0 and 2.0, LTM (1.3, I think?) and CSXTM 1.0. As all of these formats are well travelled in the TMDM world, this shouldn't be too hard, and the internal XML format will reflect this.
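The conversion step might look roughly like the sketch below, again in Python for brevity. The XTM 1.0 and XLink namespaces are the real ones; the output layout (`<map>` with flat `<topic>` elements) and the function name `xtm1_to_internal` are assumptions made up for this example, and only topic ids, types and base names are handled.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
NS = {"x": XTM}

def xtm1_to_internal(xtm_text):
    """Flatten XTM 1.0 topics into a hypothetical <map><topic .../></map> form."""
    source = ET.fromstring(xtm_text)
    target = ET.Element("map")
    for topic in source.findall("x:topic", NS):
        out = ET.SubElement(target, "topic", id=topic.get("id", ""))
        ref = topic.find("x:instanceOf/x:topicRef", NS)
        if ref is not None:
            # A local reference such as "#composer" becomes "composer".
            out.set("type", ref.get(f"{{{XLINK}}}href", "").lstrip("#"))
        name = topic.find("x:baseName/x:baseNameString", NS)
        if name is not None and name.text:
            out.set("name", name.text.strip())
    return target

SAMPLE = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="puccini">
    <instanceOf><topicRef xlink:href="#composer"/></instanceOf>
    <baseName><baseNameString>Giacomo Puccini</baseNameString></baseName>
  </topic>
  <topic id="composer"/>
</topicMap>"""

internal = xtm1_to_internal(SAMPLE)
```

A real converter would also need to carry over associations, occurrences, scopes and subject identity, none of which this sketch attempts.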

Where am I up to?

The caching and converting of XTM 1.0 is complete, and the internal XML format is nearly complete, with queries being about 7 times faster than they would be if working with XTM directly. That's a pretty good start. The caching layer is such that you can use file caching, database caching, or something in between (for example, choosing ADO or Zend::Db), giving you heaps of options for scalability and performance.
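One way to picture a caching layer with swappable backends is sketched here in Python. The class names `MemoryStore` and `ConversionCache`, the `load` / `save` interface, and the choice of a SHA-1 of the source text as cache key are all assumptions for illustration; they are not taken from the actual code.

```python
import hashlib

class MemoryStore:
    """Stand-in backend; a file or database store would offer the same two methods."""
    def __init__(self):
        self._data = {}

    def load(self, key):
        return self._data.get(key)

    def save(self, key, value):
        self._data[key] = value

class ConversionCache:
    """Run the converter only when the source text has not been seen before."""
    def __init__(self, store, convert):
        self.store = store
        self.convert = convert

    def get(self, source_text):
        key = hashlib.sha1(source_text.encode("utf-8")).hexdigest()
        cached = self.store.load(key)
        if cached is None:
            cached = self.convert(source_text)
            self.store.save(key, cached)
        return cached

# Dummy converter that records how often it is actually invoked.
calls = []
def fake_convert(text):
    calls.append(text)
    return text.upper()

cache = ConversionCache(MemoryStore(), fake_convert)
first = cache.get("<topicMap/>")
second = cache.get("<topicMap/>")  # served from the store, converter not re-run
```

Because the cache only talks to `load` and `save`, switching from files to a database would mean writing another small store class rather than touching the conversion code.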

Using the usual test Topic Map, the Opera map by Steve Pepper, somewhat complex queries and handling are done in the 0.3 ms region, which is good enough for a version 1.0 of the framework.

I'm currently tweaking the input and output buffering and converters, creating more efficient internals for the XML handling, and writing up some documentation as well.

Where do I want to go?

  • Support for "enterprise size" topic mapping, meaning maps between 5 and 50 MB in XTM 1.0 size.
  • Concurrent editing of the maps.
  • Astoundingly easy user-interface.
  • An XSLT framework to support those user-interfaces (I've actually done this bit already as part of a different project, but I need some time to document and clean it up :)
  • An internet community to use and help out (which is also why I chose an XML internal engine wrapped up in PHP for simplicity).
That's it. I'll let you know how it goes, and feel free to contact me about anything at all.