30 April 2007

Topic Maps for PHP5

From time to time people write to me about something I wrote a while ago on Topic Maps and PHP5. Most of them ask for the ZIP file that I linked to. Unfortunately that ZIP file has been lost due to various hard-disk crashes (both my own and my ISP's [!!]), so most of the time I give some small advice and tell them the files have been lost.

Just last week I had two more such requests, and decided to do something about it: I'm doing it again, this time a bit more formally and a bit better informed. Basically, I'm restarting a Topic Maps for PHP5 project, and this post will work as a first design draft. Some of it is based on code I've already written, and the rest I will have to take in stride over the next little while, as I'm in a deadline zone right now.

Topic Maps

There are a few things to say about Topic Maps itself. The first is a choice between using the Topic Maps Data Model (TMDM) and the Topic Maps Reference Model (TMRM). Actually, it's an easy choice; I made the switch to the TMRM some time ago, and haven't looked back since. The TMRM is an abstract layer above the TMDM, and in fact you can define the TMDM in it, so it's a good candidate for doing serious work, although it's not as widely supported as the TMDM. In this case, it shouldn't matter too much, as I'll be using it to define a version of the TMDM to stay compatible with XTM (1.0 and 2.0).

Features and requirements

The framework / toolkit will do the following:
  • Out-of-the-box visualisation and browsing of Topic Maps
  • Provide a simple API for working with Topic Maps (but not TMAPI)
  • Input and output XTM 1.0 and 2.0, input LTM and input CSXTM
It will require PHP5 with XSLT enabled. It is unlikely to be PHP4 compatible, as I want to use the better OO structures and support in 5.x, though a PHP4 port could be a fun project for someone. I may also use the Zend_Cache classes from the Zend Framework if they prove to be good, but I have some caching classes I've developed recently that do some funky stuff that I may want instead, so we'll see.

Design

A bit of controversy about this one, perhaps, but the internal representation of the Topic Maps data / reference model will be done in XML. Yup, you heard that right, but there are a few reasons that I'm going down this path:
  • Cross-technology; most technology platforms have support for XML, with APIs or tools to work with it.
  • XML technologies are mature and getting very fast indeed; I don't want to impose too much OO mockery if I can avoid it for such a simple tool.
  • I love XSLT and XPath; no way around this one, and XPath is a fantastic way to handle a lot of complex XML querying, given that the XML is well-written for the task (this is why we're not using XTM but something else internally)
The framework works as follows: take the input Topic Maps format and convert it to our internal TMRM XML format, caching the result for later processing. We need converters for XTM 1.0 and 2.0, LTM (1.3, I think?) and CSXTM 1.0. As all of these formats are well travelled in the TMDM world, this shouldn't be too hard, and the internal XML format will reflect this.
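To make the convert-then-query idea concrete, here is a minimal sketch. It is written in Python rather than PHP purely to keep it short, and the internal format shown (the `map`, `topic` and `name` elements, and the function name `xtm1_to_internal`) is invented for illustration; it is not the framework's actual internal XML. Only the XTM 1.0 element names and namespace come from the XTM specification.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the XTM 1.0 specification.
XTM1_NS = "http://www.topicmaps.org/xtm/1.0/"

# A tiny XTM 1.0 document with two topics and one base name each.
SAMPLE_XTM1 = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/">
  <topic id="puccini">
    <baseName><baseNameString>Giacomo Puccini</baseNameString></baseName>
  </topic>
  <topic id="tosca">
    <baseName><baseNameString>Tosca</baseNameString></baseName>
  </topic>
</topicMap>"""


def xtm1_to_internal(xtm_text):
    """Flatten XTM 1.0 topics into a hypothetical, query-friendly format.

    The output element names are assumptions made for this sketch.
    """
    source = ET.fromstring(xtm_text)
    internal = ET.Element("map")
    for topic in source.findall(f"{{{XTM1_NS}}}topic"):
        node = ET.SubElement(internal, "topic", {"id": topic.get("id")})
        # Lift each nested baseNameString up to a direct <name> child.
        path = f"{{{XTM1_NS}}}baseName/{{{XTM1_NS}}}baseNameString"
        for name in topic.findall(path):
            ET.SubElement(node, "name").text = name.text
    return internal


internal = xtm1_to_internal(SAMPLE_XTM1)

# Against the flattened format, a lookup is a short, namespace-free path.
names = [n.text for n in internal.findall("./topic[@id='tosca']/name")]
print(names)
```

The point of the sketch is only that the query at the end is shorter and shallower than the equivalent path through namespaced XTM; it says nothing about how much faster the real internal format is.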

Where am I up to?

The caching and converting of XTM 1.0 is complete, and the internal XML format is nearly complete, with queries being about 7 times faster than they would be if working with XTM directly. That's a pretty good start. The caching layer is such that you can use file caching, database caching, or something in between (for example, choosing ADO or Zend_Db), giving you heaps of options for scalability and performance.
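A swappable caching layer of this kind could look roughly like the sketch below. Again this is Python rather than PHP, and every name in it (`MemoryCache`, `FileCache`, `convert_cached`, `fake_convert`) is hypothetical; it only illustrates the idea of backends sharing one small `get`/`set` interface, and keying the cache on a hash of the source is my assumption for the sketch, not a description of the real code.

```python
import hashlib
import os
import tempfile


class MemoryCache:
    """Keeps converted maps in a dictionary for the life of the process."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def set(self, key, value):
        self._items[key] = value


class FileCache:
    """Stores each converted map as one file in a directory."""

    def __init__(self, directory):
        self._directory = directory

    def _path(self, key):
        return os.path.join(self._directory, key + ".xml")

    def get(self, key):
        try:
            with open(self._path(key), encoding="utf-8") as handle:
                return handle.read()
        except FileNotFoundError:
            return None

    def set(self, key, value):
        with open(self._path(key), "w", encoding="utf-8") as handle:
            handle.write(value)


def convert_cached(source_text, convert, cache):
    """Run convert() only when this exact source has not been seen before."""
    key = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = convert(source_text)
    cache.set(key, result)
    return result


calls = []


def fake_convert(text):
    """Stand-in for a real converter; records how often it is called."""
    calls.append(text)
    return "<map>" + text.upper() + "</map>"


cache = FileCache(tempfile.mkdtemp())
first = convert_cached("opera", fake_convert, cache)
second = convert_cached("opera", fake_convert, cache)
print(first == second, len(calls))
```

Swapping `FileCache` for `MemoryCache` (or a database-backed class with the same two methods) changes nothing in `convert_cached`, which is the property that matters here.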

Using the usual test Topic Map, the Opera map by Steve Pepper, somewhat complex queries and handling are done in the 0.3 ms region, which is good enough for a version 1.0 of the framework.

I'm currently tweaking the input and output buffering and converters, creating more efficient internals for the XML handling, and writing up some documentation as well.

Where do I want to go?

  • Support for "enterprise size" topic mapping, meaning maps between 5 and 50 MB in XTM 1.0 size.
  • Concurrent editing of the maps.
  • Astoundingly easy user-interface.
  • An XSLT framework to support those user-interfaces (I've actually done this bit already as part of a different project, but I need some time to document and clean it up :)
  • An internet community to use and help out (which is also why I chose an XML internal engine wrapped up in PHP for simplicity).
That's it. I'll let you know how it goes, and feel free to contact me about anything at all.

2 comments:

  1. Hey Alex,

    That sounds awesome. Good to see someone doing this. Thanks!

    --

    I wonder if the XML-representation-optimized-for-XPath bit will be easy to separate from the rest of the implementation? It may make an interesting posting/paper, although maybe you already have such a thing you are working from.

    --

    I'm sure you'll have seen
    http://phptmapi.sourceforge.net/ and/or
    http://quaaxtm.sourceforge.net/
    .. by now.

    As you say, your API will specifically *not* be TMAPI, so I guess these two projects will not 'compete' with yours in that sense. Maybe, with appropriate licenses, it will be possible to use bits from the one to add TMAPI to the other.
    --

    Have you thought about the license that you'll use?

  2. Hi Miles,

    Good to hear from you again. How's NZ doing?

    > I wonder if the XML-representation-optimized-for-XPath bit will be easy to separate from the rest of the implementation?

    Well, I'll try my best at it. :) Mostly it will be a series of XPath expressions, as a lot of the internal XML will use xml:id and xml:idref(s) for semantic indexing. Some optimisations will be partial/full and possibly serialised XPath queries, but nothing that would make it hard to convert from PHP to any other technology. As long as the XML format and the XPath queries are documented well, I think it should be easy for anyone to replace PHP with whatever else they feel like, if they wanted to.

    > It may make an interesting posting/paper, although maybe you already have such a thing you are working from.

    No, I've got nothing of the kind. :) Over the last 6 years I've been doing Topic Maps in various incarnations, but mostly through some XML representation at the hinges. In this project I'm just pulling together all the stuff I've learned over the years to see if we can simplify some aspects of the TM stack. Maybe I should write a paper as I go along, though. It's an interesting problem.

    > Maybe, with appropriate licenses it will be possible to use bits from the one to add TMAPI to the other.

    I'm not saying "no TMAPI" because I don't want it, but more that I'm not going to force my design ideas into the conceptual model that the TMAPI is. Support for that (and other) APIs I'll try to steer to the side of the implementation, rather than making it an integral part of its design. This gives me more freedom to experiment with alternatives that those APIs might otherwise constrain.

    > Have you thought about the license that you'll use?

    Yeah, probably GPL 2 or 3, so very liberal.
