26 May 2006

The epistemological implications of Topic Maps for librarians

Topic Maps in the library world


Quite often I'm asked about the link between libraries and Topic Maps, given that the latter is something that I've tried to specialise in. For example, I was recently invited to join a panel at LITA's Nashville conference 2006 as a Topic Maps "expert" (meaning; someone who knows a little more than the rest). Sadly I couldn't attend, which is a shame as I had an exciting Topic Maps paper accepted, although since it touches on the topic of this post you'll get some gist of it from here.

I wrote an introductory article about Topic Maps some time ago, and quite a number of librarians (or people in the business of libraries) have since asked numerous questions about it: how well does it fit into the library world, and isn't it fun doing all that Topic Maps work?

Lots of people in the library world have got the "metadata map" part of it somewhat right, but few seem to understand what Topic Maps really is all about. Yes, it's mostly about metadata, but no, it doesn't support a single metadata standard as such; it's a general data model in which you can fit whatever metadata you wish. Some folks get confused by the "map" part of Topic Maps, and understandably so; "map" gives us certain associations with something visual, but that is quite misleading; the "map" refers to modelling.

First of all, "data modelling" is most often hijacked by relational database folks as a term to explain how they design their databases, document it, do their normalisation and optimisation of the model, and so forth. The reason I stuck "epistemological" in the title of this post is to separate myself a bit from the RDBMS (Relational DataBase Management System) guys for a minute, and talk about philosophy.

Epistemology


There are a number of epistemological (and notice that Wikipedia URL; 'Justified_true_belief'!) things that apply to data modelling, such as "What is a piece of knowledge?", "What is information?" and "What is representation?" These are good questions; How can we think we do knowledge management if we don't know what it is? How can we create information systems without knowing what information is? How can we represent our knowledge and information if we don't know what that representation means?

I'll let you know that I'm in the representationalism camp in these regards; anything outside the workings of my own mind is observed by proxy, even other people's knowledge. I need to find ways to fold new perceptions into my own knowledge to gain new knowledge. My tea in front of me is represented by the visual cup, the smell of aroma, and the taste of tea, bit of sugar, pinch of milk; observations that make up a context.

This context can be represented by "something", and this is in all simplicity all that we information folks do; we try to come up with models that best represent the information for us and for the computer, for reuse, for knowledge creation, and for archiving.

In a Topic Map, this context is conceptualised through a Topic, contextualised through Associations, and turned into information through Occurrences, but there are hundreds of other ways to do it, in relational databases, in XML, in binary formats, with paper and pen, with facial expressions, through music and dance and art and ...
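
To make those three building blocks a bit more concrete, here is a minimal sketch in Python. It is not how any real Topic Maps engine works; the class names, method names and the tea example data are all made up for illustration, and a lot (scope, typing of topics, merging rules) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    kind: str   # what sort of information this is, e.g. "smell"
    value: str  # the information itself, or a pointer to it


@dataclass
class Topic:
    identifier: str
    name: str
    occurrences: list = field(default_factory=list)


@dataclass(frozen=True)
class Association:
    kind: str
    roles: tuple  # ((role_name, topic_identifier), ...)


class TopicMap:
    """Hypothetical, heavily simplified container for topics and associations."""

    def __init__(self):
        self.topics = {}
        self.associations = []

    def add_topic(self, identifier, name):
        # Reuse the topic if the identifier is already known.
        return self.topics.setdefault(identifier, Topic(identifier, name))

    def associate(self, kind, **roles):
        assoc = Association(kind, tuple(sorted(roles.items())))
        self.associations.append(assoc)
        return assoc

    def related(self, identifier):
        """Identifiers of topics that share an association with the given topic."""
        found = set()
        for assoc in self.associations:
            players = [player for _, player in assoc.roles]
            if identifier in players:
                found.update(p for p in players if p != identifier)
        return sorted(found)


tm = TopicMap()
tea = tm.add_topic("tea", "My cup of tea")
tm.add_topic("sugar", "Sugar")
tm.add_topic("milk", "Milk")
tea.occurrences.append(Occurrence("smell", "the aroma rising from the cup"))
tm.associate("contains", container="tea", ingredient="sugar")
tm.associate("contains", container="tea", ingredient="milk")

print(tm.related("tea"))
```

The topic stands for the thing itself, the associations put it in context, and the occurrences carry the actual information about it.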

Ontology


It's all about expressions of something. With computer systems we have a tendency to think in very technological ways about these things, but as any long-time database modeller knows; there are people who are good at normalising, and people who suck at it! My theory here is that the people who are good at normalising understand epistemology (knowingly or not). The same goes for people who are good at creating XML schemas, good at visual design, good at writing, or good at presenting. In fact, I'd stress that an understanding of epistemology is crucial to any form of quality representation of an expression.

Let's take a step sideways into ontologies for a second; ontology asks "What actually exists?" and goes on to define a model in which we can represent that which we think actually exists.

In modern information sciences, ontology work is what we refer to when we try to explain "things" through a more formal network of definition, so that "X is a Y of type Z" and "X is the opposite of D" and "D is a class of U"; given enough such statements, computers (or humans if you're patient) can infer "knowledge" (basically; hidden or not explicitly stated information). Of course, you need to have a lot of these statements, and they must all be true, and probably authoritative, which for many is the very reason they don't believe in the Semantic Web (of which I'm such a sceptic myself).
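
A tiny sketch of what "inferring hidden information" can mean in practice, assuming nothing more than a list of "X is a Y" statements. The function name and the example statements are invented for illustration; real ontology reasoners handle far more than this one transitive rule.

```python
def infer_types(statements):
    """Return every (subject, object) pair that follows from the "is_a" statements.

    Applies one rule until nothing new appears:
    if A is a B and B is a C, then A is a C.
    """
    known = {(s, o) for s, p, o in statements if p == "is_a"}
    changed = True
    while changed:
        changed = False
        for a, b in list(known):
            for c, d in list(known):
                if b == c and (a, d) not in known:
                    known.add((a, d))
                    changed = True
    return known


# Illustrative statements only.
statements = [
    ("Moby-Dick", "is_a", "novel"),
    ("novel", "is_a", "book"),
    ("book", "is_a", "publication"),
]

explicit = {(s, o) for s, p, o in statements if p == "is_a"}
implied = sorted(infer_types(statements) - explicit)

for subject, obj in implied:
    print(f"{subject} is a {obj} (never stated, but it follows)")
```

Nobody ever said that Moby-Dick is a publication, yet the statements make it so. This also shows the catch mentioned above: one false statement in the chain and every conclusion built on it is false too.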

In a closed system though, I have much belief in ontological models and information systems, and libraries have a lot of closed systems in which openness to the hidden information could provide some seriously good applications. For example, a lot of what librarians care about is in collections of sorts, and a collection and the metadata about it can well be mined for some rich information not explicitly stated.

Collections in a Topic Map


I've done a few experiments with collections in Topic Maps with some pretty good results. For example, there's the "Fish Trout, you're out" children's folklore in our oral history project; I got all the MARC records that belong to the collection, converted them to a Topic Map, and lots of interesting things happened; I learned more about the collection, knew more about what type of information was within it, I could browse it through various facets, I could ask the Topic Map for items that had complex relationships ... basically, I could do a bucketload of things that no OPAC could ever dream of being able to do, yet we both had the same basic MARC records to work with.
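
The following is a rough sketch of the general idea, not the conversion I actually ran. It assumes the MARC records have already been parsed into plain dictionaries keyed by field tag (001 control number, 100 main author, 245 title, 650 topical subjects), with subfields and indicators ignored; the three records and all function names are invented for illustration.

```python
from collections import defaultdict

# Invented sample records; real MARC fields carry subfields and indicators.
records = [
    {"001": "rec1", "245": "Skipping rhymes", "100": "Smith, A.", "650": ["Rhymes", "Games"]},
    {"001": "rec2", "245": "Counting-out games", "100": "Smith, A.", "650": ["Games"]},
    {"001": "rec3", "245": "Playground songs", "100": "Jones, B.", "650": ["Rhymes", "Songs"]},
]


def build_map(records):
    """Turn flat records into topics plus typed associations between them."""
    topics = {}
    associations = set()
    for rec in records:
        item = ("item", rec["001"])
        topics[item] = rec["245"]
        if "100" in rec:
            creator = ("creator", rec["100"])
            topics[creator] = rec["100"]
            associations.add(("created-by", item, creator))
        for subject in rec.get("650", []):
            subj = ("subject", subject)
            topics[subj] = subject
            associations.add(("is-about", item, subj))
    return topics, associations


def facet(associations, kind):
    """Group item identifiers by the topic at the other end of one association type."""
    groups = defaultdict(set)
    for assoc_kind, item, other in associations:
        if assoc_kind == kind:
            groups[other[1]].add(item[1])
    return {key: sorted(values) for key, values in groups.items()}


def creators_sharing_subjects(associations):
    """Creators linked only indirectly, through a subject they both have items about."""
    creator_of = {item: other[1] for kind, item, other in associations if kind == "created-by"}
    subject_creators = defaultdict(set)
    for kind, item, other in associations:
        if kind == "is-about" and item in creator_of:
            subject_creators[other[1]].add(creator_of[item])
    pairs = set()
    for subject, creators in subject_creators.items():
        ordered = sorted(creators)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second, subject))
    return sorted(pairs)


topics, associations = build_map(records)
print(facet(associations, "is-about"))
print(creators_sharing_subjects(associations))
```

The records themselves are untouched. The subjects and creators have simply been promoted from strings buried inside a record to things in their own right, which is what makes the facets and the indirect relationships fall out so cheaply.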

The recent National Treasures exhibition was designed with my XPF framework, an XSLT-based wrapper and query language for Topic Maps, so all the data items in that collection sit in a Topic Map; every picture, every comment, every text, every page, every theme and every note. Yet, the actual site looks pretty much like most other sites out there, so where's the juice? Well, internally we've created a couple of alternative interfaces to the Topic Map with dramatically different results, and although they are not public (and probably never will be, although we thought about creating an interface for kids!) they showed us again what rich hidden information we could get out of data we already had. And that's an important key to why Topic Maps are so important!

Another collection I've plodded with in a Topic Map is the Mauritius Collection, over 2000 items with a great variety of semantics and types. One of the problems with a lot of these collections is maintaining them, and getting an overview of the collection is often quite difficult; people spend years trying to get the full picture, especially if the collection is somewhat fluid (items coming and going from it). The Mauritius Collection is hard to get an overview of, yet in a Topic Map - a model which is designed from the ground up to handle complex relationships - it seemed almost too simple to browse around the collection, looking for things or simply exploring stuff that's there and learning stuff in the process.

And I've yet to talk about books in this context, but most other people are fixated on books and cover them quite well. To me, life and everything I do isn't based on books, but all of the collection wonderment mentioned for items can equally be applied to books. Personally, if I was given the opportunity, I'd give our maps collection a go next!

Epistemological implications of Topic Maps for librarians


So what are these implications? Well, there are a few paradigms that differ from the usual information technology set of RDBMS, databases, OPAC, fielded search and NBD (National Bibliographic Database; another library term for a large database).

First of all, librarians know about thesaurus and taxonomy work. In the former there are notions such as "broader term", "narrower term", "related term", "use instead", and so forth; these all make up the ontology of the thesaurus; they explain what things might be, and in a thesaurus, in a very loose and general way (mostly). In a taxonomy, most of the relationships between items (and hence the ontology) are explained through the structure itself; this item is above this one, meaning "X is a Y" or, in more complex taxonomies, "X is an instance of class Y which is super-class of Z".

Topic Maps takes this a few steps further; in the same Topic Map, you can have a thesaurus, a taxonomy, a faceted classification system, LCSH (Library of Congress Subject Headings), MARC records and an ontology, all working in unison. This has implications for how we can use the information in single applications, but it also has synergetic implications, in revealing hidden information that's not explicitly stated.
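
As a small illustration of "working in unison", here is a sketch that folds a thesaurus and a taxonomy into one set of typed relationships and then asks what is known about a single term across both. Everything here is invented for the example: the vocabularies, the function names, and the shortcut of matching terms by name (a real Topic Map would merge on proper subject identifiers rather than on strings).

```python
# A thesaurus expressed as explicit, typed relationships.
thesaurus = [
    ("Trout", "broader_term", "Fish"),
    ("Fish", "related_term", "Angling"),
]

# A taxonomy where the relationships are implied by the nesting itself.
taxonomy = {"Animals": {"Fish": {"Trout": {}, "Salmon": {}}}}


def taxonomy_to_associations(tree, parent=None):
    """Make the structure explicit: every child "is_a" its parent."""
    for name, children in tree.items():
        if parent is not None:
            yield (name, "is_a", parent)
        yield from taxonomy_to_associations(children, name)


def merge(*sources):
    """Combine any number of relationship sources into one set."""
    merged = set()
    for source in sources:
        merged.update(source)
    return merged


def neighbours(associations, topic):
    """Everything directly connected to a topic, whichever vocabulary said so."""
    result = set()
    for subject, kind, obj in associations:
        if subject == topic:
            result.add((kind, obj))
        elif obj == topic:
            result.add(("inverse:" + kind, subject))
    return sorted(result)


combined = merge(thesaurus, taxonomy_to_associations(taxonomy))
for kind, other in neighbours(combined, "Fish"):
    print(kind, other)
```

Neither vocabulary alone knows that "Fish" is related to "Angling" and also sits under "Animals" with "Salmon" beneath it; once they share a map, one question returns both.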

Secondly, Topic Maps is based around the notion of atomic nodes on which you hang various information, such as metadata and relationships, and this is quite unlike a record in a database, of which MARC is a good example. But what's important to understand is that we're not talking about taking the data out of MARC or converting MARC to MODS to XOBIS to Dublin Core to whatever; no, MARC stays as MARC, but Topic Maps lays a layer of "semantics" (we can stretch or implode the meaning of "semantics" here, I think; it all depends on what you want to do, how much energy you're prepared to waste and resources you've got allocated) on top. This is why it's a Map; a map to guide you through your information soup.

And thirdly, soup with added Topic Maps makes a dang fine stew. I love stew.

We knew that


A lot of librarians (and others who might read this) already knew all this; why am I telling you this, then?

Because to truly understand Topic Maps, and why I'm so keen on them, is to understand how Topic Maps and its data model are closer to human cognition and epistemological ideals than what we're currently immersed in, such as the relational database, the notion of a "record", the notion of collections that don't overlap (Hah! I dare you to show me one!), the idea of a book being atomic (the folks who are into FRBR know all about this one), the idea of marshalled viewpoints of information (guides vs. the reference librarian), taxonomic classification schemes (this one is heavily disputed, but in the classical form it certainly causes problems, although it might be more a human problem than a technological one; for example, can we mix and match LCSH, tagsonomies and thesauri? You can in a Topic Map with relative ease.) and so forth.

In the end, how do we know that what we're doing aids our goals? Is our technology working for us, alongside us, behind us, against us? The goal must be to preserve and encourage knowledge, right? For libraries this is of course on the borderline between the collection mentality and the education mentality; some librarians have only one of these, some both (and a few rare exceptions neither!) and then various mixes of the two. In my view, there is no two; they are the same thing.

How do we know that we're delivering systems that actually help users in whatever quest they have? Right now I feel we're second-guessing on every level on that question; we design systems with a specific set of features in the hopes that we help at least a given percentage of users. I'd stress that we really need to work it the other way around, as usability has shown us time and time again that guessing what the user wants will always fail; we need completely open systems where the user narrows the features until the goal is reached! This is what we humans are about, isn't it? First we read the table of contents or the index (from both of which you gain a sense of overview), then jump to the right chapter for the details, and from there make decisions on where to go next.

Let's not design more applications; let's design systems.

1 comment:

  1. ==One==

    "Data, data everywhere,
    and not a thought to think."

    As a child I was nearly obsessed by "significance". Many many years later I read in cognitive psychology and was presented with "schema theory".

    My point is this: that something exists may be grist for the ontologists' mill, and maybe for the taxonomists' too ... so what?! When that question is treated as other than rhetorically we suddenly find real traction.

    What's at the heart of discourse? Putative existence? No. Human salience. It matters cuz it matters to someone. If a tree falls in the forest and nobody's there to hear it *shrug*, next?

    Nero cared about his fiddle, and fiddle he did as Rome burned.

    Bales of cash are changing hands in Baghdad. If it benefits me to deny that then deny it I will.

    My notion of "participatory deliberation" is discourse based because all other schemas for mapping are technological and several whereas human salience is fuzzy and subject to first-person reportage.

    "Who cares and why?" ... salience and significance ... getting our ducks in a row for good reasons; to me all other matters seem glib.
