29 September 2009

Library Pontifications

Once in a while I get email from people who ask me questions or want me to clarify something I've said in some setting. The other day I ranted on the NGC4LIB (Next-generation catalog 4 libraries) mailing list about, uh, something or other. And I got email, which I answered, but since I got no reply I'm posting it here in a blog-edited form so that it doesn't go to waste:
I think I am starting to understand your rants against the culture of MARC, and I'd probably feel offended if I knew what all of the above meant.
Hmm. Well, it wasn't meant to offend anyone. I guess if people thought they were hardcore into persistent identity management, then maybe they would feel I've either overlooked their hard work or don't think what they're doing is the right kind, or something.

I usually have two goals with my "rants": 1. flush out those who are already on the right track, and make them more vocal and visible, and 2. if no one is on the right track, inspire people in the library world to at least have a look at it. I can do this because I have no vested interest in the library world as such; I cannot lose my library job as I'm not working for a library. :)
Naturally, to feel outside of the mainstream creates a crisis of confidence in one's abilities. What does it mean these days to say that one is a cataloger or that one works in tech services, and is it perceived as a joke for those on the outside? Oh yeah...they still produce cards. What do they know about databases?
From the outside, librarians are an incredibly gifted bunch of people who know what they're doing; they have been granted powers outside the realm of normal people (including professionals like software developers, believe it or not), and they know stuff we normal folks don't.

However, having been on the inside, you get to glimpse the reality of an underfunded, underprioritized sub-culture of society that knows as little about the "real world" as normal folks know of the library world. There is a great divide between them, and very little has been done to open up. The blame for this I put squarely on the library world (as the real world is, well, real and out there), which for many years has demanded a library degree even for software development positions, and when we finally get there we are treated as second-class citizens because we don't have that mark of librarianship that comes from library school. It's a bizarre thing, really, and perhaps the most damaging one you've got, this notion that librarians must have a library degree, as if normal people will never understand the beauty of why a 245 c is needed, or the secret of why shelves must be called stacks, and so on.

One thing that has got me very disillusioned about the library way is philosophy. I deliberately sought out the library as a place to work because I have a few passions which, mixed with my skills, I thought were a good match, and one of the strongest of those passions was epistemology. One would think that if there was one institutional string of places that could appreciate the finer details of epistemology, it would be the libraries and the people within. That's what they concern themselves with, no?

Err, no. No, they don't. There's the odd person who ponders how an OCLC number can verify some book's identity, but these are very plain, boring questions of database management. Then along came FRBR, which not only dips its toes into epistemology, but outright talks about it! Its authors clearly had knowledge and wisdom about such things. So, one would think there was hope. Like, when it came out in 1993. That's more than 15 years ago. And people still haven't got it. How much time do you reckon it's going to take, and more importantly, how many years until it's way too late?

But no, RDA comes out of the woodwork and proves once and for all that there is no hope of libraries ever taking on the issues at either a philosophical or a practical level. Let me explain this one, as it sits at the core of much of my "ranting."

FRBR defines work, expression, manifestation and item, and these are semi-philosophical definitions that we're supposed to attach semantics and knowledge to. There are primarily two ways to do that: define entities of knowledge, or create relationships between entities. (Note these two basic ways of doing knowledge management, entities and relationships, as they spring up in all areas of knowledge representation.)
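To make "entities and relationships" a bit less abstract, here is a minimal sketch in Python. The relationship names are the ones FRBR uses between its four Group 1 entities; everything else (the class names, the `items_of` helper, the Hamlet labels) is made up for illustration and is not how any real catalogue system does it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str   # "work", "expression", "manifestation" or "item"
    label: str  # human-readable label (hypothetical examples below)


@dataclass(frozen=True)
class Relationship:
    source: Entity
    name: str   # the semantics live here, in the named relationship
    target: Entity


# Hypothetical example data.
work = Entity("work", "Hamlet")
expression = Entity("expression", "Hamlet, English text")
manifestation = Entity("manifestation", "Hamlet, a hypothetical paperback edition")
item = Entity("item", "Copy 2 on a hypothetical shelf")

relationships = [
    Relationship(work, "is realized through", expression),
    Relationship(expression, "is embodied in", manifestation),
    Relationship(manifestation, "is exemplified by", item),
]

# The chain of relationship names leading from a work down to its items.
CHAIN = ["is realized through", "is embodied in", "is exemplified by"]


def related(entity, name):
    """Return every entity reachable from `entity` via the named relationship."""
    return [r.target for r in relationships
            if r.source == entity and r.name == name]


def items_of(work_entity):
    """Follow the chain work -> expression -> manifestation -> item."""
    current = [work_entity]
    for name in CHAIN:
        current = [target for e in current for target in related(e, name)]
    return current
```

Note that the entities themselves carry almost nothing; what a machine can actually walk and reason over is the set of named relationships.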

Now, can you, without looking stuff up, tell me the difference between a work and an expression? Or between a manifestation and an item? Sure, we can discuss back and forth whether this or that thing is an item or something else, but is that a good foundation upon which to lay all future library philosophy? Because that's just what it is: a philosophical model we use to make sense of the real world. FRBR is confusing, even if it is a great leap forward in epistemological thinking. Take identity management: one thing can be identified through a multitude of persistent identifiers, acting together like a proxy for it, and FRBR fails miserably at this. FRBR is right there in the centre of the problem, but a lot of it focuses on the wrong part, the part that requires human cognition to make decisions about identity.

Anyway, I guess at this point all I'm trying to say is that there are glimpses of what I'm talking about in the library world, and I was attracted to it; I wanted to dedicate parts of my life to fixing a lot of what was broken in the real world. I came to the library because libraries are the shining beacon of light in our society.

So, what happened?
Which is why I am interested smarting up about some of these things. Where should one go for a decent but not mind-blowing introduction to the types of things you have described lately?
It's hard to say what will blow your mind, and what will not. But since you're a library type person I'm going to go out on a limb here, and assume you're a smart person. :) So, I'm going to assume that http://en.wikipedia.org/wiki/Epistemology won't blow your mind. So let's assume we're using the definition for "subject" as such:
  • An area of knowledge, a topic, an area of interest or study
In terms of philosophy we usually expand that definition a bit wider (so it will also include most discourse and literature) but I'll try to keep it simple. First, a question:

"What does it mean that something is something?"

This is the basic question of identity: that something exists and that we can talk about it and refer to it. Referring to things is a huge portion of what the library does, not only as an archive, but as a living institution where knowledge is harboured. We're talking about subjects put into systems, about being subject-centric in the way we deal with things. Just like our brains do.

Now, for me there are a few things that have happened over the last 20-30 years. The world has become more and more knowledge-centric (it has gone from "all knowledge is in books" to "knowledge can be found in many places", and the advent of computers and the internet plays no small part in that), while libraries have become more book-specific, more focused on the collection itself than on what the collection actually harbours in terms of knowledge. I suspect this is partly because there are no traditional tracks within the library world for technology, and probably also because it's easier and fits better into budget-driven, government-run institutions.

However, this isn't beneficial to the knowledge management part. Libraries are moving steadily towards being archives, but the world wants them to become knowledge specialists. Ouch. And so the libraries will be closed down when they don't deliver knowledge. Archives are what Google does best, and they're not that bad at harbouring basic knowledge. What hope in hell have you got then?

I'm running out of time right now, but feel free to ask any question and point out any of my wrongs, and laugh at them as well; I need the discourse as much as (I hope) you do. Let me just quickly run through that list with comments and pointers: [editor's note: this is a list of things I felt the library world 'has no clue about', from my mail to the mailing list]
  • No idea about digital persistent identification.
What happens to identifiers when people stop maintaining them? They lose their semantic and intrinsic value, and become moot. How many libraries maintain their age-old software? No, a more human, less technological means of resolving is needed, and when the world went digital, having multiple identities became not only possible but inevitable. Yet, when the library world manages identities as OCLC / LOC record numbers at the item level, things go horribly wrong and you cannot take what you've defined and learned into the philosophical space. Even if the OCLC / LOC numbers are maintained till the end of the world, they do not solve basic epistemological problems.
  • No subject-centricity.
FRBR does actually provide some, but it is not focused on the epistemological problems; it only identifies the problem of identification without providing a mechanism (real or philosophical) for solving it.
  • No understanding of semantics in data modeling.
The AACR2 / RDA world is, in some definition of the terms, a data model. And between entities in data models there are semantics, meaning the relationships themselves, their names, roles and intended purpose. But you have to understand, as a human, all of AACR2 / RDA to be able to model anything with it; there's no platform on which to stand, there are no atomic parts you can use to build molecules and then cells and then beings. The whole model is, in fact, a cobbled-together set of fields without structure (and no, numbering them is not a structure :), and without structure there are only rules. And rules without structure are only human-enforceable.
  • No clue about ontologies, inferencing, guides by analogy
This is a stab at what the Semantic Web people are doing. They have a long background in AI and knowledge management, and if you guys were at least on par with that group, there could be some better understanding of the issues. The SemWeb crowd understand a lot of first-order logic, inferencing, analogy, case-based reasoning, and so forth, all stuff you need to have computers understand a tad bit better how your data is cobbled together, how it all interacts, how entities and relationships (remember those? :) are mapped.

I should of course make a note here that I think that the SemWeb efforts are mostly wrong, and that they could learn an awful lot from librarians in the way to deal with collections and access, but that's a different discourse for some other time. :)
  • No real knowledge about collection management ( ... wait for it ...) with multiple hooks and identities.
I was actually hoping people would jump on this one, getting offended that I said they had no real knowledge of collection management (which is their forte, it is what they do!), but I guess they saw the hook and line of *identities*, and jumped over it. Dang.

It's all about the identity of what you are collecting. Crikey, publishers haven't even got ISBN to work (how many times do I put in one ISBN and get a completely different book ...), and one would think that would provide hints as to why this is hard, and perhaps what to do otherwise. Hmm.
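For the "multiple hooks and identities" point, here is a rough sketch of what I mean by one subject having many identifiers, none of which is *the* identity. It is only an illustration under my own assumptions: the `SubjectRegistry` class and the identifier strings are invented for this example and do not correspond to any real ISBN, OCLC or LOC number or to any existing system.

```python
class SubjectRegistry:
    """Lets many identifiers stand for one subject (a hypothetical design)."""

    def __init__(self):
        # identifier -> the shared set of identifiers for the same subject
        self._proxy_of = {}

    def assert_same(self, *identifiers):
        """State that all the given identifiers refer to the same subject."""
        merged = set(identifiers)
        for ident in identifiers:
            # Pull in everything already known about each identifier.
            merged |= self._proxy_of.get(ident, set())
        for ident in merged:
            self._proxy_of[ident] = merged
        return merged

    def identities(self, identifier):
        """Return every identifier known to refer to the same subject."""
        return self._proxy_of.get(identifier, {identifier})

    def same_subject(self, a, b):
        return b in self.identities(a)


# Placeholder identifiers, not real record numbers.
registry = SubjectRegistry()
registry.assert_same("isbn:example-1", "oclc:example-2")
registry.assert_same("oclc:example-2", "lccn:example-3")
```

After those two statements, `registry.same_subject("isbn:example-1", "lccn:example-3")` holds even though nobody ever linked those two directly, and if one identifier stops being maintained the others still resolve to the same subject.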

-- end of mail except some more personal ramblings not fit for generic consumption --