21 October 2000

Criticism, MARCXML, the culture of MARC, and the long difficult struggle to stay alive

[Note: I just realized I didn't post this one, which was written quite some time ago. A nice filler to, er, fill with.]

Today I want to talk a little bit about criticism in general, and within and of the library world in particular, and make a few points about the culture that permeates its standards and its view of technology. There are a few things that need to be said, both in the context of pushing the library world forward and in terms of understanding me and why I'm doing this. I'm not technically a librarian (I guess I just lost a few readers. Again. But please read on; be strong!), never was, never will be, so why am I even bothering?

Before I get started on that, though, let's refresh our memories with my criticism of MARCXML from a few days ago:
[Whoever deals with MARCXML] wastes most of their time trying to figure out why the hell someone came up with this evil way of making your brain melt. Well, obviously, if your brain melts, it's evil, but there is something so anti-XML about the way MARCXML was designed I'm starting to wonder.
Do you know what's glaringly missing from the above quote?

Oh, crap!

Expletives! Now, I know a lot of people - especially proper folks - will "ignore my message" if I throw expletives into my prose. And certainly, if all I did was swear and use foul language left, right and center, I'd ignore me, too. Don't get me wrong; I hardly ever use bad language, I don't approve of it under normal circumstances, and I certainly wouldn't subject myself or others to filthy discourse, unless, of course, it brought something to the topic at hand or, more to the point, to the discussion itself.

There have been times when soothing and lulling words have been repeated so often they mean nothing anymore, when the truth told in goody words no longer shakes people and wakes them up from their fluffy dream-world where everything is fluffy, white and wonderful. A well-meaning "shit" can have an awakening effect, and when you mix in my own passion for the subject, it's hard to filter out what might be considered uncouth. So, I said "shit" on the NGC4LIB mailing-list a couple of times, perhaps I called "bullshit" a few times, and said that stuff was "crap". Nothing big, really.

Now tell me, are these really words that would throw off my whole spiel, remove my opinion from the collective communicado flow, and pee in the well of truth? Are people - especially library people - so distanced from reality that when the language of the commons meets them (yes, I'm playing the elitist card here), they put their hands over their ears and shout "don't want to hear it!" while shaking their heads from side to side?

The reason I ask is not that I can't control my filthy mouth. No, I ask because I wonder whether the truth does the same kind of thing to them: when I say "you must get out of the MARC conundrum NOW!", does it come across to them as "you gotta get your shit together man! Screw this MARC crap!"? In other words, does the truth of what I and lots of others are saying come across as the equivalent of foul language, and thus get ignored? Filthy language or not, does it come across that way?

I don't actually believe this is so, of course. I know there are lots of librarians who understand most of these issues, but really, they can't do much about it, and feel like there's no point in raising their voices and rocking the boat. And so the boat silently sinks.


Criticism is mostly about rocking the boat. Sure, there is positive criticism, like "you're not ugly, just beauty-impaired!", but aren't we over this silly, overly political correctness by now? Criticism is telling it straight: saying that what someone else has done is not up to scratch, that surely some improvement could be made. But the library world doesn't work like that. Criticism in the library world goes by a different word: approval. I know, I know, it sounds ludicrous, and if you're a librarian yourself you are right to call "bullshit" (although, of course, you won't), but most of the crap that comes out of the library world becomes a de facto standard because not enough librarians have stood up and called it crap. When no one calls it crap, the proposals become real. Sometimes that's ok, but other times it's outright scary. And then there are times when what comes out threatens to ruin a whole sector with its poison, all in the name of not standing up, afraid of rocking the boat.

One such poison is of course MARCXML, as mentioned earlier, where the very notion of XML in it is pure unrefined anti-XML evil. Another is EAD, an XML standard for Encoded Archival Description (yeah, you try to figure out what it is with a name like that) where the general idea is a good one (digital description of your archives with a focus on non-bibliographic materials), but the actual output is terrible, littered with poor XML use, poor data modeling and, as usual, steeped in archaic library terms and conditions. How can I dig into the details and criticize it in a positive way, as they all claim you should?
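To make the "anti-XML" complaint a little more concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The record below is made up for illustration (values invented), shaped after the MARC 21 slim MARCXML schema; the helper `marc_subfield` and the Dublin Core wrapper element `metadata` are hypothetical names of my own. The point it shows: in MARCXML the meaning lives in attribute codes like `tag="245"` and `code="a"` rather than in element names, so you need the MARC tag tables just to find something as basic as a title.

```python
import xml.etree.ElementTree as ET

# Made-up MARCXML record in the MARC 21 slim namespace (illustrative values only).
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">example-0001</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
</record>"""

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def marc_subfield(record, tag, code):
    """Hypothetical helper: text of the first subfield `code` inside datafield `tag`, or None."""
    # The element names (datafield, subfield) say nothing; the semantics hide in "245"/"a".
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text if node is not None else None


record = ET.fromstring(MARCXML)
title = marc_subfield(record, "245", "a")

# The same information with Dublin Core elements: the element name is the meaning.
# (The <metadata> wrapper is an invented container for this example.)
DC = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example title</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
</metadata>"""
dc_title = ET.fromstring(DC).find("dc:title", DC_NS).text

print(title)     # An example title /
print(dc_title)  # An example title
```

Note that the MARCXML title comes back with the trailing " /" cataloguing punctuation baked into the data, while the Dublin Core version needs no tag table to make sense of it.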

Positive criticism is hailed above anything else. If you don't have anything good to say, then don't say it. But after you've said the good bits, why are we so shy about delivering the bad bits? Are we to wrap all bad criticism up in nice wrapping so as, um, not to hurt people? Is that what it comes down to, sugaring our salt?


Ok, normally I wouldn't question these things. I'm not stupid, and I understand that positive criticism is perhaps the best way forward, especially if you are ever to have lunch with the people you criticize at some later point. But there's a catch: it makes progress painfully slow, if progress happens at all.

You see, if I pamper you with good words about something that stinks, and say it smells like flowers and perhaps needs a tiny extra fragrance to make it better, you're not removing that stink; you're adding to it. There are of course a million ways of doing this, and yes, there are many ways in which you can pad your blows and make it easier to a) deliver the goods, and b) be listened to. I understand all this. But I don't have the time, and neither do you.

Sorry to say, but we don't have time to dick about with niceties anymore. I started my MARC cultural experiences over 5 years ago, and nothing good has happened in the field of "saving the library world from MARC" in those years. It was already in peril back then, with a plethora of MARC standards (who would have thought that one standard was in reality more like 20 standards, all almost the same but with local tweaks?). Sure, the odd experiment and the rare project or prototype has popped up from time to time, but in reality, there's nothing out there that has the potential to pull off a rescue mission for all those afflicted with MARC. You can throw your FRBR and RDA in some other direction, because they have neither vendor nor library infrastructure support on any larger scale. Heck, you all talk about them, but no one is actually implementing them. Apart from that odd prototype, of course, shiny and fresh as it sits in a corner waiting to become obsolete.

So let's talk about timescales, because my claim that we're in haste is far truer now than it was 5 years ago. I started gently back then, asking questions and prodding the shortcomings of a format, only to find out that there's really nothing wrong with it per se. No, it was the virus named MARC that was causing the sickness I witnessed, a culturally dependent, nauseating disease of rules, half-rules, standards, chaos, vendor bingo, conveniences, myth and magic. The format was and is only a carrier of the disease itself.

MARC came about in the 60s and 70s and was great! Truly awesome! It kept going through the 80s, when one should have started to look for alternatives, but because it was still a Good Thing (TM), it just kept going and growing. The 90s and the dawn of the Internet hit us. Still going. 1995-96 saw a tremendous explosion in Internet activity, and it was around that time that most larger libraries started thinking more seriously about their online presence.

Here's the thing: did no one at that time think that this newfangled network, which was going to dominate all future computing, commerce and communication, was something to take advantage of in terms of library infrastructure? Or let me put it this way: how many Z39.50 implementations have there been between 1995 and 2008? The answer should be your new battlecry.

Now, let's quickly check yesterday's status (i.e., roughly before 1995). I read with my retrospecticles that when the world (i.e., Internet folks of various kinds) wanted bibliographic meta data, or, heck, advice on any kind of meta data, they turned to the one institution that had yonks and buckets of experience in the field: the library world. Dublin Core was one of those fantastic things that came out of the "world leader in meta data" role you had back then, a simple start to meta data description that has now fallen into obscurity and disuse because it was never extended (at least not in the way most people on the net needed it) and was later drowned by librarian lingo and committee orgies.

The present situation is rather different. The world has stopped asking librarians for advice on these things, and I can think of a number of reasons why:
  • The library world didn't keep up: The technology and the Internet developed far too fast for the librarians to keep pace, and they went from early adopters, and sometimes even innovators, to hanging on to the long tail for dear life.
  • The world also knows about meta data: Yes, as crazy as it sounds, librarians are not the only ones who know how to deal with meta data, and since IT professionals all over the world understand the problem and can often develop for it better, there is bound to be knowledge and initiatives that are not based in library traditions.
  • Snobbish library attitude to anything non-academic: This one doesn't apply to all librarians, of course, but there is a "we" and "them" mentality in the ranks of academic librarians. Are you an IT guy who stumbled into their world? Mate, you're a second-class citizen. Unless you've got a library school master's of some kind, you will never carry any weight in their world, and there will always be an invisible force-field between librarians and everyone else. And librarians: you can protest all you like, but this is a problem I've talked about a lot before, it's not just my opinion, and it's something you really need to take seriously.
  • Library world business models that are incompatible with the future: Most libraries have one business model, which is to collect stuff, make it available for "everyone", and get a yearly budget from local or national government to do so. There are a couple of exceptions, which I'll talk about in the next section on business models.
But before we get to the business juice, we need to wrap up why I think we're in haste, and the keyword here is "internet time." Library world time is totally and utterly incompatible with internet time. The library world is a safe, steady, generic march in the direction of the internet long tail, while internet time whooshes past at a scary pace, slow and fast at the same time, like a teenager freaking out in their first driving lesson, without direction, while the police sit by wondering whether to write a ticket or just roll around laughing. Or both.

The library world is based on those steady, calm, majestically paced movements, and while that has worked for hundreds of years, all of a sudden something remarkable happens: information turns digital. And the world just exploded with possibilities. The library world was actually an early adopter of computers and software in this field, but only to the point of applying them to a mostly analog problem: paper books. So while the rest of the world went forward, libraries stuck to their books. And these days, who you gonna ask for advice on how to manage meta data for your eBooks?

We're in haste because the library world is missing out on more and more opportunities to be a real player in the meta data field. Now both Amazon and Google have APIs that give you what the library world has refused to give the world for a long time, with the added advantage for Google and Amazon that their APIs evolve, they keep adding more and better data to them, and they play it open.

But maybe I'm wrong; maybe we aren't in haste because the library world has already lost. There is no more ground to cover, unless you want to bicker about the proper LCSH to use for which book, but last time I checked, that wasn't a solved problem within the library world either.

Business models

I'll mainly focus on OCLC, quoting a bit from their About section:
"Founded in 1967, OCLC Online Computer Library Center is a nonprofit, membership, computer library service and research organization dedicated to the public purposes of furthering access to the world's information and reducing the rate of rise of library costs. More than 69,000 libraries in 112 countries and territories around the world use OCLC services to locate, acquire, catalog, lend and preserve library materials." (my emphasis)
OCLC is one of the most important players in the library world today. Forget the puny Library of Congress, or any of the many national libraries. If OCLC says we're gonna burn censored books at the stake, most libraries will follow (and for non-librarians, the translation of that statement is roughly that if Republicans say you should vote for them because Palin is a woman, you all go and do so). They're powerful, and people around the world listen to them very carefully. But there's a snag.

Libraries around the world pay good money to be members of OCLC. Basically, the business model of the most influential (and dare I say most powerful) library organisation in the world is totally disconnected from the normal government-funded library model. The implications here are huge: whereas taxpayers would probably be better off having all the meta data freed for all to use (that's what you pay for, right? And that would enable innovation, right?), OCLC must keep it locked down in order to stay alive.

Ok, maybe I'm being overly pessimistic, as indeed OCLC are slowly opening up their meta data repositories, but it's going soooo sloooow! Opening up was argued for before I entered the library world, it was argued for while I was in it, and it has been argued for ever since. All the while we're waiting, more opportunities are being missed. And I'm not saying this lightly. Why do you think Google made their book API available? Fed up waiting for the library world to do it, that's why. And once someone like Google does it, consider the battle lost no matter the quality of the meta data; Google has more data and more seriously wicked hackers than the whole library world combined could ever dream of, and they play it open, and they play it with as much dedication and passion as librarians themselves, and they will kick your arse at your own game.

What other opportunities are we missing out on while we sit here and talk about the fact that we've got problems?

[Note: It's funny to see how things went with the OCLC debacle after this post was originally written.]