15 October 2009

Ontological Ponderings

The last few months have been interesting for me in a philosophical sense. My job is at the architectural level, using ontologies in software development: in the process (development, deployment, documentation), the infrastructure (SOA, servers, clusters) and the end result of it (business applications). Needless to say, I've been going a bit epistemental, so I promised myself yesterday to jot down my thoughts and worries, if for no other reason than for future reference.

One big thing that runs through my ponderings like a theme is the linguistic flow of the definition language itself: how the mode of definition changes what gets inferred when that ontology is used over static data (not to mention how it gets even trickier with dynamic data). We usually say that the two main ontological expressions (is_a, has_a) of most triplets (I use triplets / RDF as the example as they are the most common ones, although I use Topic Maps association statements myself) define a flat world from which we further classify the round world. But how do we do this? We make up statements like this:

Alex is_a Person
Alex has_a Son

Anyone who works in this field understands what's going on, and that things like "Alex" and "Person" and "Son" are entities, defined with URIs, so actually they become:

http://shelter.nu/me.html is_a http://psi.ontopedia.net/Person
http://shelter.nu/me.html has_a http://en.wikipedia.org/wiki/Son

Well, in RDF they do. In Topic Maps we have these as subject identifiers, but it's pretty much the same deal (except for some subtleties I won't go into here). But our work is not done. Even those ontological expressions have their URIs as well, giving us:

http://shelter.nu/me.html http://shelter.nu/psi/is_a http://psi.ontopedia.net/Person
http://shelter.nu/me.html http://shelter.nu/psi/has_a http://en.wikipedia.org/wiki/Son

Right, so now we've got triplets of URIs we can do inferencing over. But there are a few snags. Firstly, a tuple like this is nothing but a set of properties for a non-virtual property and does not function like a proxy (as, for instance, the Topic Maps Reference Model does), and transforming between these two forms gives us a lot of ambiguity that quickly becomes a problem if you're not careful (it can render inferencing completely useless, which is kinda sucky). Now, given that most ontological expressions are defined by people, things can get hairy even quicker. People are funny that way.
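As a rough sketch of what "triplets of URIs" means in practice, the statements above can be held as plain tuples, with a naive lookup standing in for inferencing. The `Triple` type and the `objects_of` helper are made up for illustration and do not come from any RDF or Topic Maps library; only the URIs are taken from the statements above.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate and object are all URIs."""
    subject: str
    predicate: str
    obj: str


# Predicate URIs as used in the statements above.
IS_A = "http://shelter.nu/psi/is_a"
HAS_A = "http://shelter.nu/psi/has_a"

triples = [
    Triple("http://shelter.nu/me.html", IS_A, "http://psi.ontopedia.net/Person"),
    Triple("http://shelter.nu/me.html", HAS_A, "http://en.wikipedia.org/wiki/Son"),
]


def objects_of(store: List[Triple], subject: str, predicate: str) -> List[str]:
    """Hypothetical helper: every object stated for this subject and predicate."""
    return [t.obj for t in store if t.subject == subject and t.predicate == predicate]


print(objects_of(triples, "http://shelter.nu/me.html", IS_A))
```

Note that nothing in this structure says what is_a means; the predicate is just another string to match on, which is where the ambiguity creeps in.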

So I've been thinking about the implications of more ambiguous statement definitions, so instead of saying is_a, what about was_a, will_be_a, can_be_a, is_a_kindof_a? What are the ontological implications of playing around with the language itself like this? It's just another property, and as such will create a different inferred result, but that's the easy answer. The hard answer lies between a formal definition language and the language in which I'm writing this blog post.

We tend to define that "this is_a that", this being the focal point from which our definition flows. So, instead of listing all Persons of the world, we list this one thing that is a Person, and move on to the next. And for practical reasons, that's the way it must be, especially considering the scope of the Semantic Web itself. But what if this creates bias we do not want?

Alex is_a Person, for sure, but at some point I shall die, and then I change from is_a to a was_a. What implications, if any, will this have on things? Should is_a and was_a be synonyms, antonyms, allegoric of, or projection through? Do we need special ontologies that deal with discrepancies over time, a clean-up mechanism that alters data and subsequently changes queries and results? Because it's one thing to define and use data as is, another thing completely to deal with an ever-changing world, and I see most - if not all - ontology work break when faced with a changing world.
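To make the question concrete, here is a sketch of the easy, purely logical answer: keep one statement with a validity interval and let the predicate depend on when you ask. Everything here is an assumption for illustration, including the `TimedStatement` type, its field names and the arbitrary dates. It deliberately says nothing about whether is_a and was_a should be treated as synonyms or antonyms, which is the harder part.

```python
from datetime import date
from typing import NamedTuple, Optional


class TimedStatement(NamedTuple):
    """Hypothetical statement that only holds during a date interval."""
    subject: str
    kind: str
    valid_from: date
    valid_until: Optional[date] = None  # None means "still holds"

    def predicate_on(self, day: date) -> str:
        """Pick the predicate that applies on the given day."""
        if day < self.valid_from:
            return "will_be_a"
        if self.valid_until is not None and day > self.valid_until:
            return "was_a"
        return "is_a"


# Arbitrary, made-up dates; only the shape of the data matters here.
alex = TimedStatement("Alex", "Person", date(2000, 1, 1), date(2100, 1, 1))

print(alex.predicate_on(date(2009, 10, 15)))
print(alex.predicate_on(date(2150, 1, 1)))
```

Every query now needs a date to ask about, so the clean-up problem moves from the data into the queries rather than going away.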

I think I've decided to go with a kind_of ontology (an ontology where there is no defined truth, only an inferred kind-system), for no other reason than that it makes cognitive sense to me and hopefully to other people who will be using the ontologies. This resonates with me especially these days as I'm sick of the distinction people make between language and society, as if the two were different. They are not. Our languages are just like music, with the ebb and flow, drama and silence that make words mean different things. By adding the ambiguity of "kind of" instead of truth statements I'm hoping to add a bit of semiotics to the mix.

But I know it won't fix any real problems, because the problem is that we are human, and as humans we're very good at reading between the lines, at being vague, clever with words, and don't need our information to be true in order to live with it. Computers suck at all these things.

This is where I'm having a semi-crisis of belief, where I'm not sure that epistemological thinking will ever get past the stage of basic tinkering with identity, in which we create a false world of digital identities to make up for any real identity of things. I'm not sure how we can properly create proxies of identity in a meaningful way, nor in a practical way. If you're with me so far, the problem is that we need to give special attention to every context, something machines simply aren't capable of doing. Even the most kick-ass inferencing machines break down under epistemological pressure, and it's starting to bug me. Well, bug me in a philosophical kind of way. (As for mere software development and such, we can get away with a lot of murder.)

I'm currently looking into how we can replicate the warm, fuzzy impreciseness of human thinking through cumulative histograms over ontological expressions. I'm hoping that there is a way to create small blobs of "thinking" programs (small software programs or, probably more correctly, script languages) that can work over ontological expressions without the use of formal logic at all (first-order logic, go to hell!) that can be shared, that can learn what data can and can't be trusted to have some truthiness. Here's to hoping.
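A minimal sketch of what a cumulative histogram over ontological expressions could look like, assuming (and this is only an assumption) that it means counting how often each predicate is asserted between the same two things and reading off the running share. The `predicate_histogram` function and the sample assertions are invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Assertion = Tuple[str, str, str]  # (subject, predicate, object)


def predicate_histogram(
    assertions: Iterable[Assertion], subject: str, obj: str
) -> List[Tuple[str, float]]:
    """Cumulative share of each predicate asserted between subject and obj,
    most frequent first. Returns an empty list if nothing was asserted."""
    counts = Counter(p for s, p, o in assertions if s == subject and o == obj)
    total = sum(counts.values())
    running = 0
    cumulative = []
    for predicate, n in counts.most_common():
        running += n
        cumulative.append((predicate, running / total))
    return cumulative


# Made-up sample: three sources say is_a, one says was_a.
sample = [("Alex", "is_a", "Person")] * 3 + [("Alex", "was_a", "Person")]

print(predicate_histogram(sample, "Alex", "Person"))
```

No statement is true or false here; there is only a weight of opinion that shifts as more assertions arrive, which is at least in the spirit of "truthiness" rather than truth.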

The next issue is directional linguistics: how the vectors of knowledge are defined. The order in which you gain your knowledge matters, just as it matters a great deal how you sort it. This is mostly ignored, and the data is treated as it's found and entered. I'm not happy with that state of things at all, and I know that if I had been taught about axioms before I got sick of math, my understanding of axiomatic value systems would be quite different. Not because I can't sit down now and figure it out, but because I've built a foundation which is hard to re-learn when wrong, hard to break free from. Any foundation sucks in that way; even our brains work this way, making it very hard to un-learn and re-train your brain. Ontological systems are no different; they build up a belief-system which may prove to be wrong further down the line, and I doubt these systems know how to deal with that, nor do the people who use such systems. I'm not happy.

Change is the key to all this, and I don't see many systems designed to cope with change. Well, small changes, for sure, but big, walloping changes? Changes in the fundamentals? Nope, not so much.

We humans can actually deal with humongous change pretty well, even though it may be a painful process to go through. Death, devastation, sickness and other large changes we adapt to. There's the saying, "when you've lost everything, there's nothing more to lose and everything to gain", and it holds remarkably true for the human adventure on this planet (look it up; the Earth is not really all that glad to have us around). But our computer systems can't deal with a CRC failure, much less a hard-drive crash just before tax-time.

There's something about the foundations of our computer systems that is terribly rigid. Now, of course, them being based on bits and bytes and hard-core logic, there's not too much you can do about the underlying stuff (apart from creating quantum machines; they're pretty awesome, and can alter the way we compute far more than the mere efficiency claims tell us) to make it more human. But we can put human genius on top of it. Heck, the ontological paradigm is one such important step in the right direction, but as long as the ontologies are defined in first-order logic and truth-statements, it is not going to work. It's going to break. It's going to suck.

Ok, enough for now. I'm heading for Canberra over the weekend, so see you on the other side, for my next ponder.


  1. Hallo Alexander,

    your job description sounds exciting. :-) As I understand it, your post gets to the basic problems of the semantic web & ontology-stuff. I don't think I get it all (especially because I haven't gone into the practical stuff very deep yet), but here are my thoughts.

    While reading your post, I thought the big problem of ontologies is very well illustrated by looking at (& simplifying) the two philosophies of Ludwig Wittgenstein. In the early Tractatus he reduces meaningful philosophical and scientific language to logic and says: "The world is everything that is the case", i.e. the whole of all facts (which logically is the set of all true statements). If we construct an ontology like that, as a set of true statements, we get the problems you are talking about.

    So, in his later philosophy, Wittgenstein analyzes language use (in ordinary language, mathematics, logic, philosophy etc.) and concludes that all meaning is fuzzy, changing and dependent on the respective "Sprachspiel" it is used in. So language is changing but intertwined with "Lebensformen", with culture. Given that, the change is mostly unnoticeable; it shows only in retrospect, when comparing two distant states of language.

    How do we cope with this problem? Constructing "kind_of" & "can_be" predicates? (My suggestion is at the end of this commentary.)

    You write:
    "Alex is_a Person, for sure, but at some point I shall die, and then I change from is_a to a was_a. What implications will this, if any, have on things? Should is_a and was_a be synonyms, antonyms, allegoric of, or projection through?"

    I think the "was_a" is superfluous, because it is logically equivalent to the two statements "is_a" & "has_died". And I think it is useful, and avoids many problems, to conceive of denotation as atemporal, i.e. "is a person" denotes all persons in past, present and future, whether dead, unborn or living.

    You say:
    "Change is the key to all this, and I don't see many systems designed to cope with change."

    I think the answer to the problem of change and the related problems of ambiguity and contradiction is reification, i.e. the objectification of statements in order to make statements about them. So we can attribute an author and a date of publishing to statements, we can question their truth and challenge them by making contrary statements etc. Or we can construct ontologies inductively, by analyzing frequency, relations and other usage patterns of predicates and classes over time. Because the meaning of a word is a product of its relation to other words in a language (which changes over time), ontologies give us an ideal tool for determining the meaning of a predicate, class or resource at a given moment and its change over time.

    To sum up my belief that reification should be a crucial component of every "semantic web":

    Without meta-language, there is no language at all.


  2. Hi Adrian. Thanks for your insightful comments. I think you raise some interesting points, but I think I need to define some of my intentions better before we move further.

    First, though, I must confess that buried somewhere deep inside my prose was the notion that logical equivalence isn't the same as semantic equivalence, and that first-order logic in this case fails to do what we want it to do. The epistemological questions for me cannot be satisfied through formal logic. Saying that "was_a" is superfluous because is_a plus a date can logically deal with it doesn't capture the cultural or linguistic semantics I'm after. Besides, in order to solve the problem of the context being highly informal I must build another formal model? That path leads to recursion of my original problem. :) I'm not after logical equivalence; I'm after knowledge.

    I was also trying to say that inference, deduction, induction and affirmation leave out a great deal of what is quite common to basic philosophy, not to mention human cognition; I think that cognitive bias is extremely important to KM, and that truth-based systems fail in this. I've been quite taken with Sowa's take on using analogy as a vehicle for knowledge, but I have to go and tickle my friend Murray Altheim for more on that as he knows so much more than me on these things.

    As to the wonders of reification: if you mean that reification is more than a feature and closer to a requisite of any sort of knowledge representation, then I would agree with you. :) I personally hate reification being treated as a "something"; it's not a something, like a function or feature, but a fundamental thing we do and must do all the time; it is the platform upon which knowledge representation should happen. The problem of temporality (or lack thereof) in knowledge systems is quite overwhelming, I think.

    We can always define stuff; that doesn't make it real, and Wittgenstein is right in this. I especially like your noting that the problem of language in formal sciences is mostly unnoticeable, which just emphasizes how tricky this part of it all is.

    As to the notion of a meta-language as a basis for any other language, well, I agree that we go meta in order to define something that makes sense of the world, but as we climb the cognitive ladder, at what point does language (as a specific thing) fade into the notion of a cognitive function?

  3. Hey Alex,

    I followed you here from NGC4LIB, hope that's ok.

    I started reading around and found this post. My philosophy chops aren't the greatest, so bear with me.

    You mentioned

    Alex is_a person
    Alex has_a son

    but does the second one really clear it up? "has_a" makes sense as a relationship between a father and a son: a father has a son. But say the father also has a wife, has a car, has a friend, has a funny idea about ontology, has a haircut, has a heart attack when he thinks about a son driving a car with a friend...? Then what? In each of those cases, the father's relationship to the person or thing he "has" is very different.

    What I'm trying to say is, what if the simplest words in English (has a, is a, etc.) do not boil down to simple concepts but are interdependent and contextual? Does that make this semantic web stuff impossible (or maybe just not worth the effort)?

    Anyway, thanks for the thoughts.

  4. Hi Joe; you can follow me here from wherever you want. :) Blogging wouldn't be the same without it.

    I agree completely that every expression is contextual not only when it is made, but also in every future scenario. I think that was also part of my original discomfort with how ontologies are defined.

    Object, or the receiving notions when doing inferencing, must always be contextual, but because computers are logical beasts, we try to define things in a way that makes it easier (or at all possible) for them to do their bidding, which is to say, we use logic (prominently first-order logic) as our language. But I suspect the biggest problem with using logic as a construct for human knowledge is its incredible rigidity.

    How to solve this, I have no idea. But I'm sniffing around John Sowa and his work for inspiration for breaking free from the shackles of induction and deduction. :) We'll see where it all leads me.