Saturday, June 13, 2009

Clouds of ontology, Part 1

The last few years I have been encountering expressions containing the word "ontology" in the context of natural sciences, computer science and the internet - "medical ontology", "web ontology". For some reason or other I feel deeply suspicious about all this, without yet knowing zilch about it. I suppose one reason is that it appears to involve a lot of freshly-minted software, fancy terminology, amiably intelligent hotshots and research loads-a-money. After 25 years in IT, I am fed up to the teeth with that - except for the last item, natch.

I should just mention that I find "ontology" and "epistemology" to be fairly useless words. They made serious sense only in the context of the dogmatically dualistic world-view sometimes called "Cartesian". They have appeared in various philosophical guises in the past, for instance in connection with the similarities and differences between "Venus" and "the morning/evening star". Supposedly someone asked Tolstoi what the difference was between governmental violence and revolutionary violence. He replied, "the difference between cat shit and dog shit". Of course that may be somewhat unfair. I'm sure it makes a difference to upholstery cleaners.

Anyway, I have put on my red riding hood, grabbed my basket, and gone out zilch-collecting in Entity Forest. So far, I have found a few possibly digestible items, and only two toadstools - but these two are quite toxic. I will describe them in a subsequent post. But now, a word about our sponsor: "ontology". (Actually not yet a sponsor, but I'm going to apply for a grant.)

One of the success stories in the ontology business appears to be Barry Smith. I reported slightly sarcastically on his having won big money 10 years ago. He is professor of philosophy at the University at Buffalo, and editor of the revived Monist. I once watched some of the videos in a training course he did called "Introduction to Biomedical Ontologies". They're extremely interesting and well-presented, in contrast with other stuff I've found on the internet. From Smith's course, a picture is emerging for me of serious and useful work being done at least in medicine and genetic science. But attendant on this are gigantic clouds of brow-knitted philosophastering - as has ever been the case in IT, where those involved are called consultants.

The guy credited with popularizing the term "ontology" in its new sense in 1992, beyond the confines of AI, is Tom Gruber. He originally wrote: "An ontology is a specification of a conceptualization". Hmmm. But that was only the short answer. He goes on:

What is an Ontology?

Short answer:
An ontology is a specification of a conceptualization.
The word "ontology" seems to generate a lot of controversy in discussions about AI. It has a long history in philosophy, in which it refers to the subject of existence. It is also often confused with epistemology, which is about knowledge and knowing.

In the context of knowledge sharing, I use the term ontology to mean a specification of a conceptualization. That is, an ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. This definition is consistent with the usage of ontology as set-of-concept-definitions, but more general. And it is certainly a different sense of the word than its use in philosophy.

What is important is what an ontology is for. My colleagues and I have been designing ontologies for the purpose of enabling knowledge sharing and reuse. In that context, an ontology is a specification used for making ontological commitments. The formal definition of ontological commitment is given below. For pragmatic reasons, we choose to write an ontology as a set of definitions of formal vocabulary. Although this isn't the only way to specify a conceptualization, it has some nice properties for knowledge sharing among AI software (e.g., semantics independent of reader and context). Practically, an ontological commitment is an agreement to use a vocabulary (i.e., ask queries and make assertions) in a way that is consistent (but not complete) with respect to the theory specified by an ontology. We build agents that commit to ontologies. We design ontologies so we can share knowledge with and among these agents.

"Semantics independent of reader and context": that's a nice one! In his philosophical reading, Gruber possibly didn't get as far as Gadamer and Luhmann, to name but two. I bet even old Schleiermacher would have dropped his veil in shock. It's also a tiny bit inconsistent to say ontology is "often confused with epistemology, which is about knowledge and knowing", and yet claim that his ontologies are "for the purpose of enabling knowledge sharing and reuse". If an ontology is a prerequisite for knowledge, where does knowledge of the ontology come from? Turning and turning in a widening gyre ...

Why does Gruber cling to the word ontology, when he says that "it is certainly a different sense of the word than its use in philosophy", and he would in fact be better served by the term "epistemology" if he's concerned with knowledge? Because it sounds more down-to-earth, that's why. The earth is real, you see. On the published evidence here, Gruber is another Realist wrapped in the cloak of Formalism, trying to gate-crash the Groves of Episteme. "An ontological commitment is an agreement to use a vocabulary (i.e., ask queries and make assertions) in a way that is consistent (but not complete) with respect to the theory specified by an ontology." Tee-hee!

Sixteen years on, Gruber has another definition, in which he writes that some people have said that "computational ontology [is] a kind of applied philosophy". Here, as often, philosophy seems to mean the thinking of deep thoughts, not familiarity with actual philosophers. Gruber does not appear to disagree with that view:
Definition

In the context of computer and information sciences, an ontology defines a set of representational primitives with which to model a domain of knowledge or discourse. The representational primitives are typically classes (or sets), attributes (or properties), and relationships (or relations among class members). The definitions of the representational primitives include information about their meaning and constraints on their logically consistent application. In the context of database systems, ontology can be viewed as a level of abstraction of data models, analogous to hierarchical and relational models, but intended for modeling knowledge about individuals, their attributes, and their relationships to other individuals.
....
Historical Background

The term "ontology" comes from the field of philosophy that is concerned with the study of being or existence. In philosophy, one can talk about an ontology as a theory of the nature of existence (e.g., Aristotle's ontology offers primitive categories, such as substance and quality, which were presumed to account for All That Is). In computer and information science, ontology is a technical term denoting an artifact that is designed for a purpose, which is to enable the modeling of knowledge about some domain, real or imagined.

The term had been adopted by early Artificial Intelligence (AI) researchers ... Some researchers, drawing inspiration from philosophical ontologies, viewed computational ontology as a kind of applied philosophy [10].
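To make Gruber's "representational primitives" a little less cloudy, here is a toy sketch in Python of classes, relationships among class members, and constraints on their "logically consistent application". All the names (Ontology, has_disease, the little class hierarchy) are my own hypothetical inventions, not Gruber's; real ontologies are usually written in dedicated languages such as OWL rather than hand-rolled like this. The consistent method is meant to mimic an ontological commitment: it rejects assertions that break the vocabulary's constraints, but demands nothing in particular be asserted (consistent, "but not complete").

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ontology:
    # Classes: each class name maps to its parent class (None for a root class).
    classes: dict[str, Optional[str]] = field(default_factory=dict)
    # Relationships: each relation name maps to its (domain class, range class).
    relations: dict[str, tuple[str, str]] = field(default_factory=dict)

    def is_a(self, cls: str, ancestor: str) -> bool:
        """True if cls is ancestor or a (transitive) subclass of it.

        Assumes the class hierarchy contains no cycles.
        """
        current: Optional[str] = cls
        while current is not None:
            if current == ancestor:
                return True
            current = self.classes.get(current)
        return False

    def consistent(self, subject_cls: str, relation: str, object_cls: str) -> bool:
        """Check one assertion against the agreed vocabulary and its constraints."""
        if relation not in self.relations:
            return False  # the term is not part of the shared vocabulary
        domain, range_cls = self.relations[relation]
        return self.is_a(subject_cls, domain) and self.is_a(object_cls, range_cls)


# A hypothetical, drastically simplified biomedical vocabulary.
onto = Ontology(
    classes={
        "Entity": None,
        "Organism": "Entity",
        "Person": "Organism",
        "Disease": "Entity",
        "Influenza": "Disease",
    },
    relations={"has_disease": ("Organism", "Disease")},
)

print(onto.consistent("Person", "has_disease", "Influenza"))  # True
print(onto.consistent("Influenza", "has_disease", "Person"))  # False
print(onto.consistent("Person", "admires", "Person"))         # False
```

In database terms this is not far from a schema with foreign-key constraints, which is more or less Gruber's own comparison with "a level of abstraction of data models".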


Thursday, June 11, 2009

Rapped on the accents

Many search algorithms that I have used, on the internet and in locally installed software, are lenient about punctuation and diacritical parsley such as the accent aigu and the Umlaut. As I remember, Google used to silently "normalize" in the background (ö and oe giving approximately the same results for German words), but it doesn't do that anymore.

The Diccionario de la lengua española, maintained by the Real Academia Española, is unforgiving. Rereading Nerval's El desdichado, I wanted to know what the Academia had to say about desdichado. There is a "coloq." (colloquial) subentry saying sin malicia, pusilánime. Since the English congenerics have very distinct meanings, I wanted to check the Spanish. I entered pusilanime without bothering about the accent, and got this:

La palabra pusilanime no está registrada en el Diccionario. Las que se muestran a continuación tienen una escritura cercana. [The word pusilanime is not recorded in the Dictionary. Those shown below have a similar spelling.]

* pusilánime

Since there was only one entry "in the vicinity" of what I entered, I thought: the software might just as well have opened that result. But no, it knuckles me to acknowledge that "á" is not "a" in Spanish orthography. This is not a case where I would say (formulating as a native English speaker) that "the accent makes a difference" in meaning, apart from indicating the syllable to be stressed. But then si (if) and sí (yes) are completely different words.
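For the curious, here is roughly what the lenient behaviour I was hoping for might look like: a minimal Python sketch, certainly not the Academia's or Google's actual code, run against a made-up four-word headword list. It decomposes each word with Unicode NFD, drops the combining accent marks, and falls back to that folded comparison only when there is no exact hit. If the fallback finds exactly one candidate, a site could simply open it.

```python
import unicodedata


def fold_accents(word: str) -> str:
    """Strip combining diacritics and case: 'Pusilánime' -> 'pusilanime'."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def lookup(query: str, headwords: list[str]) -> list[str]:
    """Exact match first; otherwise fall back to accent-insensitive matches."""
    if query in headwords:
        return [query]
    key = fold_accents(query)
    return [w for w in headwords if fold_accents(w) == key]


# Hypothetical toy word list, not the real dictionary.
HEADWORDS = ["pusilánime", "desdichado", "si", "sí"]

print(lookup("pusilanime", HEADWORDS))  # ['pusilánime']
print(lookup("si", HEADWORDS))          # ['si']
print(lookup("SI", HEADWORDS))          # ['si', 'sí']
```

Note that this folds ö to o, not to oe, so the German oe convention would need an explicit substitution table; and that folding happily conflates si and sí, which is presumably why a dictionary would rather show a list than guess.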

On the whole, I should just take the rap and purge the "diacritical parsley" idea from my head. It makes life easier.

Wednesday, June 10, 2009

French antiquity

I just discovered that modern French is more than 100,000 years old, on good authority. Let me explain how I found this out.

languagehat's recent post "That darn gene again" is about sensationalist claims by some people that a certain FOXP2 "gene" is "responsible" for language, and even grammar. The post links to a Language Log article from 2005 by Geoff Pullum, who puts these claims in their place - the dustbin of media malarkey. Pullum quotes from a useful 2003 survey by Alec MacAndrew entitled "FOXP2 and the Evolution of Language". MacAndrew says this:

No-one should imagine that the development of language relied exclusively on a single mutation in FOXP2. There are many other changes that enable speech. Not least of these are profound anatomical changes that make the human supralaryngeal pathway entirely different from any other mammal. The larynx has descended so that it provides a resonant column for speech (but, as an unfortunate side-effect, predisposes humans to choking on food). Also, the nasal cavity can be closed thus preventing vowels from being nasalised and thus increasing their comprehensibility. These changes cannot have happened over such a short period as 100,000 years.

The 100,000 figure comes from this:

... by looking at silent polymorphisms in the gene, Enard et al. estimate that the mutations in the FOXP2 in the human lineage occurred between 10,000 and 100,000 years ago.

Having been polishing my French like a madman over the last few years, I find these observations helpful in explaining why my ability to understand spoken French, apart from that of intellectuals, still does not shine as it should. I conclude that the incomprehensible, nasalizing quality of French vowels must have been established before larynges descended, and noses closed, to enable the bell-like* clarity of West Texas English.

*Think Big Ben rather than Tinkerbell.