Monday, September 13, 2010

Ontology and reality

One of these days, I keep promising myself, I am going to publish something incredibly clever about chemistry, ontology and/or chemical ontology. Then again, I need some incentive to do so, and I see none. In the meantime, I am happy that somebody else has bothered to write a paper dealing with the so-called “realist” approach to ontology [1].

Personally, I never cared much for “reality” as used in the context of the OBO Foundry Principles [2]:
Terms in an ontology should correspond to instances in reality.
Worse still is its “corollary”:
Ontologies consist of representations of types in reality — therefore, their preferred terms should consist entirely of singular nouns.
(Why? Does “reality” really consist of singular English nouns?)

Now Lord and Stevens confirm my gut feeling that “realism” (the authors take care to clarify that “realism” in [1] stands for “realism as practiced by BFO”) applied to ontology building often results in unnecessary complexity. Everybody who ever studied physics (or English) in school would agree that the expression |dr/dt| is a much better definition of speed than the one provided by PATO: “A physical quality inhering in a bearer by virtue of the bearer’s rate of change of position”. To quote [1],
It makes little sense to replicate the models of physics using English instead of a more precise mathematical notation.
Alas, this is exactly what BFO (and most of the OBO ontologies) are trying to do. By going “where science has gone before” without learning the language of science, BFO & Co. keep reinventing the square wheel.
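
For reference, here is the expression |dr/dt| written out in Cartesian components. This is the standard textbook form, not something taken from [1] or PATO; the symbols x, y, z for the components of r and v for speed are my own choice.

```latex
v = \left|\frac{d\mathbf{r}}{dt}\right|
  = \sqrt{\left(\frac{dx}{dt}\right)^{2}
        + \left(\frac{dy}{dt}\right)^{2}
        + \left(\frac{dz}{dt}\right)^{2}},
\qquad \mathbf{r}(t) = \bigl(x(t),\, y(t),\, z(t)\bigr)
```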

OK, what about chemistry? Chemistry has developed its own language, which makes plain-text definitions of molecular entities redundant. The 2-D diagram (connectivity) defines the molecule of interest better than a paragraph in English. In theory, the systematic name should provide exactly the same information (and thus be usable as a definition). However, the systematic names for even relatively small molecules are often too complicated to be widely (or ever) used.
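
To make the point about connectivity concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the tiny valence table, the atom/bond list layout, and the helper names (`implicit_hydrogens`, `molecular_formula`, `has_hydroxy_group`) are hypothetical and not taken from any cheminformatics library or ontology. It only handles neutral molecules built from C, N and O.

```python
from collections import Counter

# Assumed typical valences; neutral atoms only, no charges or radicals.
VALENCE = {"C": 4, "N": 3, "O": 2}

def implicit_hydrogens(atoms, bonds):
    """Number of hydrogens on each heavy atom, filled in from its unused valence."""
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [VALENCE[el] - u for el, u in zip(atoms, used)]

def molecular_formula(atoms, bonds):
    """Formula with C and H first, then the other elements alphabetically."""
    counts = Counter(atoms)
    counts["H"] = sum(implicit_hydrogens(atoms, bonds))
    symbols = ["C", "H"] + sorted(el for el in counts if el not in ("C", "H"))
    return "".join(
        f"{el}{counts[el] if counts[el] > 1 else ''}"
        for el in symbols
        if counts[el] > 0
    )

def has_hydroxy_group(atoms, bonds):
    """True if some oxygen bears one hydrogen and is singly bonded to a carbon."""
    hydrogens = implicit_hydrogens(atoms, bonds)
    for i, j, order in bonds:
        for o, c in ((i, j), (j, i)):
            if atoms[o] == "O" and atoms[c] == "C" and order == 1 and hydrogens[o] == 1:
                return True
    return False

# Two molecules as connectivity tables: heavy atoms plus (atom, atom, bond order).
ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
dimethyl_ether = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])

for name, (atoms, bonds) in [("ethanol", ethanol), ("dimethyl ether", dimethyl_ether)]:
    print(name, molecular_formula(atoms, bonds), has_hydroxy_group(atoms, bonds))
```

Both molecules come out with the same formula, yet only ethanol passes the hydroxy test: the difference lives entirely in the connectivity, which is the information the 2-D diagram carries.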

Take the systematic name (a) for beauvericin. You are extremely unlikely to either hear it (because it is more or less unpronounceable) or see it (it takes more than one line of text, which is annoying). More importantly, there is a certain limit of molecular complexity above which systematic names (according to the existing nomenclature rules, that is) simply cannot be generated. On the other hand, the diagram (b) is both beautiful and useful.

(a) (3S,6R,9S,12R,15S,18R)-3,9,15-tribenzyl-4,10,16-trimethyl-6,12,18-tri(propan-2-yl)-1,7,13-trioxa-4,10,16-triazacyclooctadecane-2,5,8,11,14,17-hexone
(b) [2-D structure diagram of beauvericin]

Not only are the 2-D diagrams self-defining, they also provide all the information needed to build a consistent ontology for molecular entities. With a few simple rules, the ontology will build itself from scratch, I promise. But this is a topic for another post.
  1. Lord, P. and Stevens, R. (2010) Adding a little reality to building ontologies for biology. PLoS ONE 5, e12258.
  2. OBO Foundry Principles.
