SEMANTIC SIMILARITY MEASURES AS TOOLS FOR EXPLORING THE GENE ONTOLOGY

Abstract
Many bioinformatics resources hold data in the form of sequences. Often this sequence data is associated with a large amount of annotation. In many cases this data has been hard to model, and has been represented as scientific natural language, which is not readily amenable to computation. The development of the Gene Ontology (GO) provides us with a more accessible representation of some of this data. However, it is not clear how this data can best be searched or queried. Recently we have adapted information-content-based measures for use with GO. In this paper we present a detailed investigation of the properties of these measures, and examine various properties of GO which may have implications for its future design.