Abstract
Motivation: As sequenced genomes become larger and sequencing becomes faster, there is a need for accurate automated genome comparison techniques and databases that facilitate the derivation of genome functionality; the identification of enzymes, putative operons and metabolic pathways; and the phylogenetic classification of microbes.

Results: This paper extends an automated pair-wise genome comparison technique (Bansal, Math. Model. Sci. Comput., 9, 1–23, 1998; Bansal and Bork, in First International Workshop of Declarative Languages, Springer, pp. 275–289, 1999), previously used to identify orthologs and gene groups, to derive orthologous genes across a group of genomes and to identify genes with conserved functionality. Seventeen microbial genomes archived at ftp://ncbi.nlm.nih.gov/genbank/genomes have been compared using the automated technique. Data on orthologs, gene groups, gene duplication, gene fusion, orthologs with conserved functionality, and genes specifically orthologous to Escherichia coli and pathogens are presented and analyzed.

Availability: A prototype database is available at ftp://www.mcs.kent.edu/~arvind/intellibio/orthos.html. The software is free for academic research under an academic license. The detailed database for every microbial genome in NCBI is commercially available through intellibio software and consultancy corporation (Web site: http://www.mcs.kent.edu/~arvind/intellibio.html).

Contact: arvind@mcs.kent.edu