Evaluation of Gene-Finding Programs on Mammalian Sequences

Open Access

1 May 2001

journal article
research article
Published by Cold Spring Harbor Laboratory in Genome Research

Vol. 11 (5), 817-832
https://doi.org/10.1101/gr.147901

Abstract

We present an independent comparative analysis of seven recently developed gene-finding programs: FGENES, GeneMark.hmm, Genie, Genscan, HMMgene, Morgan, and MZEF. For evaluation purposes we developed a new, thoroughly filtered, and biologically validated dataset of mammalian genomic sequences that does not overlap with the training sets of the programs analyzed. Our analysis shows that the new generation of programs has substantially better results than the programs analyzed in previous studies. The accuracy of the programs was also examined as a function of various sequence and prediction features, such as G + C content of the sequence, length and type of exons, signal type, and score of the exon prediction. This approach pinpoints the strengths and weaknesses of each individual program as well as those of computational gene-finding in general. The dataset used in this analysis (HMR195) as well as the tables with the complete results are available at http://www.cs.ubc.ca/~rogic/evaluation/.


Cited by 212 articles