Genie—Gene Finding in Drosophila melanogaster
Open Access
- 1 April 2000
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 10 (4), 529-538
- https://doi.org/10.1101/gr.10.4.529
Abstract
A hidden Markov model-based gene-finding system called Genie was applied to the genomic Adh region in Drosophila melanogaster as a part of the Genome Annotation Assessment Project (GASP). Predictions from three versions of the Genie gene-finding system were submitted: one based on statistical properties of coding genes, a second that included EST alignment information, and a third that integrated protein sequence homology information. All three programs were trained on the provided Drosophila training data. In addition, promoter assignments from an integrated neural network were submitted. The gene assignments overlapped >90% of the 222 annotated genes, and 26 possibly novel genes were predicted, of which some might be overpredictions. The system correctly identified the exon boundaries of 70% of the exons in cDNA-confirmed genes and 77% of the exons with the addition of EST sequence alignments. The best of the three Genie submissions predicted 19 of the 43 annotated gene structures entirely correctly (44%). In the promoter category, only 30% of the transcription start sites could be detected, but by integrating this program as a sensor into Genie the false-positive rate could be dropped to 1/16,786 (0.006%). The results of the experiment on the long contiguous genomic sequence revealed some problems concerning gene assembly in Genie. The results were used to improve the system. We show that Genie is a robust hidden Markov model system that allows for a generalized integration of information from different sources such as signal sensors (splice sites, start codon, etc.), content sensors (exons, introns, intergenic) and alignments of mRNA, EST, and peptide sequences. The assessment showed that Genie could effectively be used for the annotation of complete genomes from higher organisms.
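To make the idea of hidden Markov model-based sequence labeling concrete, the following is a minimal sketch of Viterbi decoding over a toy two-state model. It is not Genie's model: Genie is described as a generalized hidden Markov model that combines signal sensors, content sensors, and alignments, whereas this sketch uses a plain HMM with a single per-base content score. The state names, the function `viterbi`, and all probabilities are assumptions invented for illustration.

```python
import math

# Toy two-state HMM. All numbers below are hypothetical and chosen only
# for illustration; they are not parameters from Genie or from the paper.
STATES = ("intergenic", "coding")

START = {"intergenic": 0.9, "coding": 0.1}
TRANS = {
    "intergenic": {"intergenic": 0.9, "coding": 0.1},
    "coding": {"intergenic": 0.1, "coding": 0.9},
}
# Stand-in for a content sensor: the coding state favors G and C.
EMIT = {
    "intergenic": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq):
    """Return the most probable state path for seq under the toy model."""
    if not seq:
        return []
    # scores[s] is the best log-probability of any path ending in state s.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Pick the best predecessor state for a transition into s.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    labels = viterbi("AAAAAAAAGCGCGCGCGC")
    print("".join("C" if s == "coding" else "-" for s in labels))
```

Working in log space avoids numerical underflow on long sequences, which matters for a contiguous genomic region of several megabases. A generalized HMM of the kind the abstract describes would replace the per-base emission with scores for variable-length segments such as whole exons or introns.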