Development and Evaluation of an Automated Annotation Pipeline and cDNA Annotation System
Open Access
- 2 June 2003
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 13 (6b), 1542-1551
- https://doi.org/10.1101/gr.992803
Abstract
Manual curation has long been held to be the “gold standard” for functional annotation of DNA sequence. Our experience with the annotation of more than 20,000 full-length cDNA sequences revealed problems with this approach, including inaccurate and inconsistent assignment of gene names, as well as many good assignments that were difficult to reproduce using only computational methods. For the FANTOM2 annotation of more than 60,000 cDNA clones, we developed a number of methods and tools to circumvent some of these problems, including an automated annotation pipeline that provides high-quality preliminary annotation for each sequence by introducing an “uninformative filter” that eliminates uninformative annotations, controlled vocabularies to accurately reflect both the functional assignments and the evidence supporting them, and a highly refined, Web-based manual annotation tool that allows users to view a wide array of sequence analyses and to assign gene names and putative functions using a consistent nomenclature. The ultimate utility of our approach is reflected in the low rate of reassignment of automated assignments by manual curation. Based on these results, we propose a new standard for large-scale annotation, in which the initial automated annotations are manually investigated and then computational methods are iteratively modified and improved based on the results of manual curation.
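The abstract does not say how the “uninformative filter” is implemented. As a rough illustration only, the idea of skipping uninformative descriptions when choosing a preliminary annotation might be sketched as below. The pattern list, the function names `is_informative` and `pick_annotation`, and the assumption that hits arrive ordered best-first are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical examples of phrases treated as uninformative.
# The actual rules used by the FANTOM2 pipeline are not given in the abstract.
UNINFORMATIVE_PATTERNS = [
    r"\bhypothetical\b",
    r"\bunknown\b",
    r"\bunnamed protein product\b",
]
_UNINFORMATIVE = re.compile("|".join(UNINFORMATIVE_PATTERNS), re.IGNORECASE)


def is_informative(description):
    """Return True if the description is non-empty and matches no uninformative pattern."""
    text = description.strip()
    return bool(text) and _UNINFORMATIVE.search(text) is None


def pick_annotation(ranked_hits):
    """Return the first informative description from hits ordered best-first.

    Returns None when every candidate is filtered out, which would leave the
    sequence without an informative preliminary annotation.
    """
    for description in ranked_hits:
        if is_informative(description):
            return description.strip()
    return None
```

In this sketch, a lower-ranked hit with a meaningful description is preferred over a top-ranked hit described only as, for example, a hypothetical protein. Whether the real pipeline makes that trade-off in the same way is an assumption.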