Computational Detection and Location of Transcription Start Sites in Mammalian Genomic DNA
Open Access
- 1 March 2002
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 12 (3), 458-461
- https://doi.org/10.1101/gr.216102
Abstract
Transcription, the process whereby RNA copies are made from sections of the DNA genome, is directed by promoter regions. These define the transcription start site, and also the set of cellular conditions under which the promoter is active. At least in more complex species, it appears to be common for genes to have several different transcription start sites, which may be active under different conditions. Eukaryotic promoters are complex and fairly diffuse structures, which have proven hard to detect in silico. We show that a novel hybrid machine-learning method is able to build useful models of promoters for >50% of human transcription start sites. We estimate specificity to be >70%, and demonstrate good positional accuracy. Based on the structure of our learned models, we conclude that a signal resembling the well-known TATA box, together with flanking regions of C-G enrichment, is the most important sequence-based signal marking sites of transcriptional initiation at a large class of typical promoters.
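The two signals named in the abstract (a TATA-like motif and flanking C-G enrichment) can be illustrated with a deliberately simple sketch. This is not the hybrid machine-learning method the paper describes: it is a hand-built illustration, assuming a log-odds position weight matrix built from a hypothetical set of aligned TATA-like sites (`TOY_SITES`, invented for this example), a uniform 0.25 background, and the common observed/expected CpG ratio as a stand-in for "C-G enrichment". It scans only the given strand and assumes an uppercase A/C/G/T sequence.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    pwm = []
    for i in range(length):
        column = [s[i] for s in sites]
        total = len(column) + 4 * pseudocount
        # Smoothed base frequency at this position, scored against a uniform background.
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_hit(seq, pwm):
    """Return (start, score) of the highest-scoring motif window on this strand."""
    width = len(pwm)
    best_pos, best_score = None, float("-inf")
    for start in range(len(seq) - width + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        if score > best_score:
            best_pos, best_score = start, score
    return best_pos, best_score


def cpg_obs_exp(window):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G); 0.0 if undefined."""
    c, g = window.count("C"), window.count("G")
    if c == 0 or g == 0:
        return 0.0
    cg = sum(1 for i in range(len(window) - 1) if window[i:i + 2] == "CG")
    return cg * len(window) / (c * g)


def score_candidate(seq, pwm, flank=50):
    """Locate the best TATA-like hit and measure CpG enrichment on either side of it."""
    pos, motif_score = best_hit(seq, pwm)
    if pos is None:
        return None
    upstream = seq[max(0, pos - flank):pos]
    downstream = seq[pos + len(pwm):pos + len(pwm) + flank]
    return {
        "tata_pos": pos,
        "tata_score": motif_score,
        "cpg_up": cpg_obs_exp(upstream),
        "cpg_down": cpg_obs_exp(downstream),
    }


# Hypothetical aligned TATA-like sites, for illustration only.
TOY_SITES = ["TATAAA", "TATAAA", "TATATA", "TATAAG", "TATTAA"]

if __name__ == "__main__":
    pwm = build_pwm(TOY_SITES)
    seq = "CGCGGCGCCG" * 3 + "GGTATAAAAG" + "CGGCGCGCCG" * 3
    print(score_candidate(seq, pwm, flank=30))
```

In this toy sequence the best motif hit is the embedded `TATAAA`, and both 30-base flanks have an observed/expected CpG ratio above 1. A real predictor would learn which signals and positional offsets matter rather than fixing them by hand, and would also scan the reverse strand.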