Identification of Alternate Polyadenylation Sites and Analysis of their Tissue Distribution Using EST Data
Open Access
- 16 August 2001
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 11 (9), 1520-1526
- https://doi.org/10.1101/gr.190501
Abstract
Alternate polyadenylation affects a large fraction of higher eucaryote mRNAs, producing mature transcripts with 3′ ends of variable length. This variation is poorly represented in the current transcript catalogs derived from whole genome sequences, mostly because such posttranscriptional events are not detectable directly at the DNA level. Alternate polyadenylation of an mRNA is better understood by comparison to EST databases. Comparing ESTs to mRNAs, however, is a difficult task subject to the pitfalls of internal priming, presence of intron sequences, repeated elements, chimeric ESTs, or matches with ESTs from paralogous genes. We present here a computer program that addresses these problems and displays EST matches to a query mRNA sequence to predict alternate polyadenylation and to suggest library-specific forms. The output highlights effective polyadenylation signals and possible sources of artifacts, such as A-rich stretches in the mRNA sequences, and allows direct visualization of EST libraries using color codes. Statistical biases in the distribution of alternative mRNA forms among EST libraries were systematically sought. About 1450 human and 200 mouse mRNAs displayed such biases, suggesting in each case a tissue- or disease-specific regulation of polyadenylation.
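The three checks named in the abstract (polyadenylation signals, A-rich stretches that may cause internal priming, and library bias) can be illustrated with a minimal Python sketch. It is not the authors' program. The hexamer list (AATAAA and ATTAAA, the two most common signals), the window size and adenine threshold, and the use of Fisher's exact test for library bias are all assumptions made here for illustration; the abstract does not state which criteria or statistical test were used, and every function name is hypothetical.

```python
from math import comb

# Assumption: the two most common poly(A) signal hexamers, written as DNA.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq, signals=POLYA_SIGNALS):
    """Return sorted (position, hexamer) pairs for every signal occurrence."""
    seq = seq.upper()
    hits = []
    for signal in signals:
        start = seq.find(signal)
        while start != -1:
            hits.append((start, signal))
            start = seq.find(signal, start + 1)  # allow overlapping hits
    return sorted(hits)


def find_a_rich_stretches(seq, window=10, min_a=8):
    """Return 0-based starts of windows holding at least min_a adenines.

    Such stretches can be primed by oligo(dT) and mimic a poly(A) tail.
    The window and threshold are illustrative, not values from the paper.
    """
    seq = seq.upper()
    return [
        i
        for i in range(len(seq) - window + 1)
        if seq[i:i + window].count("A") >= min_a
    ]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are two EST libraries, columns are counts of
    ESTs supporting the short and the long 3' end form of one mRNA.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    mrna_3prime = "GGCAATAAAGGCTTATTAAACCAAAAAAAAAAGG"  # made-up sequence
    print(find_polya_signals(mrna_3prime))
    print(find_a_rich_stretches(mrna_3prime))
    # Made-up counts: library 1 has 8 short / 2 long, library 2 has 1 / 5.
    print(fisher_exact_two_sided(8, 2, 1, 5))
```

In a real analysis, an EST whose 3′ end falls just upstream of a flagged A-rich window would be treated as a likely internal-priming artifact rather than evidence for a polyadenylation site, and p-values computed across many mRNAs would need correction for multiple testing.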