Identification and Functional Analysis of Human Transcriptional Promoters

Abstract
Genomic and full-length cDNA sequences provide opportunities for understanding human gene structure and transcriptional regulatory elements. The simplest regulatory elements to identify are promoters, as their positions are dictated by the location of transcription start sites. We aligned full-length cDNA clones from the Mammalian Gene Collection to the human genome rough draft sequence to estimate the start sites of more than 10,000 human transcripts. We selected genomic sequence just upstream from the 5′ end of these cDNA sequences and designated these as putative promoters. We assayed the functions of 152 of these DNA fragments, chosen at random from the entire set, in a luciferase-based transfection assay in four human cultured cell types. Ninety-one percent of these DNA fragments showed significant transcriptional activity in at least one of the cell lines, whereas 89% showed activity in at least two of the lines. We analyzed the distributions of strengths of these promoter fragments in the different cell types and identified likely alternative promoters in a large fraction of the genes. These data indicate that this approach is an effective method for predicting human promoters and provide the first set of functional data collected in parallel for a large set of human promoters.

[Supplemental material is available online at www.genome.org and http://www-shgc.stanford.edu/myerslab/.]
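The step of selecting genomic sequence just upstream from the 5′ end of an aligned cDNA can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 0-based coordinate convention, and the `length` parameter are assumptions made for illustration, and the abstract does not state the fragment length used. The sketch shows only the strand handling that any such selection needs: for a transcript on the minus strand, "upstream" lies at higher genomic coordinates, and the fragment must be reverse-complemented to read in the direction of transcription.

```python
# Illustrative sketch only: names, coordinate convention, and window length
# are hypothetical and are not taken from the paper.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_window(tss, strand, length, chrom_size):
    """Return 0-based, half-open (start, end) genomic coordinates of the
    `length` bases immediately upstream of the transcription start site.

    `tss` is the 0-based genomic position of the first transcribed base,
    i.e. where the 5' end of the cDNA aligns. The window is clipped to the
    chromosome boundaries.
    """
    if strand == "+":
        start, end = tss - length, tss
    elif strand == "-":
        # On the minus strand, upstream means higher genomic coordinates.
        start, end = tss + 1, tss + 1 + length
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(0, start), min(chrom_size, end)


def putative_promoter(chrom_seq, tss, strand, length):
    """Extract the upstream fragment, oriented in the direction of transcription."""
    start, end = upstream_window(tss, strand, length, len(chrom_seq))
    fragment = chrom_seq[start:end]
    return fragment if strand == "+" else reverse_complement(fragment)


if __name__ == "__main__":
    toy_chrom = "AAACCCGGGTTT"  # toy sequence, not real genomic data
    print(putative_promoter(toy_chrom, 6, "+", 3))
    print(putative_promoter(toy_chrom, 8, "-", 3))
```

In practice the start-site coordinate and strand would come from the cDNA-to-genome alignment, and the window length would be chosen by the experimenter.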