Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome

Abstract
Transcriptional promoters comprise one of many classes of eukaryotic transcriptional regulatory elements. Identification and characterization of these elements are vital to understanding the complex network of human gene regulation. Using full-length cDNA sequences to identify transcription start sites (TSSs), we predicted more than 900 putative human transcriptional promoters in the ENCODE regions, representing a comprehensive sampling of promoters in 1% of the genome. By measuring promoter activity in high-throughput transient transfection reporter assays, we identified 387 fragments that function as promoters in at least one of 16 cell lines. These positive functional results demonstrate widespread use of alternative promoters. We show a strong correlation between promoter activity and the corresponding endogenous RNA transcript levels, providing the first experimental quantitative estimate of promoter contribution to gene regulation. Finally, we identified functional regions within a randomly selected subset of 45 promoters using deletion analyses. These experiments showed that, on average, the sequence from -300 to -50 bp relative to the TSS contributes positively to core promoter activity. Interestingly, putative negative elements were identified between -1000 and -500 bp relative to the TSS for 55% of genes tested. These data provide the largest and most comprehensive view of promoter function in the human genome to date.