Transcriptome genetics using second generation sequencing in a Caucasian population

Abstract
There is currently much interest in understanding the genetic mechanisms that underlie variation in gene expression. Two groups reporting in this issue of Nature use RNA sequencing to study global gene expression in two contrasting populations. Pickrell et al. sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals who have been extensively genotyped as part of the HapMap Project. By pooling data from all the individuals, it was possible to identify many genetic determinants of variation in gene expression. Montgomery et al. characterize the mRNA fraction of RNA isolated from lymphoblastoid cell lines derived from 63 HapMap individuals of Caucasian origin. They obtain a fine-scale view of the transcriptome and identify genetic variants that affect alternative splicing.

Here, sequencing has been used to characterize the mRNA fraction of the transcriptome in Caucasian individuals, to provide a fine-scale view of transcriptomes and to identify genetic variants that affect alternative splicing. Measuring allele-specific expression identified rare expression quantitative trait loci (eQTLs) and allelic differences in transcript structure, revealing new properties of genetic effects on the transcriptome.

Gene expression is an important phenotype that is informative about genetic and environmental effects on cellular state. Many studies have previously identified genetic variants for gene expression phenotypes using custom and commercially available microarrays [1-5]. Second-generation sequencing technologies are now providing unprecedented access to the fine structure of the transcriptome [6-14]. We have sequenced the mRNA fraction of the transcriptome in 60 extended HapMap individuals of European descent and have combined these data with genetic variants from the HapMap3 project [15]. We have quantified exon abundance based on read depth and have also developed methods to quantify whole transcript abundance.
We have found that approximately 10 million sequencing reads can provide access to the same dynamic range as arrays, with better quantification of alternative and highly abundant transcripts. Correlating expression with SNPs (single nucleotide polymorphisms) identifies more eQTLs (expression quantitative trait loci) than arrays do. We also detect a substantial number of variants that influence the structure of mature transcripts, indicating variants responsible for alternative splicing. Finally, measures of allele-specific expression allowed the identification of rare eQTLs and allelic differences in transcript structure. This analysis shows that high-throughput sequencing technologies reveal new properties of genetic effects on the transcriptome and allow the exploration of genetic effects in cellular processes.
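
To make the idea of correlating expression with SNP genotypes concrete, the following is a minimal sketch of a single SNP-gene association test. It is an illustration under stated assumptions, not the method used in the study: the abstract says only that expression was correlated with SNPs, so the choice of Spearman rank correlation, the permutation-based P value, the function names and the toy data are all assumptions made here. Genotypes are coded as the number of copies of one allele (0, 1 or 2), and expression stands in for a read-depth-based abundance measure.

```python
import random


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_pvalue(genotypes, expression, n_perm=1000, seed=0):
    """Two-sided P value from shuffling expression across individuals."""
    rng = random.Random(seed)
    observed = abs(spearman(genotypes, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(genotypes, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data for ten individuals (not taken from the study).
genotypes = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
expression = [5.1, 4.8, 6.9, 7.2, 6.5, 9.0, 8.7, 5.0, 7.0, 9.3]

rho = spearman(genotypes, expression)
p_value = permutation_pvalue(genotypes, expression)
print(f"rho = {rho:.3f}, permutation P = {p_value:.4f}")
```

In a genome-wide setting a test of this kind would be repeated for every SNP-gene pair of interest, which makes correction for multiple testing essential; that step is omitted from the sketch.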