Massively parallel sequencing of the polyadenylated transcriptome of C. elegans

Abstract
Using massively parallel sequencing-by-synthesis methods, we have surveyed the polyA+ transcripts from four stages of the nematode Caenorhabditis elegans to an unprecedented depth. Using novel statistical approaches, we evaluated the coverage of annotated features of the genome and of candidate processed transcripts, including splice junctions, trans-spliced leader sequences, and polyadenylation tracts. The data provide experimental support for >85% of the annotated protein-coding transcripts in WormBase (WS170) and confirm additional details of processing. For example, the total number of confirmed splice junctions was raised from 70,911 to over 98,000. The data also suggest thousands of modifications to WormBase annotations and identify new splice junctions and genes not part of any WormBase annotation, including at least 80 putative genes not found in any of three predicted gene sets. The quantitative nature of the data also suggests that mRNA levels may be measured by this approach with unparalleled precision. Although most sequences align with protein-coding genes, a small fraction falls in introns and intergenic regions. One notable region on the X chromosome encodes a noncoding transcript of >10 kb localized to somatic nuclei.