Epigenome-based splicing prediction using a recurrent neural network
Open Access
- 1 June 2020
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 16 (6), e1008006
- https://doi.org/10.1371/journal.pcbi.1008006
Abstract
Alternative RNA splicing provides an important means to expand metazoan transcriptome diversity. Contrary to what was accepted previously, splicing is now thought to take place predominantly during transcription. Motivated by emerging data showing the physical proximity of the spliceosome to Pol II, we surveyed the effect of epigenetic context on co-transcriptional splicing. In particular, we observed that splicing factors were not necessarily enriched at exon junctions and that most epigenetic signatures had a distinctly asymmetric profile around known splice sites. Given this, we set out to build an interpretable model that mimics the physical layout of splicing regulation, in which the chromatin context changes progressively as Pol II moves along the guide DNA. We used a recurrent-neural-network architecture to predict the inclusion of a spliced exon from adjacent epigenetic signals, and we showed that distinct spatio-temporal features of these signals were key determinants of model outcome, in addition to the actual nucleotide sequence of the guide DNA strand. After the model had been trained and tested (with > 80% precision-recall curve metric), we explored the derived weights of the latent factors and found that they highlight the importance of the asymmetric time-direction of chromatin context during transcription.
Author summary
In humans, only about 2% of the genome consists of so-called coding regions that can give rise to protein products. However, the human transcriptome is much more diverse than the number of genes found in these coding regions. Each gene can give rise to multiple transcripts through alternative splicing, a process that occurs during transcription. The regulation of splicing, and the underlying splicing code that determines cell-type-specific splicing, are only partly understood. Here, we studied epigenetic features that characterize splicing regulation in humans using a recurrent neural network model. Unlike feedforward neural networks, this method contains an internal memory state that learns spatiotemporal patterns (much like context in language) from a sequence of genomic and epigenetic information, making it better suited for characterizing splicing. We demonstrated that our method improves the prediction of splicing outcomes compared to previous methods. Furthermore, we applied our method to 49 cell types in ENCODE to investigate splicing regulation and found that not only spatial but also temporal epigenomic context can influence splicing regulation during transcription.
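The contrast between a recurrent memory state and an order-blind model can be illustrated with a minimal sketch. This is not the authors' model: the plain tanh recurrence, the mean-pooled baseline, the dimensions, the random weights, and all function names are assumptions chosen only to show that a recurrent network's score depends on the direction in which the signal bins are read, whereas a pooled feedforward score does not.

```python
import numpy as np

def rnn_inclusion_score(signals, W_in, W_rec, w_out):
    """Run a plain tanh RNN over the bins and return a sigmoid score in (0, 1)."""
    h = np.zeros(W_rec.shape[0])
    for x_t in signals:                      # one step per bin, in reading order
        h = np.tanh(W_in @ x_t + W_rec @ h)  # hidden state carries context from earlier bins
    return 1.0 / (1.0 + np.exp(-(w_out @ h)))

def pooled_score(signals, W_in, w_out):
    """Order-blind baseline: average the bins, then apply one feedforward layer."""
    h = np.tanh(W_in @ signals.mean(axis=0))
    return 1.0 / (1.0 + np.exp(-(w_out @ h)))

# Hypothetical toy input: 20 bins around an exon, 4 epigenetic marks per bin.
rng = np.random.default_rng(0)
n_bins, n_marks, n_hidden = 20, 4, 8
signals = rng.normal(size=(n_bins, n_marks))

# Untrained random weights; a real model would learn these from data.
W_in = rng.normal(scale=0.5, size=(n_hidden, n_marks))
W_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
w_out = rng.normal(size=n_hidden)

forward = rnn_inclusion_score(signals, W_in, W_rec, w_out)
backward = rnn_inclusion_score(signals[::-1], W_in, W_rec, w_out)
pooled_fwd = pooled_score(signals, W_in, w_out)
pooled_bwd = pooled_score(signals[::-1], W_in, w_out)

print("RNN, bins read 5'->3':", forward)
print("RNN, bins reversed:   ", backward)
print("Pooled, either order: ", pooled_fwd, pooled_bwd)
```

Reversing the bins changes the recurrent score but leaves the pooled score unchanged, which is the sense in which a recurrent model can register an asymmetric profile around a splice site.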
Funding Information
- AL Williams Professorship Funds