The mutational spectrum of single base-pair substitutions in mRNA splice junctions of human genes: Causes and consequences

Abstract
A total of 101 different examples of point mutations, which lie in the vicinity of mRNA splice junctions, and which have been held to be responsible for a human genetic disease by altering the accuracy or efficiency of mRNA splicing, have been collated. These data comprise 62 mutations at 5′ splice sites, 26 at 3′ splice sites and 13 that result in the creation of novel splice sites. It is estimated that up to 15% of all point mutations causing human genetic disease result in an mRNA splicing defect. Of the 5′ splice site mutations, 60% involved the invariant GT dinucleotide; mutations were found to be non-randomly distributed with an excess over expectation at positions +1 and +2, and apparent deficiencies at positions −1 and −2. Of the 3′ splice site mutations, 87% involved the invariant AG dinucleotide; an excess of mutations over expectation was noted at position −2. This non-randomness of mutation reflects the evolutionary conservation apparent in splice site consensus sequences drawn up previously from primate genes, and is most probably attributable to detection bias resulting from the differing phenotypic severity of specific lesions. The spectrum of point mutations was also drastically skewed: purines were significantly overrepresented as substituting nucleotides, perhaps because of steric hindrance (e.g. in U1 snRNA binding at 5′ splice sites). Furthermore, splice sites affected by point mutations resulting in human genetic disease were markedly different from the splice site consensus sequences. When similarity was quantified by a ‘consensus value’, both extremely low and extremely high values were notably absent from the wild-type sequences of the mutated splice sites. Splice sites of intermediate similarity to the consensus sequence may thus be more prone to the deleterious effects of mutation. Regarding the phenotypic effects of mutations on mRNA splicing, exon skipping occurred more frequently than cryptic splice site usage.
Evidence is presented that indicates that, at least for 5′ splice site mutations, cryptic splice site usage is favoured under conditions where (1) a number of such sites are present in the immediate vicinity and (2) these sites exhibit sufficient homology to the splice site consensus sequence for them to be able to compete successfully with the mutated splice site. The novel concept of a “potential for cryptic splice site usage” value is introduced in order to quantify these characteristics, and to predict the relative proportion of exon skipping versus cryptic splice site utilization consequent to the introduction of a mutation at a normal splice site.
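As an illustration of how similarity to a consensus sequence might be quantified, the following minimal Python sketch scores a 9-nucleotide 5′ splice site window (positions −3 to +6) against a table of per-position nucleotide frequencies, and then lists nearby GT-containing windows as candidate cryptic sites. It is not the scoring scheme used in this study: the min-max scaling is one common way to build such a score, the function names `consensus_value` and `candidate_sites` are hypothetical, and every frequency in `FREQS` other than the invariant GT at +1/+2 is a made-up placeholder rather than data from the paper.

```python
# Hypothetical per-position nucleotide frequencies (percent) for a 5' splice
# site window spanning positions -3..-1 (exon) and +1..+6 (intron).
# ASSUMPTION: only the invariant GT at +1/+2 reflects the text; all other
# numbers are illustrative placeholders, not published values.
FREQS = [
    {"A": 35, "C": 35, "G": 20, "T": 10},   # -3
    {"A": 60, "C": 10, "G": 15, "T": 15},   # -2
    {"A": 10, "C": 5, "G": 80, "T": 5},     # -1
    {"A": 0, "C": 0, "G": 100, "T": 0},     # +1 (invariant G)
    {"A": 0, "C": 0, "G": 0, "T": 100},     # +2 (invariant T)
    {"A": 60, "C": 5, "G": 30, "T": 5},     # +3
    {"A": 70, "C": 10, "G": 10, "T": 10},   # +4
    {"A": 5, "C": 5, "G": 85, "T": 5},      # +5
    {"A": 15, "C": 15, "G": 20, "T": 50},   # +6
]


def consensus_value(site, freqs=FREQS):
    """Return a 0..1 similarity score of `site` to the frequency table.

    The score is the sum of the frequencies of the observed bases, rescaled
    so that the worst possible site scores 0 and the best possible scores 1.
    """
    site = site.upper()
    if len(site) != len(freqs):
        raise ValueError("site length must match the frequency table")
    total = sum(col[base] for col, base in zip(freqs, site))
    lowest = sum(min(col.values()) for col in freqs)
    highest = sum(max(col.values()) for col in freqs)
    return (total - lowest) / (highest - lowest)


def candidate_sites(seq, freqs=FREQS, gt_offset=3):
    """List (start, window, score) for every window carrying GT at +1/+2."""
    seq = seq.upper()
    width = len(freqs)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if window[gt_offset:gt_offset + 2] == "GT":
            hits.append((start, window, consensus_value(window, freqs)))
    return hits


if __name__ == "__main__":
    # Toy sequence (hypothetical): rank GT-containing windows by score.
    for start, window, score in sorted(
        candidate_sites("CAGGTAAGTCCGGTCCCC"), key=lambda h: -h[2]
    ):
        print(start, window, round(score, 3))
```

Under these assumptions, a mutated authentic site could be rescored and compared with the scores of neighbouring candidates; how those scores would be combined into a "potential for cryptic splice site usage" value is not specified in the abstract and is deliberately left out of the sketch.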