U2AF binding selects for the high conservation of the C. elegans 3′ splice site

Abstract
Caenorhabditis elegans is unusual among animals in having a highly conserved octamer sequence at the 3′ splice site: UUUUCAG/R. This sequence can bind the essential heterodimeric splicing factor U2AF, with U2AF65 contacting the U tract and U2AF35 contacting the splice site itself (AG/R). Here we demonstrate a strong correspondence between how tightly RNA oligonucleotides with variant octamer sequences bind U2AF and the frequency with which those variations occur in splice sites. C. elegans U2AF has a strong preference for the octamer sequence and exerts much of the pressure for 3′ splice sites to have the precise UUUUCAG/R sequence. At two positions the splice site has a very strong preference for U even though sequences with alternative bases can also bind tightly to U2AF, suggesting that evolution can select against sequences that cause only a relatively modest reduction in binding. Although pyrimidines are frequently present at the first base in the exon, U2AF has a very strong bias against them, arguing that there is a mechanism to compensate for weakened U2AF binding at this position. Finally, the C in the consensus sequence must remain adjacent to the AG/R rather than to the stretch of U’s, suggesting that this C is recognized by U2AF35.
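As an illustration of what a "variant octamer" means relative to the UUUUCAG/R consensus, the following is a minimal sketch that reports which positions of a candidate octamer depart from the consensus. It is not the analysis used in this work: the function name `mismatch_positions`, the 0-based position numbering, and the conversion of DNA-style T to U are assumptions made for the example. R is read as the standard IUPAC code for a purine (A or G).

```python
# Illustrative sketch only; names and conventions here are hypothetical,
# not taken from the paper.

# Consensus 3' splice site octamer UUUUCAG/R, written without the "/"
# that marks the intron/exon boundary. R is the IUPAC code for A or G.
CONSENSUS = "UUUUCAGR"
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG"}


def mismatch_positions(octamer: str) -> list[int]:
    """Return 0-based positions where the octamer departs from the consensus."""
    seq = octamer.upper().replace("T", "U")  # accept DNA-style input (assumption)
    if len(seq) != len(CONSENSUS):
        raise ValueError(f"expected {len(CONSENSUS)} bases, got {len(seq)}")
    return [
        i
        for i, (base, code) in enumerate(zip(seq, CONSENSUS))
        if base not in IUPAC[code]
    ]


if __name__ == "__main__":
    for candidate in ("UUUUCAGA", "UUUUCAGC", "UCUUCAGG"):
        print(candidate, mismatch_positions(candidate))
```

Here position 7 is the first base of the exon, so a pyrimidine there (as in `UUUUCAGC`) is reported as a mismatch, while either purine at that position matches.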