Abstract
Regulation of gene expression at the post-transcriptional level is mainly achieved by proteins containing well-defined sequence motifs involved in RNA binding. The most widespread of these motifs are the RNA recognition motif (RRM) and the K homology (KH) domain. In this article, we survey the complete Arabidopsis thaliana genome for proteins containing RRM and KH RNA-binding domains. The Arabidopsis genome encodes 196 RRM-containing proteins, a more complex set than that found in Caenorhabditis elegans and Drosophila melanogaster. In addition, the Arabidopsis genome contains 26 KH domain proteins. Most of the Arabidopsis RRM-containing proteins can be classified into structural and/or functional groups, based on similarity with either known metazoan or Arabidopsis proteins. Approximately 50% of Arabidopsis RRM-containing proteins do not have obvious homologues in metazoa, and for most of those that are predicted to be orthologues of metazoan proteins, no experimental data exist to confirm this. Additionally, the function of most Arabidopsis RRM proteins and of all KH proteins is unknown. The data presented here indicate that, among all eukaryotes, only those RNA-binding proteins that are involved in the most essential processes of post-transcriptional gene regulation are conserved in structure and, most probably, in function. However, the higher complexity of RNA-binding proteins in Arabidopsis, as evident in the groups of SR splicing factors and poly(A)-binding proteins, may account for the observed differences in mRNA maturation between plants and metazoa. This survey provides a first systematic analysis of plant RNA-binding proteins, which may serve as a basis for functional characterisation of this important protein group in plants.