Unexpected sequence diversity in the amino-terminal ends of the coat proteins of strains of sugarcane mosaic virus

Abstract
The sequence of the 3′-terminal 1343 nucleotides of the genome of the SC strain of sugarcane mosaic virus (SCMV-SC) was compared with the 1376 nucleotides at the 3′ terminus of maize dwarf mosaic virus B (MDMV-B). The SCMV-SC sequence includes an open reading frame which codes for the viral coat protein of 313 amino acids (nucleotides 157 to 1116), followed by a 3′ non-coding region of 235 nucleotides and a poly(A) tail. The MDMV-B sequence codes for the capsid protein (nucleotides 157 to 1139) of 328 amino acids and has a 3′ non-coding region of 236 nucleotides. The coat protein of SCMV-SC is 92% identical to that of MDMV-B, except in the region between amino acid residues 27 and 70 of SCMV-SC. This region of SCMV-SC is shorter (44 residues) than the equivalent region in MDMV-B (59 residues) and has only 22% identity with the MDMV-B sequence. Possible mechanisms for the generation of this sequence diversity are discussed. Despite this diversity, the sequence identities of both the major part of the coat proteins and the 3′ non-coding regions confirm the proposal, based on previously described serological data, that SCMV-SC and MDMV-B are strains of SCMV.
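The identity percentages quoted above are comparisons of aligned sequences. As a minimal sketch of how such a figure can be computed, the function below counts identical, non-gap positions in a pairwise alignment and divides by the number of alignment columns. The function name, the choice of denominator, and the toy sequences are assumptions made for illustration only. They are not the coat protein sequences or the alignment method used in the study, and other conventions (for example, dividing by the length of the shorter sequence) would give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: the denominator is the total number of alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if both residues are equal and not a gap.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy sequences, not taken from SCMV-SC or MDMV-B.
toy_a = "MKTAYIAKQR"
toy_b = "MKTAHIAKQR"
print(percent_identity(toy_a, toy_b))  # 9 of 10 columns match
```

Under this convention a gap column lowers the identity, so a region that differs in length between two sequences, like the one described above, scores lower than a region of equal length with the same number of substitutions.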