Molecular characterization of the S protein gene of human coronavirus OC43

Abstract
The gene encoding the spike protein of the OC43 strain of human coronavirus (HCV-OC43) was cloned and sequenced. The complete nucleotide sequence revealed an open reading frame of 4062 nucleotides encoding a protein of 1353 amino acids with a predicted M(r) of 150,078. Structural features include 22 potential N-glycosylation sites, an N-terminal hydrophobic signal sequence of 17 amino acids, a hydrophilic cysteine-rich sequence of 35 amino acids near the C terminus, and a potential proteolytic cleavage site (RRSR) between amino acid residues 758 and 759, which would yield S1 and S2 segments with M(r) of 84,730 and 65,366, respectively. The predicted amino acid sequence of the spike protein of HCV-OC43 has 91% identity with that of the Mebus strain of bovine coronavirus, with more sequence divergence in the putative bulbous part (S1) than in the predicted stem region (S2).