Nucleotide sequence of the gene encoding the fusion (F) glycoprotein of human respiratory syncytial virus.

Abstract
The nucleotide sequence of the mRNA encoding the F protein of respiratory syncytial (RS) virus (strain A2) was determined from cDNA [complementary DNA] clones that contain the complete mRNA sequence. The mRNA is 1899 nucleotides long, exclusive of polyadenylate. The single major open reading frame encodes a protein of 574 amino acids, with a calculated molecular weight of 63,453. Major structural features predicted from the amino acid sequence include an NH2-terminal signal sequence (residues 1-22), a hydrophobic transmembrane anchor sequence (residues 525-550), 5 potential acceptor sites for asparagine-linked carbohydrate, and a potential site (residues 131-136) for the proteolytic cleavage that generates the disulfide-linked F1 and F2 subunits, which, by analogy to other paramyxoviruses, constitute the biologically active form of the F protein. The sequence also contains an internal hydrophobic domain (residues 137-154) that, as a consequence of this activating proteolytic cleavage, would become the NH2 terminus of the larger F1 subunit. The amino acid sequence of the hydrophobic terminus of the F1 subunit is known to be highly conserved among several paramyxoviruses but is markedly dissimilar for RS virus. The F2 subunit is relatively hydrophilic and contains 4 of the 5 potential carbohydrate acceptor sites. The subunit order is NH2-F2-F1-COOH. The nucleotide sequences at the 5' and 3' mRNA termini are conserved among the 8 RS viral mRNAs sequenced to date. The conserved sequences are: [graphic not reproduced]. These are candidates to be signals for viral transcription. The nucleotide and amino acid sequences described further define the relationship between RS virus and other paramyxoviruses.