Sequence and expression of the mouse mammary tumour virus env gene.

Abstract
We have determined the DNA sequence of the envelope gene region of the GR strain of mouse mammary tumour virus. The sequence extends for 3012 nucleotides from the single EcoRI site to beyond the PstI site in the 3′ long terminal repeat (LTR) of the provirus. A major open reading frame from nucleotides 752 to 2818 encompasses the entire env gene. This reading frame extends through a polypurine tract and into the LTR. Another open reading frame runs from the first nucleotide to position 803 and presumably corresponds to the end of the pol gene. The splice acceptor site which generates env mRNA has been mapped experimentally to nucleotide 750. The env gene products, gp52 and gp36, have been positioned on the sequence using the directly determined amino acid sequences of the amino terminus of gp52 and of both the amino and carboxyl termini of gp36. The start of gp52 is preceded by a series of 19 uncharged amino acids which could function as a typical signal sequence, but this sequence is only part of a much longer leader peptide. The tetrad Arg‐Ala‐Lys‐Arg, which occurs just before the gp36 amino terminus, is the presumed cleavage site in the gPr73env precursor. There are five potential asparagine‐linked glycosylation sites, in agreement with previous experimental results. gp36 has two long hydrophobic regions, at its amino and carboxyl termini; these are suggested to act as a fusion peptide and the trans‐membrane anchor, respectively.