Sequence analysis of the core gene of 14 hepatitis C virus genotypes.

Abstract
We previously sequenced the 5' noncoding region of 44 isolates of hepatitis C virus (HCV), as well as the envelope 1 (E1) gene of 51 HCV isolates, and provided evidence for the existence of at least 6 major genetic groups consisting of at least 12 minor genotypes of HCV (i.e., genotypes I/1a, II/1b, III/2a, IV/2b, 2c, V/3a, 4a-4d, 5a, and 6a). We now report the complete nucleotide sequence of the putative core (C) gene of 52 HCV isolates that represent all of these 12 genotypes as well as two additional genotypes provisionally designated 4e and 4f that we identified in this study. The phylogenetic analysis of the C gene sequences was in agreement with that of the E1 gene sequences. A major division in the genetic distance was observed between HCV isolates of genotype 2 and those of the other genotypes in analysis of both the E1 and C genes. The C gene sequences of 9 genotypes have not been reported previously (i.e., genotypes 2c, 4a-4f, 5a, and 6a). Our analysis indicates that the C gene-based methods currently used to determine the HCV genotype, such as PCR with genotype-specific primers, should be revised in light of these data. We found that the predicted C gene was exactly 573 nt long in all 52 HCV isolates, with an N-terminal start codon and no in-frame stop codons. The nucleotide and predicted amino acid identities of the C gene sequences were in the range of 79.4-99.0% and 85.3-100%, respectively. Furthermore, we mapped universally conserved, as well as genotype-specific, nucleotide and deduced amino acid sequences of the C gene. 
The predicted C proteins of the different HCV genotypes shared the following features: (i) high content of proline residues, (ii) high content of arginine and lysine residues located primarily in three domains with 10 such residues invariant at positions 39-62, (iii) a cluster of 5 conserved tryptophan residues, (iv) two nuclear localization signals and a DNA-binding motif, (v) a potential phosphorylation site with a serine-proline motif, and (vi) three conserved hydrophilic domains that have been shown by others to contain immunogenic epitopes. Thus, we have extended analysis of the predicted C protein of HCV to all of the recognized genotypes, confirmed the existence of highly conserved regions of this important structural protein, and demonstrated that the genetic relatedness of HCV isolates is equivalent when analyzing the most conserved (i.e., C) and the most variable (i.e., E1) genes of the HCV genome.
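As an illustration only, the two sequence-level checks described above (an open reading frame of exactly 573 nt with a start codon and no in-frame stop codons, and pairwise percent identity between aligned sequences) could be sketched roughly as follows. The function names `is_open_core_orf` and `percent_identity` are hypothetical and are not taken from the study. The sketch assumes the sequences are already aligned without gaps and are given as plain DNA or RNA strings. It is not the authors' actual analysis pipeline.

```python
# Minimal sketch, assuming ungapped, pre-aligned nucleotide strings.
# Function names are hypothetical and used here for illustration only.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_open_core_orf(seq, expected_len=573):
    """Return True if seq has the expected length, starts with ATG,
    and contains no in-frame stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA notation as well
    if len(seq) != expected_len or not seq.startswith("ATG"):
        return False
    # Walk the sequence codon by codon in the frame set by the start codon.
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)
```

Applied to every pair of isolates, a function like `percent_identity` would produce the kind of identity range quoted in the abstract. Computing the amino acid identities would require translating the sequences first, a step this sketch omits.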
