Use of Nucleotide Composition Analysis To Infer Hosts for Three Novel Picorna-Like Viruses

Abstract
Nearly complete genome sequences of three novel RNA viruses were acquired from the stool of an Afghan child. Phylogenetic analysis indicated that these viruses belong to the picorna-like virus superfamily. Because of their unique genomic organization and deep phylogenetic roots, we propose these viruses, provisionally named calhevirus, tetnovirus-1, and tetnovirus-2, as prototypes of new viral families. A newly developed nucleotide composition analysis (NCA) method was used to compare mononucleotide and dinucleotide frequencies for RNA viruses infecting mammals, plants, or insects. Using a large training data set of 284 representative picorna-like virus genomic sequences with defined host origins, NCA correctly identified the kingdom or phylum of the viral host for >95% of picorna-like viruses. NCA predicted an insect host origin for the three novel picorna-like viruses. Their presence in human stool therefore likely reflects ingestion of insect-contaminated food. As metagenomic analyses of different environments and organisms continue to yield highly divergent viral genomes, NCA provides a rapid and robust method to identify their likely cellular hosts.
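The abstract states that NCA compares mononucleotide and dinucleotide frequencies across genomes. As an illustration of what such composition features might look like, the sketch below computes, for one sequence, the four mononucleotide frequencies and the sixteen dinucleotide odds ratios, rho(XY) = f(XY) / (f(X) f(Y)). This is a minimal sketch under stated assumptions, not the authors' NCA implementation. The function name `composition_features`, the use of odds ratios rather than raw dinucleotide frequencies, the conversion of T to U, and the skipping of ambiguous bases are all assumptions made for illustration. The abstract does not specify how host predictions are made from these values, so no classifier is shown.

```python
from collections import Counter
from itertools import product

NUCS = "ACGU"


def composition_features(seq: str) -> dict:
    """Hypothetical helper: mononucleotide frequencies plus dinucleotide odds ratios.

    Assumptions (not from the source): input is treated as RNA (T is converted
    to U), and ambiguous bases such as N are ignored.
    """
    s = seq.upper().replace("T", "U")

    # Mononucleotide counts over unambiguous bases only.
    mono = Counter(b for b in s if b in NUCS)
    total = sum(mono.values())
    if total < 2:
        raise ValueError("sequence has fewer than two unambiguous bases")
    mono_freq = {b: mono[b] / total for b in NUCS}

    # Dinucleotide counts over adjacent pairs where both bases are unambiguous.
    di = Counter(
        s[i:i + 2]
        for i in range(len(s) - 1)
        if s[i] in NUCS and s[i + 1] in NUCS
    )
    di_total = sum(di.values())

    features = dict(mono_freq)
    for x, y in product(NUCS, repeat=2):
        pair = x + y
        observed = di[pair] / di_total if di_total else 0.0
        expected = mono_freq[x] * mono_freq[y]
        # Odds ratio: observed dinucleotide frequency relative to the value
        # expected from base composition alone; 0.0 if either base is absent.
        features[pair] = observed / expected if expected else 0.0
    return features


if __name__ == "__main__":
    f = composition_features("AUAUAUAU")
    print(f["A"], f["AU"], f["UA"], f["CG"])
```

Each genome is reduced to a fixed-length vector of 20 numbers, so sequences of very different lengths can be compared directly. Values of rho well below 1 for a pair such as CpG or UpA would indicate under-representation relative to base composition.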