Predicting mammalian hosts in which novel coronaviruses can be generated

Abstract
Novel pathogenic coronaviruses – such as SARS-CoV and probably SARS-CoV-2 – arise by homologous recombination between co-infecting viruses in a single cell. Identifying possible sources of novel coronaviruses therefore requires identifying hosts of multiple coronaviruses; however, most coronavirus-host interactions remain unknown. Here, by deploying a meta-ensemble of similarity learners from three complementary perspectives (viral, mammalian and network), we predict which mammals are hosts of multiple coronaviruses. We predict that there are 11.5-fold more coronavirus-host associations, over 30-fold more potential SARS-CoV-2 recombination hosts, and over 40-fold more host species with four or more different subgenera of coronaviruses than have been observed to date at >0.5 mean probability cut-off (2.4-, 4.25- and 9-fold, respectively, at >0.9821). Our results demonstrate the large underappreciation of the potential scale of novel coronavirus generation in wild and domesticated animals. We identify high-risk species for coronavirus surveillance.
Funding Information
  • RCUK | Medical Research Council (MR/R024898/1)
  • RCUK | Biotechnology and Biological Sciences Research Council (MR/R024898/1, BB/K003798/1)
  • RCUK | Natural Environment Research Council (NE/G002827/1)