Comprehensive assessment of computational algorithms in predicting cancer driver mutations

Open Access

20 February 2020

journal article
research article
Published by Springer Science and Business Media LLC in Genome Biology

Vol. 21 (1), 1-17
https://doi.org/10.1186/s13059-020-01954-z

Abstract

The initiation and subsequent evolution of cancer are largely driven by a relatively small number of somatic mutations with critical functional impacts, so-called driver mutations. Identifying driver mutations in a patient’s tumor cells is a central task in the era of precision cancer medicine. Over the past decade, many computational algorithms have been developed to predict the effects of missense single-nucleotide variants, and they are frequently employed to prioritize mutation candidates. These algorithms use diverse molecular features to build predictive models; some are cancer-specific, while others are not. However, the relative performance of these algorithms has not been rigorously assessed. We constructed five complementary benchmark datasets: mutation clustering patterns in protein 3D structures, literature annotation based on OncoKB, TP53 mutations classified by their effects on target-gene transactivation, effects of cancer mutations on tumor formation in xenograft experiments, and functional annotation based on in vitro cell viability assays that we developed, including a new dataset of ~200 mutations. We evaluated the performance of 33 algorithms and found that CHASM, CTAT-cancer, DEOGEN2, and PrimateAI consistently performed better than the other algorithms. Moreover, cancer-specific algorithms performed much better than general-purpose ones. Our study is a comprehensive assessment of the performance of different algorithms in predicting cancer driver mutations and provides deep insights into best practices for computationally prioritizing cancer mutation candidates, both for end-users and for the future development of new algorithms.
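The abstract reports which algorithms performed "consistently better" across five benchmarks but does not name a metric. As an illustration only, the sketch below assumes two common choices: scoring each algorithm on a labeled benchmark (1 = driver, 0 = non-driver) with the area under the ROC curve (AUC), and judging consistency by averaging each algorithm's rank across benchmarks. The function names `roc_auc` and `mean_rank` and the tool names in the usage example are hypothetical; this is not the study's evaluation pipeline, and the toy numbers are made up.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC computed as the fraction of (driver, non-driver) pairs ranked correctly.

    labels: 1 = driver (positive), 0 = non-driver (negative).
    A tied pair contributes 0.5. Pairwise O(n*m) counting is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative mutation")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # driver scored above the non-driver
            elif p == n:
                wins += 0.5  # tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


def mean_rank(auc_table: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """auc_table[benchmark][algorithm] -> AUC.

    Within each benchmark, rank 1 goes to the highest AUC (ties broken by name,
    a simplification). Returns (algorithm, mean rank over the benchmarks where
    it appears), best first.
    """
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for aucs in auc_table.values():
        ordered = sorted(aucs, key=lambda a: (-aucs[a], a))
        for rank, alg in enumerate(ordered, start=1):
            totals[alg] = totals.get(alg, 0.0) + rank
            counts[alg] = counts.get(alg, 0) + 1
    means = [(alg, totals[alg] / counts[alg]) for alg in totals]
    return sorted(means, key=lambda t: (t[1], t[0]))


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]
    print(roc_auc([0.9, 0.8, 0.4, 0.3, 0.2, 0.5], labels))  # 8/9
    table = {
        "bench1": {"tool_A": 0.89, "tool_B": 0.72},
        "bench2": {"tool_A": 0.60, "tool_B": 0.75},
        "bench3": {"tool_A": 0.80, "tool_B": 0.70},
    }
    print(mean_rank(table))  # tool_A first (mean rank 4/3)
```

Mean rank is used here because it rewards an algorithm that is reliably near the top on every benchmark rather than one that excels on a single dataset; other aggregation schemes would be equally plausible.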

Funding Information

NCI (CA209851)
