Comprehensive assessment of computational algorithms in predicting cancer driver mutations
Open Access
- 20 February 2020
- journal article
- research article
- Published by Springer Science and Business Media LLC in Genome Biology
- Vol. 21 (1), 1-17
- https://doi.org/10.1186/s13059-020-01954-z
Abstract
The initiation and subsequent evolution of cancer are largely driven by a relatively small number of somatic mutations with critical functional impacts, so-called driver mutations. Identifying driver mutations in a patient’s tumor cells is a central task in the era of precision cancer medicine. Over the past decade, many computational algorithms have been developed to predict the effects of missense single-nucleotide variants, and they are frequently employed to prioritize mutation candidates. These algorithms employ diverse molecular features to build predictive models, and while some algorithms are cancer-specific, others are not. However, the relative performance of these algorithms has not been rigorously assessed. We construct five complementary benchmark datasets: mutation clustering patterns in protein 3D structures, literature annotation based on OncoKB, TP53 mutations based on their effects on target-gene transactivation, effects of cancer mutations on tumor formation in xenograft experiments, and functional annotation based on in vitro cell viability assays we developed, including a new dataset of ~200 mutations. We evaluate the performance of 33 algorithms and find that CHASM, CTAT-cancer, DEOGEN2, and PrimateAI show consistently better performance than the other algorithms. Moreover, cancer-specific algorithms show much better performance than those designed for a general purpose. Our study is a comprehensive assessment of the performance of different algorithms in predicting cancer driver mutations and provides deep insights into the best practice of computationally prioritizing cancer mutation candidates for end-users and for the future development of new algorithms.
Funding Information
- NCI (CA209851)