Predicting the functional impact of protein mutations: application to cancer genomics

Top Cited Papers
Open Access
Abstract
As large-scale re-sequencing of genomes reveals many protein mutations, especially in human cancer tissues, prediction of their likely functional impact becomes important practical goal. Here, we introduce a new functional impact score (FIS) for amino acid residue changes using evolutionary conservation patterns. The information in these patterns is derived from aligned families and sub-families of sequence homologs within and between species using combinatorial entropy formalism. The score performs well on a large set of human protein mutations in separating disease-associated variants (∼19 200), assumed to be strongly functional, from common polymorphisms (∼35 600), assumed to be weakly functional (area under the receiver operating characteristic curve of ∼0.86). In cancer, using recurrence, multiplicity and annotation for ∼10 000 mutations in the COSMIC database, the method does well in assigning higher scores to more likely functional mutations (‘drivers’). To guide experimental prioritization, we report a list of about 1000 top human cancer genes frequently mutated in one or more cancer types ranked by likely functional impact; and, an additional 1000 candidate cancer genes with rare but likely functional mutations. In addition, we estimate that at least 5% of cancer-relevant mutations involve switch of function, rather than simply loss or gain of function.