Towards integrated clinico-genomic models for personalized medicine: combining gene expression signatures and clinical factors in breast cancer outcomes prediction

Abstract
Genomic data, particularly genome-scale measures of gene expression derived from DNA microarray studies, have the potential to add enormous information to the analysis of biological phenotypes. Perhaps the most successful application of these data has been in the characterization of human cancers, including the ability to predict clinical outcomes. Nevertheless, most analyses have used gene expression profiles to define broad group distinctions, similar to the use of traditional clinical risk factors. As a result, considerable heterogeneity remains within the broadly defined groups, and these analyses fall short of providing accurate predictions for individual patients. One strategy to resolve this heterogeneity is to make use of multiple gene expression patterns, which together are more powerful in defining individual characteristics and predicting outcomes than any single gene expression pattern. Statistical tree-based classification systems provide a framework for assessing multiple patterns, which we term metagenes, and for selecting those that are most capable of resolving the biological heterogeneity. Moreover, this framework provides a mechanism to combine multiple forms of data, both genomic and clinical, to most effectively characterize individual patients and achieve the goal of personalized predictions of clinical outcomes.