Validation of Biomarker-Based Risk Prediction Models
- 30 September 2008
- journal article
- research article
- Published by American Association for Cancer Research (AACR) in Clinical Cancer Research
- Vol. 14 (19), 5977-5983
- https://doi.org/10.1158/1078-0432.ccr-07-4534
Abstract
The increasing availability and use of predictive models to facilitate informed decision making highlights the need for careful assessment of the validity of these models. In particular, models involving biomarkers require careful validation for two reasons: issues with overfitting when complex models involve a large number of biomarkers, and interlaboratory variation in assays used to measure biomarkers. In this article, we distinguish between internal and external statistical validation. Internal validation, involving training-testing splits of the available data or cross-validation, is a necessary component of the model building process and can provide valid assessments of model performance. External validation consists of assessing model performance on one or more data sets collected by different investigators from different institutions. External validation is a more rigorous procedure necessary for evaluating whether the predictive model will generalize to populations other than the one on which it was developed. We stress the need for an external data set to be truly external, that is, to play no role in model development and ideally be completely unavailable to the researchers building the model. In addition to reviewing different types of validation, we describe different types and features of predictive models and strategies for model building, as well as measures appropriate for assessing their performance in the context of validation. No single measure can characterize the different components of the prediction, and the use of multiple summary measures is recommended.
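The internal validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a single hypothetical biomarker `x`, a binary outcome `y`, synthetic data, and a one-variable logistic model fitted by plain gradient descent. The two summary measures shown (AUC for discrimination, Brier score for overall accuracy) are chosen here for illustration only; the abstract does not say which measures the article recommends.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, steps=300, lr=0.5):
    """Fit P(y=1 | x) = sigmoid(a + b*x) by batch gradient descent."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(a + b * xi) - yi
            grad_a += err
            grad_b += err * xi
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def auc(y, p):
    """Fraction of case/control pairs ranked correctly (ties count 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in pos for d in neg)
    return wins / (len(pos) * len(neg))


def brier(y, p):
    """Mean squared difference between predicted risk and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def cross_validate(x, y, k=5, seed=0):
    """Predict each subject from a model fitted without that subject's fold."""
    preds = [0.0] * len(x)
    for test in k_fold_indices(len(x), k, seed):
        held_out = set(test)
        train = [i for i in range(len(x)) if i not in held_out]
        a, b = fit_logistic([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = sigmoid(a + b * x[i])
    return auc(y, preds), brier(y, preds)


# Synthetic data (an assumption): cases have a biomarker mean one SD higher.
rng = random.Random(1)
y = [i % 2 for i in range(200)]
x = [rng.gauss(1.0 if yi else 0.0, 1.0) for yi in y]

cv_auc, cv_brier = cross_validate(x, y)
print(f"cross-validated AUC:   {cv_auc:.3f}")
print(f"cross-validated Brier: {cv_brier:.3f}")
```

Every prediction here comes from a model that never saw the subject being predicted, which is what makes the estimate an internal validation rather than an apparent (resubstitution) one. It is still not external validation: all folds come from the same simulated source, whereas the abstract requires an external data set to play no role in model development.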