Abstract
Cross-validation is a statistical procedure that produces an estimate of forecast skill which is less biased than the usual hindcast skill estimates. The cross-validation method systematically deletes one or more cases from a dataset, derives a forecast model from the remaining cases, and tests it on the deleted case or cases. The procedure is nonparametric and can be applied to any automated model-building technique. It can also provide important diagnostic information about influential cases in the dataset and the stability of the model. Two experiments were conducted using cross-validation to estimate forecast skill in different predictive models of North Pacific sea surface temperatures (SSTs). The results indicate that bias, or artificial predictability (defined here as the difference between the usual hindcast skill and the forecast skill estimated by cross-validation), increases with each decision drawn from the data, whether screening potential predictors or fixing the value of a coefficient.
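A minimal sketch of the leave-one-out procedure described above, assuming a simple least-squares regression as the forecast model and correlation-based skill scores; the variable names and synthetic data are illustrative, not drawn from the SST experiments:

```python
import numpy as np

def hindcast_skill(x, y):
    """Fit the model on all cases and score it on the same cases (the usual hindcast skill)."""
    coeffs = np.polyfit(x, y, deg=1)        # least-squares fit: y ~ a*x + b
    y_hat = np.polyval(coeffs, x)
    return np.corrcoef(y, y_hat)[0, 1]      # correlation of fitted vs. observed values

def cross_validated_skill(x, y):
    """Leave-one-out cross-validation: delete one case, refit, forecast the deleted case."""
    n = len(y)
    forecasts = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i            # all cases except case i
        coeffs = np.polyfit(x[keep], y[keep], deg=1)
        forecasts[i] = np.polyval(coeffs, x[i])
    return np.corrcoef(y, forecasts)[0, 1]  # skill of the independent forecasts

# Illustrative synthetic predictor/predictand series (hypothetical, not the paper's data)
rng = np.random.default_rng(0)
x = rng.standard_normal(40)
y = 0.5 * x + rng.standard_normal(40)

r_hindcast = hindcast_skill(x, y)
r_forecast = cross_validated_skill(x, y)
print(f"hindcast skill:            {r_hindcast:.3f}")
print(f"cross-validated skill:     {r_forecast:.3f}")
print(f"artificial predictability: {r_hindcast - r_forecast:.3f}")
```

The difference printed on the last line corresponds to the bias, or artificial predictability, defined in the abstract: the gap between skill measured on the fitting data and skill measured on cases withheld from the fit.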