Use of Imputed Population-based Cancer Registry Data as a Method of Accounting for Missing Information: Application to Estrogen Receptor Status for Breast Cancer

Abstract
The National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program provides a rich source of data stratified according to tumor biomarkers, which play an important role in cancer surveillance research. These data are useful for analyzing trends in cancer incidence and survival. Tumor marker data, however, are often incomplete. To address the problem of missing data, the authors employed sequential regression multivariate imputation for breast cancer variables, with a particular focus on estrogen receptor status, using data from 13 SEER registries covering the period 1992–2007. In this paper, they present an approach to accounting for missing information through the creation of imputed data sets that can be analyzed using existing software (e.g., SEER*Stat) developed for analyzing cancer registry data. Age-adjusted trends in female breast cancer incidence, stratified by age and race, are shown graphically before and after imputation of estrogen receptor status to illustrate the bias that arises when missing values are ignored. The imputed data set will be made available in SEER*Stat (http://seer.cancer.gov/analysis/index.html) to facilitate accurate estimation of breast cancer incidence trends. To ensure that the imputed data set is used correctly, the authors provide detailed, step-by-step instructions for conducting analyses. This is the first time that a nationally representative, population-based cancer registry data set has been imputed and made available to researchers for conducting a variety of analyses of breast cancer incidence trends.
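The general idea of creating several completed data sets can be illustrated with a minimal sketch. This is not the authors' sequential regression multivariate imputation procedure: it swaps in a much simpler technique, drawing each missing estrogen receptor (ER) status from the ER-positive proportion observed within the same age group. The record layout, the function names `impute_er_status` and `mean_er_positive_count`, and the example counts are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def impute_er_status(records, n_imputations=5, seed=0):
    """Return `n_imputations` completed copies of `records`.

    Each record is a dict with keys 'age_group' and 'er', where 'er' is
    'pos', 'neg', or None (missing). Missing values are drawn from the
    observed ER-positive proportion in the same age group. This is a
    simplified stand-in for illustration, not the SRMI method itself.
    """
    rng = random.Random(seed)

    # Tally known and ER-positive cases per age group and overall.
    known = defaultdict(int)
    positive = defaultdict(int)
    for r in records:
        if r["er"] is not None:
            known[r["age_group"]] += 1
            positive[r["age_group"]] += r["er"] == "pos"

    total_known = sum(known.values())
    if total_known == 0:
        raise ValueError("no observed ER status to impute from")
    overall_p = sum(positive.values()) / total_known

    completed_sets = []
    for _ in range(n_imputations):
        completed = []
        for r in records:
            er = r["er"]
            if er is None:
                group = r["age_group"]
                # Fall back to the overall proportion if the group has
                # no observed cases.
                p = positive[group] / known[group] if known[group] else overall_p
                er = "pos" if rng.random() < p else "neg"
            completed.append({**r, "er": er})
        completed_sets.append(completed)
    return completed_sets


def mean_er_positive_count(completed_sets):
    """Average the ER-positive case count across the completed data sets."""
    counts = [sum(r["er"] == "pos" for r in s) for s in completed_sets]
    return sum(counts) / len(counts)


# Hypothetical example: 20 cases, 6 of them with unknown ER status.
records = (
    [{"age_group": "50-64", "er": "pos"} for _ in range(8)]
    + [{"age_group": "50-64", "er": "neg"} for _ in range(2)]
    + [{"age_group": "50-64", "er": None} for _ in range(4)]
    + [{"age_group": "<50", "er": "pos"} for _ in range(2)]
    + [{"age_group": "<50", "er": "neg"} for _ in range(2)]
    + [{"age_group": "<50", "er": None} for _ in range(2)]
)

completed_sets = impute_er_status(records, n_imputations=5, seed=1)
observed_positive = sum(r["er"] == "pos" for r in records)
print("ER-positive cases, unknowns excluded:", observed_positive)
print("ER-positive cases, mean over imputations:", mean_er_positive_count(completed_sets))
```

Counting only cases with known ER status can never exceed the count obtained after the unknown cases are assigned a status, which is the direction of the bias the abstract describes. A full analysis would use a richer imputation model and would also carry the between-imputation variability into the reported uncertainty.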