Evaluation of Three Algorithms to Identify Incident Breast Cancer in Medicare Claims Data

26 February 2007

journal article
research article
Published by Wiley in Health Services Research

Vol. 42 (5), 2056-2069
https://doi.org/10.1111/j.1475-6773.2007.00705.x

Abstract

This study tests the validity of three published algorithms designed to identify incident breast cancer cases using recent inpatient, outpatient, and physician insurance claims data. The data are Surveillance, Epidemiology, and End Results (SEER) registry records linked with Medicare physician, hospital, and outpatient claims for breast cancer cases diagnosed from 1995 to 1998, together with a 5 percent control sample of Medicare beneficiaries in SEER areas. We evaluate the sensitivity and specificity of the three algorithms applied to new data and compare them with the originally reported results. The algorithms use health insurance diagnosis and procedure claims codes to classify breast cancer cases, with SEER as the reference standard. We compare algorithms by age, stage, race, and SEER region, and explore via logistic regression whether adding demographic variables improves algorithm performance.

The sensitivity of two of the three algorithms is significantly lower when applied to newer data than the sensitivity calculated during algorithm development (59 and 77.4 percent versus 90 and 80.2 percent, p<.00001). Sensitivity decreases as age increases, and false negative rates are higher for cases with in situ, metastatic, and unknown stage disease than for localized or regional breast cancer. Substantial variation also exists by SEER registry. There is potential for improvement in algorithm performance when age, region, and race are added to an indicator variable for whether the algorithm determined a subject to be a breast cancer case (p<.00001).

Differential sensitivity of the algorithms by SEER region and age likely reflects variation in practice patterns, because the algorithms rely on administrative procedure codes. Depending on the algorithm, 3-5 percent of subjects overall are misclassified in 1998. Misclassification disproportionately affects older women and those diagnosed with in situ, metastatic, or unknown-stage disease. Algorithms should be applied cautiously to insurance claims databases to assess health care utilization outside SEER-Medicare populations because of uneven misclassification of subgroups that may be understudied already.
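
The validation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the field names (`algorithm_case`, `registry_case`, `age_group`) and the eight toy records are hypothetical and chosen only to show how sensitivity and specificity are computed against a registry reference standard, overall and within a subgroup such as age.

```python
from collections import defaultdict


def sensitivity_specificity(pairs):
    """Return (sensitivity, specificity) for (algorithm_flag, reference_flag) pairs.

    The reference flag plays the role of the registry (reference standard).
    A measure is None when its denominator is zero.
    """
    tp = fn = tn = fp = 0
    for flagged, is_case in pairs:
        if is_case:
            if flagged:
                tp += 1  # true case found by the algorithm
            else:
                fn += 1  # true case missed by the algorithm
        else:
            if flagged:
                fp += 1  # non-case wrongly flagged
            else:
                tn += 1  # non-case correctly left unflagged
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


def by_subgroup(rows, key):
    """Compute (sensitivity, specificity) separately for each value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append((row["algorithm_case"], row["registry_case"]))
    return {group: sensitivity_specificity(p) for group, p in groups.items()}


# Hypothetical toy records, not data from the study.
rows = [
    {"age_group": "65-74", "algorithm_case": True, "registry_case": True},
    {"age_group": "65-74", "algorithm_case": True, "registry_case": True},
    {"age_group": "65-74", "algorithm_case": False, "registry_case": False},
    {"age_group": "65-74", "algorithm_case": True, "registry_case": False},
    {"age_group": "85+", "algorithm_case": True, "registry_case": True},
    {"age_group": "85+", "algorithm_case": False, "registry_case": True},
    {"age_group": "85+", "algorithm_case": False, "registry_case": False},
    {"age_group": "85+", "algorithm_case": False, "registry_case": False},
]

overall = sensitivity_specificity(
    (r["algorithm_case"], r["registry_case"]) for r in rows
)
strata = by_subgroup(rows, "age_group")
```

In this toy data the overall figures hide a difference between strata: the older group has lower sensitivity, which is the kind of uneven misclassification the abstract warns about.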

