A Three-Gene Model to Robustly Identify Breast Cancer Molecular Subtypes
Open Access
- 18 January 2012
- journal article
- research article
- Published by Oxford University Press (OUP) in JNCI Journal of the National Cancer Institute
- Vol. 104 (4), 311-325
- https://doi.org/10.1093/jnci/djr545
Abstract
Single sample predictors (SSPs) and subtype classification models (SCMs) are gene expression–based classifiers used to identify the four primary molecular subtypes of breast cancer (basal-like, HER2-enriched, luminal A, and luminal B). SSPs use hierarchical clustering, followed by nearest centroid classification, based on large sets of tumor-intrinsic genes. SCMs use a mixture of Gaussian distributions based on sets of genes with expression specifically correlated with three key breast cancer genes (estrogen receptor [ER], HER2, and aurora kinase A [AURKA]). The aim of this study was to compare the robustness, classification concordance, and prognostic value of these classifiers with those of a simplified three-gene SCM in a large compendium of microarray datasets.
Thirty-six publicly available breast cancer datasets (n = 5715) were subjected to molecular subtyping using five published classifiers (three SSPs and two SCMs) and SCMGENE, the new three-gene (ER, HER2, and AURKA) SCM. We used the prediction strength statistic to estimate the robustness of the classification models, defined as the capacity of a classifier to assign the same tumors to the same subtypes independently of the dataset used to fit it. We used Cohen's κ to assess concordance between the subtype classifiers and Cramér's V to assess their association with clinical variables. We used Kaplan–Meier survival curves and cross-validated partial likelihood to compare the prognostic value of the resulting classifications. All statistical tests were two-sided.
SCMs were statistically significantly more robust than SSPs, with SCMGENE being the most robust because of its simplicity. SCMGENE was statistically significantly concordant with published SCMs (κ = 0.65–0.70) and SSPs (κ = 0.34–0.59), was statistically significantly associated with ER status (V = 0.64), HER2 status (V = 0.52), and histological grade (V = 0.55), and yielded similarly strong prognostic value.
Our results suggest that adequate classification of the major and clinically relevant molecular subtypes of breast cancer can be robustly achieved with quantitative measurements of three key genes.
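The abstract reports concordance between classifiers as Cohen's κ and association with clinical variables as Cramér's V. As a rough illustration of what those two coefficients measure, the sketch below computes both from paired label lists using the standard textbook formulas. It is not the authors' code: the function names `cohen_kappa` and `cramers_v` and the toy labels are hypothetical, and no bias correction or significance test is included.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same samples."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of samples given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from the marginal label counts.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1.0 - expected)


def cramers_v(labels_a, labels_b):
    """Strength of association between two categorical variables (0 to 1)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, n_a in count_a.items():
        for b, n_b in count_b.items():
            expected = n_a * n_b / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(count_a), len(count_b)) - 1
    if k == 0:
        raise ValueError("each variable needs at least two categories")
    return sqrt(chi2 / (n * k))


if __name__ == "__main__":
    # Hypothetical subtype calls from two classifiers on six tumors.
    calls_1 = ["basal", "basal", "her2", "lumA", "lumB", "lumA"]
    calls_2 = ["basal", "basal", "her2", "lumA", "lumA", "lumA"]
    # Hypothetical ER status for the same six tumors.
    er_status = ["neg", "neg", "neg", "pos", "pos", "pos"]
    print("kappa:", round(cohen_kappa(calls_1, calls_2), 3))
    print("Cramer's V:", round(cramers_v(calls_1, er_status), 3))
```

Cohen's κ needs both inputs to share one label set, which is why it suits classifier-versus-classifier comparisons, while Cramér's V works on any pair of categorical variables, such as subtype against ER status.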