Redefinition of Affymetrix probe sets by sequence overlap with cDNA microarray probes reduces cross-platform inconsistencies in cancer-associated gene expression measurements

Abstract
Comparison of data produced on different microarray platforms often shows surprising discordance. It is not clear whether this discrepancy is caused by noisy data or by improper probe matching between platforms. We investigated whether the significant level of inconsistency between results produced by alternative gene expression microarray platforms could be reduced by stringent sequence matching of microarray probes. We mapped the short oligo probes of the Affymetrix platform onto cDNA clones of the Stanford microarray platform. Affymetrix probes were reassigned to redefined probe sets if they mapped to the same cDNA clone sequence, regardless of the original manufacturer-defined grouping. The NCI-60 gene expression profiles produced by Affymetrix HuFL platform were recalculated using these redefined probe sets and compared to previously published cDNA measurements of the same panel of RNA samples. The redefined probe sets displayed a substantially higher level of cross-platform consistency at the level of gene correlation, cell line correlation and unsupervised hierarchical clustering. The same strategy allowed an almost complete correspondence of breast cancer subtype classification between Affymetrix gene chip and cDNA microarray derived gene expression data, and gave an increased level of similarity between normal lung derived gene expression profiles using the two technologies. In total, two Affymetrix gene-chip platforms were remapped to three cDNA platforms in the various cross-platform analyses, resulting in improved concordance in each case. We have shown that probes which target overlapping transcript sequence regions on cDNA microarrays and Affymetrix gene-chips exhibit a greater level of concordance than the corresponding Unigene or sequence matched features. This method will be useful for the integrated analysis of gene expression data generated by multiple disparate measurement platforms.