Non-parametric, hypothesis-based analysis of microarrays for comparison of several phenotypes

Abstract
Motivation: We present a statistical framework for the analysis of high-dimensional microarray data in which the goal is to compare intensities among several groups based on as few as a single sample from each group. In this setting, it is of interest to compare gene expression among several phenotypes in order to define candidate genes that simultaneously characterize several criteria among the comparison groups. We motivate the approach with a comparative microarray experiment in which clones of a cell were individually exposed to several distinct but related conditions. The experiment was conducted to elucidate genes involved in pathways leading to T cell clonal anergy.

Results: By integrating inference principles within a bioinformatics setting, we introduce a two-stage approach to selecting candidate genes that characterize several criteria. The method is unified in its non-parametric approach to both inference and description. For inference, we construct a testable hypothesis, based on the criteria of interest, in a high-dimensional space while preserving the dependence among genes. Upon rejecting the null hypothesis, we estimate the cardinality of the set of individual candidate genes (or gene pairs) that depict the events of interest. Using this estimate, we then select individual genes (or gene pairs) according to a two-dimensional ranking that examines relations within and between genes, among comparison groups, using singular value decomposition in combination with inner product concepts.

Availability: The functions developed for obtaining results from our approach are available upon request. Detailed documentation of the methods and the experiment may be obtained from http://www.cancerbiostats.onc.jhmi.edu/Kowalski.htm
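Illustration: The selection step summarized under Results combines singular value decomposition with inner products to rank genes. The sketch below is one possible reading of that idea and is not the authors' procedure. It assumes a genes-by-groups matrix of log-intensities with one sample per group, a hypothetical contrast vector `pattern` encoding the criteria of interest, and a cutoff `k` standing in for the estimated cardinality. The function name `rank_genes` and the rank-sum combination of the two scores are assumptions made for illustration only.

```python
import numpy as np

def rank_genes(X, pattern, k):
    """Return indices of the top-k genes under a simple two-score ranking.

    X       : (n_genes, n_groups) array of log-intensities, one sample per group
    pattern : (n_groups,) hypothetical contrast encoding the criteria of interest
    k       : number of genes to select (stand-in for an estimated cardinality)
    """
    X = np.asarray(X, dtype=float)
    # Center each gene's profile across groups so scores reflect shape, not level.
    Xc = X - X.mean(axis=1, keepdims=True)

    # Score 1: inner product of each gene's profile with the unit-norm pattern.
    p = np.asarray(pattern, dtype=float)
    p = p - p.mean()
    p = p / np.linalg.norm(p)
    inner = Xc @ p

    # Score 2: magnitude of each gene's coordinate on the leading singular direction.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    svd_score = np.abs(U[:, 0] * s[0])

    # Convert both scores to ranks (0 = best) and order genes by the rank sum.
    rank_inner = np.argsort(np.argsort(-inner))
    rank_svd = np.argsort(np.argsort(-svd_score))
    order = np.argsort(rank_inner + rank_svd, kind="stable")
    return order[:k]
```

With three comparison groups and `pattern = [-1, 0, 1]`, a gene whose intensity rises steadily from the first group to the third would rank ahead of genes with flat profiles. The inference stage (the hypothesis test and the cardinality estimate) is not sketched here, because the abstract does not specify it.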