Abstract
A procedure for identifying subgroups that are homogeneous with respect to an outcome variable is described. The method, search partition analysis (SPAN), is formulated in terms of a numeric outcome variable y and a set of predictors (explanatory variables or risk factors) x = (x1, x2, …, xp). The objective is to split observations into two groups by a binary partition, specified using Boolean expressions of the predictors x, such that y is as homogeneous as possible within the resulting groups: uniformly ‘low’ in one and uniformly ‘high’ in the other. Subgroups within each of the two groups can be identified from the Boolean expressions. SPAN implements a search for the ‘best’ partition from among a class of regular Boolean expressions. Features of the method are described, including the measurement of partition homogeneity, complexity penalization, search strategies, and subgroup definition and representation. The approach is illustrated with analyses of predictors of low birth weight and of predictors of impaired glucose tolerance for screening purposes.
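
The idea of searching Boolean partitions for the most homogeneous split can be illustrated with a minimal sketch. It is not the SPAN algorithm itself: it assumes binary (0/1) predictors, restricts the search to simple conjunctions (AND) of up to a few predictors, measures homogeneity as the reduction in within-group sum of squares, and subtracts a hypothetical linear penalty per term. The function names, the penalty form, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def within_ss(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_conjunction(rows, y, max_terms=2, penalty=0.0):
    """Exhaustively search conjunctions of up to max_terms binary predictors.

    rows: list of dicts mapping predictor name -> 0 or 1
    y:    list of numeric outcomes, aligned with rows
    Returns (score, predictors) for the best-scoring binary partition,
    where score = reduction in within-group sum of squares
                  minus penalty * number of terms (an assumed penalty form).
    """
    names = sorted(rows[0])
    total = within_ss(y)
    best = (float("-inf"), ())
    for k in range(1, max_terms + 1):
        for combo in combinations(names, k):
            # Observations satisfying the conjunction form one group,
            # all remaining observations form the other.
            member = [all(r[n] for n in combo) for r in rows]
            inside = [yi for m, yi in zip(member, y) if m]
            outside = [yi for m, yi in zip(member, y) if not m]
            if not inside or not outside:
                continue  # not a genuine two-group partition
            reduction = total - within_ss(inside) - within_ss(outside)
            score = reduction - penalty * k
            if score > best[0]:
                best = (score, combo)
    return best


# Toy data (hypothetical): y is 'high' only when both x1 and x2 are present.
rows = [
    {"x1": 1, "x2": 1},
    {"x1": 1, "x2": 1},
    {"x1": 1, "x2": 0},
    {"x1": 0, "x2": 1},
    {"x1": 0, "x2": 0},
]
y = [10, 10, 0, 0, 0]

score, predictors = best_conjunction(rows, y, max_terms=2, penalty=1.0)
print(predictors, score)  # ('x1', 'x2') 118.0
```

In this toy case the two-term partition "x1 AND x2" is selected despite the penalty, because it separates the outcomes perfectly while either predictor alone does not. A fuller treatment would also allow disjunctions and complements of predictors, which this sketch omits.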
