The discarding of variables in multivariate analysis

Abstract
In many multivariate situations we are presented with more variables than we would like, and the question arises whether they are all necessary and, if not, which can be discarded. In this paper we consider two such situations. (a) Regression analysis. The problem here is whether any variables can be discarded as adding little or nothing to the accuracy with which the regression equation correlates with the dependent variable. (b) Interdependence analysis. The problem is whether a constellation in p dimensions collapses, exactly or approximately, into fewer dimensions, and if so whether any of the original variables can be discarded. We may define the best solution to (a) using any given number of variables as the one that maximizes the multiple correlation between the selected variables and the dependent variable, and similarly the best solution to (b) as the one that maximizes the smallest multiple correlation between the selected variables and any of the rejected variables. In practice it is usual to accept an approximate solution to (a) based on 'step-wise' multiple regression; we know of no standard program for (b). We have developed cut-off rules that enable us to find the best solution to both problems by partial enumeration. The paper discusses the details of this approach and our computational experience with it.
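
The two optimality criteria defined above can be made concrete with a small sketch. It is an illustration only: it scores every subset of a given size by brute-force enumeration and does not implement the cut-off rules or the partial enumeration that the paper develops. The function names and the 4 x 4 correlation matrix `CORR` are hypothetical, chosen purely for the example. Because a multiple correlation is non-negative, maximizing its square gives the same subset as maximizing the correlation itself.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(corr, target, kept):
    """Squared multiple correlation of variable `target` with the variables in `kept`."""
    sub = [[corr[i][j] for j in kept] for i in kept]
    rhs = [corr[i][target] for i in kept]
    beta = solve(sub, rhs)  # standardized regression coefficients
    return sum(b * r for b, r in zip(beta, rhs))


def best_regression_subset(corr, predictors, dependent, k):
    """Criterion (a): the k predictors with the largest multiple correlation with `dependent`."""
    return max(combinations(predictors, k),
               key=lambda kept: r_squared(corr, dependent, kept))


def best_interdependence_subset(corr, k):
    """Criterion (b): the k variables that maximize the smallest multiple
    correlation with any rejected variable."""
    p = len(corr)

    def score(kept):
        return min(r_squared(corr, j, kept) for j in range(p) if j not in kept)

    return max(combinations(range(p), k), key=score)


# Hypothetical correlation matrix for four variables (illustrative values only).
CORR = [
    [1.0, 0.8, 0.1, 0.7],
    [0.8, 1.0, 0.2, 0.6],
    [0.1, 0.2, 1.0, 0.5],
    [0.7, 0.6, 0.5, 1.0],
]

# Criterion (a): variable 3 is the dependent variable, variables 0-2 are candidates.
print(best_regression_subset(CORR, [0, 1, 2], 3, 2))  # (0, 2)
# Criterion (b): all four variables are treated symmetrically.
print(best_interdependence_subset(CORR, 2))  # (0, 2)
```

Full enumeration of this kind examines every subset of the chosen size, so its cost grows combinatorially with the number of variables; that cost is what motivates a search that can rule out groups of subsets without evaluating each one.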