Abstract
When principal components are used to reduce the dimensionality of data before clustering, it has been common practice to retain the components with the largest eigenvalues. We prove, by means of a mixture of two multivariate normal distributions, that this practice is not justified in general. A relationship between the distance between the two subpopulations and any subset of the principal components is derived, showing that the components with the larger eigenvalues do not necessarily contain more information (distance). This result is further demonstrated through hypothetical examples as well as analyses of actual data. The effect of scaling the variables on how the information is distributed among the components is investigated. An application to a mixture of two normal distributions is illustrated using a set of generated data in which the information is concentrated in the components with the largest and the smallest eigenvalues.
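The central claim can be reproduced numerically. The following is a minimal sketch, not the paper's own construction: it draws a sample from a mixture of two trivariate normals whose means differ only along the axis of smallest within-class variance, then checks which principal component carries the between-class separation. All parameter values are illustrative assumptions.

```python
# Sketch: the component with the SMALLEST eigenvalue can carry all of
# the cluster separation. Parameters below are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two multivariate normals sharing covariance diag(9, 4, 0.5);
# their means differ only along the low-variance third axis.
cov = np.diag([9.0, 4.0, 0.5])
mu1 = np.array([0.0, 0.0, 0.0])
mu2 = np.array([0.0, 0.0, 3.0])
X = np.vstack([rng.multivariate_normal(mu1, cov, n),
               rng.multivariate_normal(mu2, cov, n)])
labels = np.repeat([0, 1], n)

# PCA on the pooled sample: center, eigendecompose the covariance,
# order components by decreasing eigenvalue.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc.T))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xc @ eigvecs

# Separation carried by each component: gap between the class means
# of the scores, standardized by the overall spread of that component.
for k in range(3):
    gap = abs(scores[labels == 0, k].mean() - scores[labels == 1, k].mean())
    print(f"PC{k + 1}: eigenvalue={eigvals[k]:5.2f}  "
          f"standardized mean gap={gap / scores[:, k].std():4.2f}")
# The largest gap appears on the LAST component (smallest eigenvalue),
# so clustering on the leading components alone would miss the structure.
```

Here the between-class mean difference inflates the pooled variance along the third axis to only about 2.75, still below the within-class variances of the first two axes, so the separating direction is ranked last by eigenvalue.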