Alternatives to Cross-Validatory Estimation of the Number of Factors in Multivariate Calibration

Abstract
Overcoming the collinearity problem in regression by data compression techniques [i.e., principal component regression (PCR) and partial least-squares (PLS)] requires estimating the number of factors (principal components) to include in the model. The most common approach is cross-validation, which is, unfortunately, time-consuming to carry out. Accordingly, we have searched for time-saving methods to estimate the number of factors. Two approaches were considered: the first uses the estimated standard error of the model, and the second is an approximation to leave-one-out cross-validation. Both alternatives have been tested on spectroscopic data. When the number of wavelengths is limited, both methods give results similar to those obtained by full cross-validation for both PCR and PLS. However, when the number of wavelengths is large, the tested methods are reliable only for PCR and not for PLS.
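
The contrast between full leave-one-out cross-validation and a cheaper approximation to it can be illustrated for PCR with a minimal sketch. The abstract does not state the formulas used, so the sketch assumes one common variant, a leverage correction of the calibration residuals, which may differ from the approximation actually tested. The synthetic "spectra" and the function names (`pcr_scores`, `press_leverage`, `press_full_loo`) are hypothetical and serve only as an illustration.

```python
import numpy as np

def pcr_scores(X, k):
    """Center X and return its first k principal-component scores and loadings."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k].T

def press_leverage(X, y, k):
    """Approximate leave-one-out PRESS for k-factor PCR from a single fit.

    Assumption: residuals are inflated by 1 / (1 - h_i), where h_i is the
    leverage of sample i in the score space. The factors are not re-estimated
    when a sample is left out, which is why this is only an approximation.
    """
    n = X.shape[0]
    T, _ = pcr_scores(X, k)
    yc = y - y.mean()
    q, *_ = np.linalg.lstsq(T, yc, rcond=None)
    resid = yc - T @ q
    # Score columns are orthogonal, so the leverage is a sum over factors,
    # plus 1/n for the intercept (mean centering).
    h = 1.0 / n + np.sum(T**2 / np.sum(T**2, axis=0), axis=1)
    return float(np.sum((resid / (1.0 - h)) ** 2))

def press_full_loo(X, y, k):
    """Full leave-one-out PRESS: the PCR model is refitted n times."""
    n = X.shape[0]
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        Xtr, ytr = X[keep], y[keep]
        mx, my = Xtr.mean(axis=0), ytr.mean()
        T, P = pcr_scores(Xtr, k)
        q, *_ = np.linalg.lstsq(T, ytr - my, rcond=None)
        pred = my + (X[i] - mx) @ P @ q
        press += (y[i] - pred) ** 2
    return float(press)

# Hypothetical data: 30 samples, 50 collinear "wavelengths", 3 latent factors.
rng = np.random.default_rng(0)
n, p, true_k = 30, 50, 3
latent = rng.normal(size=(n, true_k))
spectra = latent @ rng.normal(size=(true_k, p)) + 0.01 * rng.normal(size=(n, p))
y = latent @ np.array([1.0, -0.5, 0.25]) + 0.05 * rng.normal(size=n)

ks = list(range(1, 7))
approx = [press_leverage(spectra, y, k) for k in ks]
full = [press_full_loo(spectra, y, k) for k in ks]
for k, a, f in zip(ks, approx, full):
    print(f"factors={k}  approx PRESS={a:.4f}  full LOO PRESS={f:.4f}")
```

The approximate criterion needs one decomposition per candidate number of factors, whereas the full procedure needs one per left-out sample, which is where the time saving comes from. In either case the number of factors would be chosen at or near the minimum of the PRESS curve.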