Detection of Subpopulations in Near-Infrared Reflectance Analysis

Abstract
In typical near-infrared multivariate statistical analyses, samples with similar spectra produce points that cluster in a particular region of spectral hyperspace. These clusters can vary significantly in shape and size because of variation in sample packing, particle-size distribution, component concentration, and drift with time. Consequently, when discriminant analysis relies on simple distance metrics, a result that places a point inside a cluster does not necessarily mean that the point is actually a member of that cluster. Instead, the point may belong to a new, slightly different cluster that overlaps the first. Such a new cluster can arise from factors like low-level contamination or instrumental drift. An extension to part of the BEAST (Bootstrap Error-Adjusted Single-sample Technique) can set nonparametric probability-density contours both inside and outside spectral clusters, and when multiple points begin to appear in one region of cluster hyperspace, the resulting perturbation of these density contours can be detected at an assigned significance level. With this method, false samples can be detected both within and beyond 3 SDs of the center of the training set. The procedure is shown to be effective for contaminant levels of a few hundred ppm in an over-the-counter drug capsule and to function with as few as one or two wavelengths, suggesting its application to very simple process sensors.
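The core idea, that a group of new points can reveal a displaced subpopulation even when each point individually lies inside the training cluster, can be illustrated with a simple bootstrap test. The sketch below is a hypothetical, simplified stand-in and not the BEAST or the extension described here: it does not reproduce the BEAST's error-adjusted distances or its density contours. It assumes that each wavelength is standardized by the training-set SD, and that the test statistic is the distance of the new group's centroid from the training center. The helper name `bootstrap_subcluster_test` and its parameters are illustrative only. The null distribution comes from resampling groups of the same size from the training set.

```python
import numpy as np


def bootstrap_subcluster_test(train, new, n_boot=2000, seed=0):
    """Hypothetical sketch: test whether a group of new spectra is displaced
    from the training-set center more than random same-size groups drawn
    from the training set would be.

    train: (n, d) array of training spectra (d wavelengths; d may be 1 or 2)
    new:   (m, d) array of newly measured spectra
    Returns (observed statistic, one-sided bootstrap p-value).
    """
    rng = np.random.default_rng(seed)
    train = np.asarray(train, dtype=float)
    new = np.asarray(new, dtype=float)

    # Assumption: standardize each wavelength by the training mean and SD.
    center = train.mean(axis=0)
    scale = train.std(axis=0, ddof=1)
    z_train = (train - center) / scale
    z_new = (new - center) / scale

    # Observed statistic: distance of the new group's centroid from the center.
    m = len(z_new)
    observed = np.linalg.norm(z_new.mean(axis=0))

    # Null distribution: centroids of m points resampled (with replacement)
    # from the training set, giving an array of shape (n_boot, m, d).
    idx = rng.integers(0, len(z_train), size=(n_boot, m))
    boot = np.linalg.norm(z_train[idx].mean(axis=1), axis=1)

    # One-sided p-value with the usual +1 correction.
    p_value = (1 + np.sum(boot >= observed)) / (n_boot + 1)
    return observed, p_value


# Usage: two "wavelengths"; new samples are shifted by 1.5 SD, so most of them
# individually lie well within 3 SDs of the training center.
rng = np.random.default_rng(1)
train = rng.normal(size=(200, 2))
new = rng.normal(size=(20, 2)) + np.array([1.5, 0.0])
stat, p = bootstrap_subcluster_test(train, new)
print(f"centroid distance = {stat:.2f} SD, p = {p:.4f}")
```

Because the statistic pools the whole group, a shift that would be invisible in any single point becomes significant as more points accumulate in the same region. This mirrors, in a much cruder form, the motivation given above.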