A Strategy for Identifying Differences in Large Series of Metabolomic Samples Analyzed by GC/MS

Abstract
In metabolomics, the aim is to identify and quantify all the metabolites in a biological system. Combined gas chromatography and mass spectrometry (GC/MS) is, together with ¹H NMR, one of the most commonly used techniques in metabolomics, and it has been shown that more than 300 compounds can be distinguished by GC/MS after deconvolution of overlapping peaks. To avoid having to deconvolute every analyzed sample before multivariate analysis of the data, we have developed a strategy for rapid comparison of nonprocessed MS data files. The method comprises baseline correction, alignment, determination of time windows, alternating regression, partial least-squares discriminant analysis (PLS-DA), and identification of the retention time windows in the chromatograms that explain the differences between the samples. The use of alternating regression also yields interpretable loadings, which retain the information provided by the m/z values that vary between the samples in each retention time window. The method has been applied to extracts of leaves at different developmental stages and to extracts of plants subjected to small changes in day length. The data show that the new method can detect differences between the samples and that it gives results comparable to those obtained when deconvolution is applied before the multivariate analysis. We suggest that this method can be used for rapid comparison of large sets of GC/MS data, so that time-consuming deconvolution need only be applied to the parts of the chromatograms that help explain the differences between the samples.
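The alternating-regression step can be illustrated with a minimal sketch. It assumes that each retention time window has already been baseline-corrected and aligned, and that the windows from all samples are stacked so that the resolved spectra are shared across samples while each sample keeps its own elution profiles. The factorization below is a generic alternating least-squares decomposition with negative values clipped to zero. It stands in for the alternating regression named above; the authors' exact variant, constraints and initialization may differ. The function names `alternating_regression` and `window_areas` are hypothetical.

```python
import numpy as np


def alternating_regression(X, n_components, n_iter=500, tol=1e-10, seed=0):
    """Factor a nonnegative window matrix X (scans x m/z) as X ~ C @ S.T.

    C (scans x components) holds elution profiles, S (m/z x components) spectra.
    Each half-step is an ordinary least-squares solve followed by clipping
    negatives to zero: a simple approximation to nonnegativity (not exact NNLS).
    """
    rng = np.random.default_rng(seed)
    S = rng.random((X.shape[1], n_components))  # random positive initial spectra
    prev = np.inf
    for _ in range(n_iter):
        # Profiles given spectra: solve S @ C.T ~ X.T for C.T, then clip.
        C = np.clip(np.linalg.lstsq(S, X.T, rcond=None)[0].T, 0.0, None)
        # Spectra given profiles: solve C @ S.T ~ X for S.T, then clip.
        S = np.clip(np.linalg.lstsq(C, X, rcond=None)[0].T, 0.0, None)
        resid = np.linalg.norm(X - C @ S.T)
        if abs(prev - resid) < tol:  # stop when the fit no longer changes
            break
        prev = resid
    return C, S


def window_areas(windows, n_components, **kwargs):
    """Resolve one retention time window across all samples at once.

    windows: list of per-sample arrays (scans x m/z) cut from the same window,
    assumed already baseline-corrected and aligned. Stacking them makes the
    spectra S common to all samples; each sample keeps its own block of C.
    Returns (areas, S), where areas has shape (n_samples, n_components).
    """
    X = np.vstack(windows)
    C, S = alternating_regression(X, n_components, **kwargs)
    # Row offsets where one sample's scans end and the next sample's begin.
    bounds = np.cumsum([w.shape[0] for w in windows])[:-1]
    # Integrate each sample's profiles. The C/S scaling is arbitrary, so only
    # ratios between samples within one component are meaningful.
    areas = np.array([block.sum(axis=0) for block in np.split(C, bounds)])
    return areas, S


if __name__ == "__main__":
    # Synthetic window: two overlapping peaks with distinct spectra on 5 m/z channels.
    t = np.arange(40)
    profiles = np.stack([np.exp(-0.5 * ((t - 15) / 3) ** 2),
                         np.exp(-0.5 * ((t - 22) / 3) ** 2)], axis=1)
    spectra = np.array([[1.0, 0.0, 0.5, 0.2, 0.0],
                        [0.0, 0.8, 0.1, 0.0, 1.0]]).T
    amounts = [(1.0, 1.0), (2.0, 0.5), (0.5, 1.5)]  # made-up per-sample amounts
    windows = [(profiles * np.array(a)) @ spectra.T for a in amounts]
    areas, _ = window_areas(windows, n_components=2)
    print(areas)  # rows = samples, columns = resolved components
```

In this sketch, the per-window area tables from all windows would be concatenated into a samples × variables matrix for PLS-DA. One possible tool is scikit-learn's `PLSRegression` (in `sklearn.cross_decomposition`) fitted against dummy-coded class labels. The windows whose variables drive the class separation would then be the candidates for full deconvolution.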