Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation
Open Access
- 2 July 2007
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 8 (1), 234
- https://doi.org/10.1186/1471-2105-8-234
Abstract
Background: Classifying nuclear magnetic resonance (NMR) spectra is a crucial step in many metabolomics experiments. Since several multivariate classification techniques depend upon the variance of the data, it is important to first minimise any contribution from unwanted technical variance arising from sample preparation and analytical measurements, and thereby maximise any contribution from wanted biological variance between different classes. The generalised logarithm (glog) transform was developed to stabilise the variance in DNA microarray datasets, but has rarely been applied to metabolomics data. In particular, it has not been rigorously evaluated against other scaling techniques used in metabolomics, nor tested on all forms of NMR spectra including 1-dimensional (1D) ¹H, projections of 2D ¹H,¹H J-resolved (pJRES), and intact 2D J-resolved (JRES).
Results: Here, the effects of the glog transform are compared against two commonly used variance stabilising techniques, autoscaling and Pareto scaling, as well as unscaled data. The four methods are evaluated in terms of the effects on the variance of NMR metabolomics data and on the classification accuracy following multivariate analysis, the latter achieved using principal component analysis followed by linear discriminant analysis. For two of three datasets analysed, classification accuracies were highest following glog transformation: 100% accuracy for discriminating 1D NMR spectra of hypoxic and normoxic invertebrate muscle, and 100% accuracy for discriminating 2D JRES spectra of fish livers sampled from two rivers. For the third dataset, pJRES spectra of urine from two breeds of dog, the glog transform and autoscaling achieved equal highest accuracies. Additionally, we extended the glog algorithm to effectively suppress noise, which proved critical for the analysis of 2D JRES spectra.
Conclusion: We have demonstrated that the glog and extended glog transforms stabilise the technical variance in NMR metabolomics datasets. This significantly improves the discrimination between sample classes and has resulted in higher classification accuracies compared to unscaled, autoscaled or Pareto scaled data. Additionally, we have confirmed the broad applicability of the glog approach using three disparate datasets from different biological samples using 1D NMR spectra, 1D projections of 2D JRES spectra, and intact 2D JRES spectra.
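The three treatments compared in the abstract can be illustrated with a minimal sketch. It assumes the commonly quoted form of the generalised logarithm, glog(y) = ln(y + sqrt(y^2 + lambda)), and the standard definitions of autoscaling (mean-centre, divide by the standard deviation) and Pareto scaling (mean-centre, divide by the square root of the standard deviation). The function names, the toy intensities and the value of `lam` are hypothetical; the abstract does not say how the transform parameter is chosen, and the extended glog is not shown.

```python
import math
from statistics import mean, stdev


def glog(y, lam):
    """Generalised log, assumed form: ln(y + sqrt(y^2 + lam)), with lam > 0."""
    return math.log(y + math.sqrt(y * y + lam))


def autoscale(column):
    """Mean-centre one spectral bin and divide by its standard deviation."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]


def pareto_scale(column):
    """Mean-centre one spectral bin and divide by the square root of its SD."""
    m, s = mean(column), stdev(column)
    return [(v - m) / math.sqrt(s) for v in column]


def glog_column(column, lam):
    """Apply glog to every intensity in one bin, then mean-centre the result."""
    transformed = [glog(v, lam) for v in column]
    m = mean(transformed)
    return [v - m for v in transformed]


if __name__ == "__main__":
    # Toy intensities for a single bin across four samples (made-up numbers).
    bin_intensities = [1.0, 2.0, 3.0, 4.0]
    lam = 1.0  # placeholder transform parameter, not taken from the paper
    print("autoscaled:", autoscale(bin_intensities))
    print("Pareto:    ", pareto_scale(bin_intensities))
    print("glog:      ", glog_column(bin_intensities, lam))
```

For intensities much larger than sqrt(lambda) the transform behaves like ln(2y), while near zero it is approximately linear, which is why it remains defined for the small or negative values found in baseline noise.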