Distinguishing between Natural Products and Synthetic Molecules by Descriptor Shannon Entropy Analysis and Binary QSAR Calculations
- 1 September 2000
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 40 (5), 1245-1252
- https://doi.org/10.1021/ci0003303
Abstract
Molecular descriptors were identified by Shannon entropy analysis that correctly distinguished, in binary QSAR calculations, between naturally occurring molecules and synthetic compounds. The Shannon entropy concept was first used in digital communication theory and has only very recently been applied to descriptor analysis. Binary QSAR methodology was originally developed to correlate structural features and properties of compounds with a binary formulation of biological activity (i.e., active or inactive) and has here been adapted to correlate molecular features with chemical source (i.e., natural or synthetic). We have identified a number of molecular descriptors with significantly different Shannon entropy and/or “entropic separation” in natural and synthetic compound databases. Different combinations of such descriptors and variably distributed structural keys were applied to learning sets consisting of natural and synthetic molecules and used to derive predictive binary QSAR models. These models were then applied to predict the source of compounds in different test sets consisting of randomly collected natural and synthetic molecules, or, alternatively, sets of natural and synthetic molecules with specific biological activities. On average, greater than 80% prediction accuracy was achieved with our best models. For the test case consisting of molecules with specific activities, greater than 90% accuracy was achieved. From our analysis, some chemical features were identified that systematically differ in many naturally occurring versus synthetic molecules.
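To make the entropy idea concrete, here is a minimal sketch of how the Shannon entropy of a single descriptor could be estimated for two compound sets. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not specify the binning scheme, so the histogram over a shared value range, the bin count, the base-2 logarithm, the function name `shannon_entropy`, and the descriptor values are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values, lo, hi, n_bins=10):
    """Shannon entropy (in bits) of a descriptor's histogram over [lo, hi].

    Assumption: equal-width bins over a range shared by both databases,
    so that the two entropies are computed on the same footing.
    """
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and n_bins >= 1")
    width = (hi - lo) / n_bins
    # Assign each value to a bin index, clamping out-of-range values
    # into the first or last bin.
    counts = Counter(
        max(0, min(int((v - lo) / width), n_bins - 1)) for v in values
    )
    total = len(values)
    # H = -sum(p_i * log2(p_i)) over the occupied bins only.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up descriptor values for illustration only (not data from the paper).
natural = [1.2, 3.4, 5.1, 6.8, 8.9, 2.2, 7.5, 4.4]
synthetic = [4.1, 4.3, 4.8, 5.0, 5.2, 4.6, 4.9, 5.1]

h_nat = shannon_entropy(natural, 0.0, 10.0)
h_syn = shannon_entropy(synthetic, 0.0, 10.0)
print(f"natural: {h_nat:.3f} bits, synthetic: {h_syn:.3f} bits")
```

A descriptor whose values spread over many bins in one database but cluster in a few bins in the other yields clearly different entropies, which is the kind of contrast the abstract describes. The "entropic separation" measure mentioned there is not defined in the abstract and is not reproduced in this sketch.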