Chemical Descriptors with Distinct Levels of Information Content and Varying Sensitivity to Differences between Selected Compound Databases Identified by SE-DSE Analysis
- 28 November 2001
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 42 (1), 87-93
- https://doi.org/10.1021/ci0103065
Abstract
Analysis of the variability of molecular descriptors in large compound databases has recently been carried out using both the Shannon entropy (SE) and differential Shannon entropy (DSE) concepts that reduce descriptor distributions to their information content (SE analysis) and detect intrinsic differences between descriptor settings in compound databases (DSE analysis). Here it is shown that a combination of SE and DSE calculations, termed SE-DSE analysis, makes it possible to identify molecular descriptors most sensitive to systematic differences in databases consisting of synthetic, drug-like, and natural molecules. Descriptors with consistently high information content are detected, and database-specific differences are quantified. Different sets of only very few descriptors were found to be most responsive to principal differences between synthetic, natural, and drug-like molecules. Descriptors with DSE values furthest away from zero are likely to best distinguish between compounds with different characteristics. SE-DSE analysis also reveals that a number of descriptors are not sensitive to compound class-specific features, despite their complexity and consistently high information content.
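The SE and DSE quantities described in the abstract can be illustrated with a small sketch. The abstract does not state the formulas or the binning scheme, so the following is an assumption rather than the authors' procedure: SE is taken as the Shannon entropy (in bits) of a descriptor's equal-width histogram, and DSE is assumed to be the entropy of the pooled data from two databases minus the mean of the two individual entropies, with both databases binned over a shared value range. The function names (`histogram`, `shannon_entropy`, `se_dse`) and the default bin count are hypothetical.

```python
import math


def histogram(values, lo, hi, bins):
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def shannon_entropy(counts):
    """Shannon entropy in bits of a histogram given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def se_dse(values_a, values_b, bins=10):
    """Return (SE_A, SE_B, DSE) for one descriptor in two databases.

    Assumed definition (not taken from the abstract):
        DSE = SE_AB - (SE_A + SE_B) / 2
    where SE_AB is the entropy of the pooled histogram.
    """
    lo = min(min(values_a), min(values_b))
    hi = max(max(values_a), max(values_b))
    if hi == lo:
        # A constant descriptor carries no information in either database.
        return 0.0, 0.0, 0.0
    hist_a = histogram(values_a, lo, hi, bins)
    hist_b = histogram(values_b, lo, hi, bins)
    hist_ab = [a + b for a, b in zip(hist_a, hist_b)]
    se_a = shannon_entropy(hist_a)
    se_b = shannon_entropy(hist_b)
    se_ab = shannon_entropy(hist_ab)
    return se_a, se_b, se_ab - (se_a + se_b) / 2


if __name__ == "__main__":
    # Made-up descriptor values for two hypothetical compound sets.
    synthetic = [1.0, 1.2, 1.1, 3.4, 3.6, 2.0]
    natural = [5.0, 5.5, 6.1, 5.8, 4.9, 6.0]
    print(se_dse(synthetic, natural, bins=5))
```

Under this assumed definition, a descriptor whose value distribution is the same in both databases gives a DSE of zero, while descriptors whose distributions differ give values further from zero, in line with the abstract's statement that such descriptors are likely to best distinguish the compound classes.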