Potential Drugs and Nondrugs: Prediction and Identification of Important Structural Features
- 25 February 2000
- journal article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 40 (2), 280-292
- https://doi.org/10.1021/ci990266t
Abstract
Using decision trees, a model to discriminate between potential drugs and nondrugs has been developed. Compounds from the Available Chemicals Directory and the World Drug Index databases were used as the training set; the molecular structures were represented using extended atom types. The error rate on an independent validation data set is 17.4%. The number of false negatives can be reduced by penalizing the misclassification of drugs so that 92 out of 100 potential drugs are correctly recognized. At the same time, 34 out of 100 nondrugs are classified as potential drugs. The predictions of the model can be used to guide the purchase or selection of compounds for biological screening or the design of combinatorial libraries. The visualization of the generated models in the form of colored trees allowed us to identify a few surprisingly simple features that explain the most significant differences between drugs and nondrugs in the training set: just by testing for the presence of hydroxyl, tertiary or secondary amino, carboxyl, phenol, or enol groups, three quarters of all drugs could already be correctly recognized. The nondrugs, on the other hand, are characterized by their aromatic nature with a low content of functional groups besides halogens. The general applicability of the model is shown by the predictions made for several Organon databases.
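As a rough illustration of the simple features described in the abstract, the following Python sketch flags a compound as a potential drug when any of the named functional groups is present, and recomputes the sensitivity and false-positive rate from the reported counts. It is not the authors' decision tree: the group labels, the function names, and the assumption that functional groups have already been detected and supplied as a set of strings are all hypothetical.

```python
# Hypothetical labels for the functional groups named in the abstract.
# Detecting these groups in a real structure is outside this sketch.
DRUG_LIKE_GROUPS = frozenset({
    "hydroxyl",
    "tertiary_amine",
    "secondary_amine",
    "carboxyl",
    "phenol",
    "enol",
})


def looks_like_potential_drug(groups):
    """Return True if at least one of the listed groups is present.

    `groups` is any iterable of group labels (an assumed input format).
    """
    return not DRUG_LIKE_GROUPS.isdisjoint(groups)


def rates_from_counts(drugs_recognized, drugs_total,
                      nondrugs_flagged, nondrugs_total):
    """Turn raw classification counts into sensitivity and false-positive rate."""
    sensitivity = drugs_recognized / drugs_total
    false_positive_rate = nondrugs_flagged / nondrugs_total
    return sensitivity, false_positive_rate


if __name__ == "__main__":
    # Two made-up compounds, described only by their group labels.
    print(looks_like_potential_drug({"carboxyl", "aromatic_ring"}))
    print(looks_like_potential_drug({"halogen", "aromatic_ring"}))

    # Counts reported in the abstract for the penalized model:
    # 92 of 100 drugs recognized, 34 of 100 nondrugs flagged as drugs.
    print(rates_from_counts(92, 100, 34, 100))
```

The single any-of test mirrors only the headline observation about the training set; the published model is a full decision tree over extended atom types, and the penalty on misclassified drugs is what shifts the balance toward fewer false negatives at the cost of more false positives.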