Profile Scaling Increases the Similarity Search Performance of Molecular Fingerprints Containing Numerical Descriptors and Structural Keys
- 20 June 2003
- journal article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 43 (4), 1218-1225
- https://doi.org/10.1021/ci030287u
Abstract
The concept of compound class-specific profiling and scaling of molecular fingerprints for similarity searching is discussed and applied to newly designed fingerprint representations. The approach is based on the analysis of characteristic patterns of bits in keyed fingerprints that are set on in compounds having equivalent biological activity. Once a fingerprint profile is generated for a particular activity class, scaling factors that are weighted according to observed bit frequencies are applied to signature bit positions when searching for similar compounds. In systematic similarity search calculations over 23 diverse activity classes, profile scaling consistently increased the performance of fingerprints containing property descriptors and/or structural keys. A significant improvement of ∼15% was observed for a new fingerprint consisting of binary encoded molecular property descriptors and structural keys. Under scaling conditions, this fingerprint, termed MP-MFP, correctly recognized on average close to 60% of all active test compounds, with only a few false positives. MP-MFP outperformed MACCS keys and other reference fingerprints. In general, optimum performance in scaling calculations was achieved at higher threshold values of the Tanimoto coefficient than in nonscaled calculations, thereby increasing the search selectivity. In general, putting relatively high weight on signature bit positions that were always, or almost always, set on was found to be the most effective scaling procedure. Analysis of class-specific search performance revealed that profile scaling of MP-MFP improved the similarity search results for each of the 23 activity classes.
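The profile-scaling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact scaling scheme, so the weighted form of the Tanimoto coefficient, the function names, the frequency cutoff `min_freq`, the scaling `factor`, and the toy 4-bit fingerprints are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def bit_frequencies(fingerprints: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of class members in which each bit position is set on."""
    n = len(fingerprints)
    length = len(fingerprints[0])
    return [sum(fp[i] for fp in fingerprints) / n for i in range(length)]


def scaling_weights(freqs: Sequence[float],
                    min_freq: float = 0.9,
                    factor: float = 3.0) -> List[float]:
    """Assumed scheme: signature bits (set on in at least `min_freq` of the
    class) get weight `factor`; every other bit keeps weight 1."""
    return [factor if f >= min_freq else 1.0 for f in freqs]


def weighted_tanimoto(a: Sequence[int], b: Sequence[int],
                      weights: Sequence[float]) -> float:
    """Tanimoto coefficient in which each bit counts with its weight.
    With all weights equal to 1 this is the ordinary binary Tanimoto."""
    common = sum(w for x, y, w in zip(a, b, weights) if x and y)
    union = sum(w for x, y, w in zip(a, b, weights) if x or y)
    return common / union if union else 0.0


# Toy activity class (hypothetical data): bits 0 and 3 are always set on.
actives = [[1, 1, 0, 1],
           [1, 0, 0, 1],
           [1, 1, 0, 1]]
weights = scaling_weights(bit_frequencies(actives))

query = [1, 1, 0, 1]
candidate = [1, 0, 1, 1]

print(weighted_tanimoto(query, candidate, [1.0] * 4))  # unscaled: 0.5
print(weighted_tanimoto(query, candidate, weights))    # scaled: 0.75
```

In this toy case the candidate shares both signature bits with the query, so its score rises from 0.5 to 0.75 under scaling. A compound lacking a signature bit would instead be penalized more heavily, which is consistent with the abstract's observation that scaled searches perform best at higher Tanimoto thresholds.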