Analysis of a Large Structure/Biological Activity Data Set Using Recursive Partitioning

30 October 1999

journal article
conference paper
Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences

Vol. 39 (6), 1017-1026
https://doi.org/10.1021/ci9903049

Abstract

Combinatorial chemistry and high-throughput screening are revolutionizing the process of lead discovery in the pharmaceutical industry. Large numbers of structures and vast quantities of biological assay data are quickly being accumulated, overwhelming traditional structure/activity relationship (SAR) analysis technologies. Recursive partitioning is a method for statistically determining rules that classify objects into similar categories or, in this case, structures into groups of molecules with similar potencies. SCAM is a computer program implemented to make extremely efficient use of this methodology. Depending on the size of the data set, rules explaining biological data can be determined interactively. An example data set of 1650 monoamine oxidase inhibitors exemplifies the method, yielding substructural rules and leading to general classifications of these inhibitors. The method scales linearly with the number of descriptors, so hundreds of thousands of structures can be analyzed utilizing thousands to millions of molecular descriptors. There are currently no methods to deal with statistical analysis problems of this size. An important aspect of this analysis is the ability to deal with mixtures, i.e., identify SAR rules for classes of compounds in the same data set that might be binding in different ways. Most current quantitative structure/activity relationship methods require that the compounds follow a single mechanism. Advantages and limitations of this methodology are presented.
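
As an illustration of the recursive partitioning idea described in the abstract, the following is a minimal sketch, not the SCAM implementation. It assumes each molecule is represented by the set of substructure descriptors present in it, and that a split is chosen by the largest absolute Welch t statistic between the potencies of molecules with and without a descriptor. The abstract states neither assumption. The names `abs_t` and `partition`, the descriptor labels, the potency values, and the `threshold` and `max_depth` defaults are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def abs_t(a, b):
    """Absolute Welch t statistic between two potency lists.

    Returns 0.0 when either group has fewer than two members.
    """
    if len(a) < 2 or len(b) < 2:
        return 0.0
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Zero spread in both groups: perfect separation if the means differ.
        return float("inf") if mean(a) != mean(b) else 0.0
    return abs(mean(a) - mean(b)) / se


def partition(rows, potencies, features, threshold=4.0, depth=0, max_depth=3):
    """Recursively split molecules on the presence of one descriptor.

    rows      -- list of sets; each set holds the descriptors of one molecule
    potencies -- list of floats, one potency per molecule
    features  -- list of candidate descriptor names (assumed, illustrative)
    """
    leaf = {"n": len(potencies), "mean": mean(potencies)}
    if depth >= max_depth:
        return leaf

    # One pass over the descriptors per node: the cost of choosing a split
    # grows linearly with the number of descriptors.
    best_score, best_feature = threshold, None
    for f in features:
        has = [p for r, p in zip(rows, potencies) if f in r]
        lacks = [p for r, p in zip(rows, potencies) if f not in r]
        score = abs_t(has, lacks)
        if score > best_score:
            best_score, best_feature = score, f

    if best_feature is None:
        return leaf

    present = [i for i, r in enumerate(rows) if best_feature in r]
    absent = [i for i, r in enumerate(rows) if best_feature not in r]

    def subtree(idx):
        return partition([rows[i] for i in idx], [potencies[i] for i in idx],
                         features, threshold, depth + 1, max_depth)

    return {"rule": best_feature,
            "present": subtree(present),
            "absent": subtree(absent)}


# Toy data (invented for illustration only): "amine" separates potent
# from weak molecules, while "aryl" is spread evenly across both groups.
rows = [{"amine", "aryl"}, {"amine"}, {"amine", "aryl"}, {"amine"},
        {"aryl"}, set(), {"aryl"}, set()]
potencies = [8.0, 8.5, 7.9, 8.2, 2.0, 2.5, 1.9, 2.2]
tree = partition(rows, potencies, ["amine", "aryl"])
print(tree)
```

Each internal node of the resulting tree is a substructural rule (descriptor present or absent), and each leaf is a group of molecules with similar potency. Because different branches can end in separate high-potency leaves, a tree of this kind can describe several classes of active compounds within one data set.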


Cited by 161 articles