Binary Formal Inference-Based Recursive Modeling Using Multiple Atom and Physicochemical Property Class Pair and Torsion Descriptors as Decision Criteria

Abstract
Analyzing the large volumes of data typically generated by high-throughput screening is difficult. To address this problem, we have developed binary formal inference-based recursive modeling using atom and physicochemical property class pair and torsion descriptors. Recursive partitioning is an exploratory technique for identifying structure in data. The implemented algorithm uses statistical hypothesis testing, similar to Hawkins' formal inference-based recursive modeling program, to separate a data set into two homogeneous subsets at each splitting node. This process is repeated recursively until no further separation can occur. Our implementation of recursive partitioning differs from previously reported approaches by employing a method to extract multiple features at each splitting node. The method was examined for its ability to distinguish between random and real data sets. The effect of including a single descriptor versus multiple descriptors in the splitting descriptor set was also studied. The method was tested using 27 401 National Cancer Institute (NCI) compounds and their pGI50 (−log(GI50)) values against the NCI-H23 cell line. The analyses show that partitioning using multiple descriptors is advantageous in analyzing structure−activity relationship information.
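The general idea of binary recursive partitioning driven by a hypothesis test can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary (present/absent) descriptors, uses Welch's t statistic with a fixed cutoff as a stand-in for the formal inference step, and splits on a single descriptor per node rather than extracting multiple descriptors. The names `welch_t`, `best_split`, `grow`, `t_crit`, and `min_size`, and the toy data, are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (each must hold >= 2 values)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Both groups are constant: the split is either perfect or useless.
        return float("inf") if mean(a) != mean(b) else 0.0
    return (mean(a) - mean(b)) / se


def best_split(X, y, rows, min_size=2):
    """Return (descriptor index, |t|) for the most significant binary split."""
    best_j, best_t = None, 0.0
    for j in range(len(X[0])):
        present = [y[i] for i in rows if X[i][j]]
        absent = [y[i] for i in rows if not X[i][j]]
        # Skip descriptors that would leave a subset too small to test.
        if len(present) < min_size or len(absent) < min_size:
            continue
        t = abs(welch_t(present, absent))
        if t > best_t:
            best_j, best_t = j, t
    return best_j, best_t


def grow(X, y, rows=None, t_crit=3.0, min_size=2):
    """Split recursively until no descriptor reaches the assumed cutoff."""
    if rows is None:
        rows = list(range(len(y)))
    j, t = best_split(X, y, rows, min_size)
    if j is None or t < t_crit:
        # Terminal node: no further significant separation is possible.
        return {"leaf": True, "n": len(rows),
                "mean": mean([y[i] for i in rows])}
    present_rows = [i for i in rows if X[i][j]]
    absent_rows = [i for i in rows if not X[i][j]]
    return {"leaf": False, "feature": j, "t": t,
            "present": grow(X, y, present_rows, t_crit, min_size),
            "absent": grow(X, y, absent_rows, t_crit, min_size)}


# Toy data (invented for illustration): descriptor 0 tracks activity,
# descriptor 1 is unrelated to it.
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
y = [7.1, 6.9, 7.0, 7.2, 4.9, 5.1, 5.0, 5.0]  # pGI50-like values
tree = grow(X, y)
```

In this toy case the root node splits on descriptor 0 and both children become terminal nodes, because the remaining descriptor does not reach the cutoff within either subset. A more faithful version would replace the fixed cutoff with a proper significance test, adjusted for the number of descriptors examined at each node.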