Abstract
Predictive models for the octanol/water partition coefficient (logP), aqueous solubility (logS), blood-brain barrier penetration (logBB), and human intestinal absorption (HIA) were built from a universal, generic molecular descriptor system based on atom type classification. The atom type classification tree was trained to optimize the logP predictions. With nine components, the final partial least-squares (PLS) model predicted logP for 10850 compounds in Starlist with a squared correlation coefficient (r²) of 0.912, a cross-validated r² (q²) of 0.892, and a root-mean-square error of estimation (RMSEE) of 0.50 log units. PLS models for logS and logBB, and a PLS-DA (discriminant analysis) model for HIA, were established from the same atom type descriptors. The seven-component PLS model for logS, derived from a diverse set of 1478 organic compounds, predicted a 21-compound test set designed by Yalkowsky with r² = 0.88 and a root-mean-square error of prediction (RMSEP) of 0.64. For logBB, a three-component model achieved a predictive r² = 0.90 and RMSEE = 0.26 on a 57-compound “Abraham data set”. The first three components of a five-component PLS-DA model were sufficient to clearly separate the 169 drug molecules collected by Abraham into three classes according to their percentage human intestinal absorption.
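The fit statistics quoted above (r², q², RMSEE, RMSEP) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy numbers, and the conventions are assumptions made for illustration. In particular, r² is taken here as 1 − SS_res/SS_tot, q² is the same expression applied to cross-validated predictions, and RMSEE is assumed to use N − A − 1 degrees of freedom for a model with A components, whereas RMSEP divides by N. The abstract does not state which conventions were used.

```python
import math


def r_squared(observed, predicted):
    """1 - SS_res / SS_tot. Passing cross-validated predictions gives q2 (assumed convention)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmsee(observed, fitted, n_components):
    """RMS error of estimation on the training set.

    Assumption: the residual sum of squares is divided by N - A - 1,
    where A is the number of PLS components.
    """
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(ss_res / (len(observed) - n_components - 1))


def rmsep(observed, predicted):
    """RMS error of prediction on an external test set (divides by N)."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / len(observed))


# Hypothetical toy values, not data from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(round(r_squared(observed, predicted), 3))
print(round(rmsee(observed, predicted, n_components=1), 3))
print(round(rmsep(observed, predicted), 3))
```

Because RMSEE depends on the number of components while RMSEP does not, the two errors are not directly comparable unless the convention behind each is known.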
