Predictive Toxicology: Benchmarking Molecular Descriptors and Statistical Methods

Abstract
Drug development depends on finding compounds that have beneficial effects with minimal toxicity. Measuring toxic effects is typically time-consuming and expensive, so there is a need to predict them from compound structure. Such prediction is expected to be challenging because multiple toxic mechanisms are usually involved. In this paper, combinations of different chemical descriptors and popular statistical methods were applied to the problem of predictive toxicology. Four data sets were collected and cleaned, and four different sets of chemical descriptors were calculated for the compounds in each data set. Three statistical methods (recursive partitioning, neural networks, and partial least squares) were used to attempt to link the chemical descriptors to the response. Good predictions were achieved for the two smaller data sets; for the larger data sets the results were poorer, indicating that new chemical descriptors or statistical methods are needed. All of the methods and descriptors worked to a degree, but our work hints that certain descriptors work better with specific statistical methods, so there is a need for better understanding and for continued methods development.
