Development of a Method for Evaluating Drug-Likeness and Ease of Synthesis Using a Data Set in Which Compounds Are Assigned Scores Based on Chemists' Intuition

Abstract
Drug-likeness is an important characteristic for any compound in a screening library, yet the concept is difficult to pin down. Based on our belief that this concept is implicit in the collective experience of working chemists, we devised a data set to capture an intuitive human understanding of both drug-likeness and a second key characteristic, ease of synthesis. Five chemists assigned a pair of scores to each of 3980 diverse compounds, one score for drug-likeness and one for ease of synthesis. Using this data set, we built binary classifiers with an artificial neural network and a support vector machine. These models were found to efficiently eliminate compounds that are not drug-like and/or are hard to synthesize, demonstrating their suitability for use as compound acquisition filters.
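As a rough illustration of how paired scores from several chemists might be turned into binary labels for such a filter, the sketch below averages each score component over the five chemists and applies a threshold. The abstract does not state the score scale, the aggregation rule, or the cutoff, so the 1 to 5 scale, the mean-based aggregation, the threshold of 3.0, the compound names, and the function names are all assumptions made for illustration. The trained neural network and support vector machine are not reproduced here.

```python
from statistics import mean

# Hypothetical data: five (drug_likeness, ease_of_synthesis) pairs per compound,
# one pair per chemist. The 1-5 scale is an assumption, not taken from the source.
scores = {
    "cmpd_A": [(4, 5), (5, 4), (4, 4), (3, 5), (4, 3)],
    "cmpd_B": [(1, 2), (2, 1), (1, 1), (2, 2), (1, 3)],
    "cmpd_C": [(4, 1), (5, 2), (4, 2), (4, 1), (3, 2)],
}


def binary_labels(pairs, threshold=3.0):
    """Average each score component over chemists and threshold it.

    Returns (is_drug_like, is_easy_to_synthesize). The threshold is an assumption.
    """
    drug = mean(p[0] for p in pairs)
    synth = mean(p[1] for p in pairs)
    return drug >= threshold, synth >= threshold


def passes_filter(pairs, threshold=3.0):
    """Keep a compound only if it is labeled both drug-like and easy to synthesize."""
    drug_ok, synth_ok = binary_labels(pairs, threshold)
    return drug_ok and synth_ok


# Compounds that would survive this toy acquisition filter.
kept = [name for name, pairs in scores.items() if passes_filter(pairs)]
print(kept)
```

In a classifier-based workflow, labels of this kind would serve as training targets, with molecular descriptors as inputs; that step is omitted because the abstract does not specify the descriptors used.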