Context Aware Data-Driven Retrosynthetic Analysis

Abstract
Modern drug discovery is an iterative process that relies on two activities: hypothesis generation, which exploits available data, and hypothesis testing, which produces the informative results needed for subsequent rounds of exploration. In this setting, hypothesis generation aims to design chemical structures likely to meet the pharmaceutically relevant objectives of the discovery project, while hypothesis testing involves the experimental preparation and testing of those structures to confirm or reject the hypothesis. Although much attention has been paid to effective compound design, hypothesis generation often yields novel chemical structures for which no established synthetic route exists. We introduce a chemical-context-aware, data-driven method, built upon millions of available reactions and with favorable run-time characteristics, that recommends synthetic routes matching a precedent-derived template. Through ChemoPrint, an in-house computer-aided synthesis platform, and coupled with modern automated synthesis platforms and available building-block collections, the method enables drug discovery researchers to identify routes to target compounds that are easy to interpret and to implement experimentally. Herein, we present results from applying the method, demonstrating how such tools can bridge chemical synthesis knowledge with synthetic resources and facilitate hypothesis testing, thereby reducing the time required to complete an Idea-to-Data drug discovery cycle.