Modelling the structure and function of enzymes by machine learning

Abstract
A machine learning program, GOLEM, has been applied to two problems: (1) the prediction of protein secondary structure from sequence and (2) the modelling of a quantitative structure-activity relationship in drug design. GOLEM takes observations as input and combines them with background knowledge of chemistry to yield predictive rules expressed as stereochemical principles. Secondary structure prediction was explored for the α/α class of proteins; on an unrelated test set, the approach achieved 81% accuracy. The rules from GOLEM defined patterns of residues forming α-helices. The drug design problem studied was the activity of trimethoprim analogues binding to E. coli dihydrofolate reductase. The GOLEM rules provided a better model than standard regression approaches. More importantly, these rules described chemical properties of the enzyme-binding site that were in broad agreement with the crystallographic structure.