Application of machine learning to structural molecular biology

Abstract
Inductive logic programming, a machine learning technique implemented in the program GOLEM, has been applied to three problems in structural molecular biology: the prediction of protein secondary structure; the identification of rules governing the arrangement of β-sheet strands in the tertiary folding of proteins; and the modelling of a quantitative structure-activity relationship (QSAR) for a series of drugs. For secondary structure prediction and the QSAR, GOLEM yielded predictions comparable with those of contemporary approaches, including neural networks. Rules for β-strand arrangement were derived, and it is planned to contrast their accuracy with that of rules obtained by human inspection. In all three studies GOLEM discovered rules that provided insight into the stereochemistry of the system. We conclude that machine learning used together with human intervention will provide a powerful tool for discovering patterns in biological sequences and structures.