Automated extraction of information on protein–protein interactions from the biological literature

Abstract
Motivation: To understand biological processes, we must clarify how proteins interact with each other. However, since information about protein–protein interactions still exists primarily in the scientific literature, it is not accessible in a computer-readable format. Efficient processing of large numbers of interactions therefore requires an intelligent information extraction method. Our aim is to develop an efficient method for extracting information on protein–protein interactions from the scientific literature.

Results: We present a method for extracting information on protein–protein interactions from the scientific literature. This method, which employs only a protein name dictionary, surface clues on word patterns and simple part-of-speech rules, achieved high recall and precision rates for yeast (recall = 86.8% and precision = 94.3%) and Escherichia coli (recall = 82.5% and precision = 93.5%). The extraction results suggest that our method should be applicable to any species for which a protein name dictionary has been constructed.

Availability: The program is available on request from the authors.

Contact: ono@otsuka.gr.jp
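
For illustration only, the general idea of combining a protein name dictionary with surface word patterns, and of scoring the output by recall and precision, can be sketched as follows. This is a minimal sketch and not the authors' program: the dictionary entries, the two patterns, and the function names `extract_interactions` and `recall_precision` are all hypothetical, and the part-of-speech rules mentioned in the Results are omitted.

```python
import re

# Hypothetical protein name dictionary (illustrative entries only).
PROTEIN_NAMES = {"Cdc28", "Cln2", "Ste7", "Fus3"}

# Hypothetical surface patterns; the abstract does not list the real ones.
NAME = r"(?P<{}>[A-Za-z0-9]+)"
PATTERNS = [
    re.compile(NAME.format("a") + r" (?:interacts|associates) with " + NAME.format("b")),
    re.compile(NAME.format("a") + r" binds (?:to )?" + NAME.format("b")),
]


def extract_interactions(sentence):
    """Return (protein, protein) pairs matched by a pattern and found in the dictionary."""
    pairs = set()
    for pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            a, b = match.group("a"), match.group("b")
            # Keep the pair only if both names are known proteins.
            if a in PROTEIN_NAMES and b in PROTEIN_NAMES:
                pairs.add((a, b))
    return pairs


def recall_precision(predicted, gold):
    """Recall = TP / |gold|, precision = TP / |predicted|, over sets of pairs."""
    true_positives = len(predicted & gold)
    recall = true_positives / len(gold) if gold else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return recall, precision
```

Under these assumptions, `extract_interactions("Cdc28 interacts with Cln2 in vivo.")` yields the single pair `("Cdc28", "Cln2")`, while a sentence whose matched words are not in the dictionary yields no pairs. Restricting matches to dictionary names is what keeps such simple patterns from over-generating.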