Automatic mining of the literature to generate new hypotheses for the possible link between periodontitis and atherosclerosis: lipopolysaccharide as a case study

Abstract
The aim of the current report was to generate and explore new hypotheses on how atherosclerosis and periodontitis could be linked pathophysiologically. Two different biomedical informatics techniques were used: an association-based technique that generated a ranked list of genes associated with the diseases, and a natural language processing tool that extracted the relationships between the retrieved genes and lipopolysaccharide (LPS). This combined approach of association-based and natural language processing-based literature mining identified a hit list of 16 candidate genes, with PON1 as the primary candidate. Further study of the literature prompted the hypothesis that PON1 might connect periodontitis with atherosclerosis in both an LPS-dependent and a non-LPS-dependent manner. Furthermore, the resulting genes not only confirmed already known associations between the two diseases, but also yielded genes or gene products that have so far been investigated only separately in the two disease states, as well as genes or gene products previously reported to be involved in atherosclerosis. These findings remain to be investigated through clinical studies. This example of multidisciplinary research illustrates how collaboration between investigators from different fields of expertise can lead to new hypotheses.