BioInfer: a corpus for information extraction in the biomedical domain
- 9 February 2007
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 8 (1), 50
- https://doi.org/10.1186/1471-2105-8-50
Abstract
Recently, there has been great interest in applying information extraction methods to the biomedical domain, in particular to the extraction of relationships between genes, proteins, and RNA from scientific publications. Developing and evaluating such methods requires annotated domain corpora. We present BioInfer (Bio Information Extraction Resource), a new public resource providing an annotated corpus of biomedical English. We describe an annotation scheme that captures named entities and their relationships, along with a dependency analysis of sentence syntax. We further present ontologies defining the types of entities and relationships annotated in the corpus. Currently, the corpus contains 1100 sentences from abstracts of biomedical research articles, annotated for relationships, named entities, and syntactic dependencies. Supporting software is provided with the corpus. The corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of its relationship annotation. We introduce a corpus targeted at protein, gene, and RNA relationships, which serves as a resource for the development of information extraction systems and their components, such as parsers and domain analyzers. The corpus will be maintained and further developed, with the current version available at http://www.it.utu.fi/BioInfer.