New public dataset for spotting patterns in medieval document images
- 23 November 2016
- journal article
- Published by SPIE-Intl Soc Optical Eng in Journal of Electronic Imaging
- Vol. 26 (1), 011010
- https://doi.org/10.1117/1.jei.26.1.011010
Abstract
With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools, thus allowing us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and that should encourage researchers of the document image analysis community to design new systems and submit improved results.
This publication has 9 references indexed in Scilit:
- Pattern localization in historical document images via template matching. Published by Institute of Electrical and Electronics Engineers (IEEE), 2016
- A scalable pattern spotting system for historical documents. Pattern Recognition, 2016
- Efficient segmentation-free keyword spotting in historical document collections. Pattern Recognition, 2015
- BING: Binarized Normed Gradients for Objectness Estimation at 300fps. Published by Institute of Electrical and Electronics Engineers (IEEE), 2014
- Establishing the provenance of historical manuscripts with a novel distance measure. Pattern Analysis and Applications, 2013
- A symbol spotting approach in graphical documents by hashing serialized graphs. Pattern Recognition, 2013
- Aggregating Local Image Descriptors into Compact Codes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011
- Word spotting for historical documents. International Journal on Document Analysis and Recognition (IJDAR), 2006
- Word spotting: a new approach to indexing handwriting. Published by Institute of Electrical and Electronics Engineers (IEEE), 1996