Nearest neighbour searching in serial files using text signatures

1 July 1985

journal article
research article
Published by SAGE Publications in Journal of Information Science

Vol. 11 (1), 31-39
https://doi.org/10.1177/016555158501100105

Abstract

A nearest neighbour search procedure is described for use with serial files of textual data. The procedure involves the grouping of records into blocks, each of which is characterised by a fixed-length bit string. A comparable query bit string may then be matched against each of these bit strings, and an upper bound calculation used to identify those blocks which need to be inspected in detail if the document that is most similar to the query is to be identified. Experiments with three small collections of documents and queries are used to test the efficiency of the approach. The experiments show that reductions in computation are possible, although the precise savings are crucially dependent upon a range of factors including the frequency characteristics of the documents and queries, the similarity coefficients, and the sizes of the bit strings and of the blocks.
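
The block-and-bound idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the hashing scheme, the similarity coefficient, or the exact upper bound, so everything below is an assumption made for illustration. Records and queries are taken to be sets of terms, each term is hashed to one bit with CRC-32, a block signature is the OR of its records' signatures, similarity is the number of shared terms, and the bound for a block is the number of query terms whose bit is set in that block's signature. The names `N_BITS`, `bit_for`, `build_blocks`, `upper_bound` and `nearest_neighbour` are hypothetical.

```python
import zlib

N_BITS = 64  # assumed signature length; the abstract reports that this size affects the savings


def bit_for(term, n_bits=N_BITS):
    """Map a term to one bit position (assumed hashing scheme, CRC-32 based)."""
    return zlib.crc32(term.encode("utf-8")) % n_bits


def signature(terms, n_bits=N_BITS):
    """Fixed-length bit string (held in an int) with one bit set per term."""
    sig = 0
    for term in terms:
        sig |= 1 << bit_for(term, n_bits)
    return sig


def build_blocks(records, block_size, n_bits=N_BITS):
    """Group consecutive records into blocks; each block gets the OR of its record signatures."""
    blocks = []
    for start in range(0, len(records), block_size):
        chunk = records[start:start + block_size]
        sig = 0
        for rec in chunk:
            sig |= signature(rec, n_bits)
        blocks.append((sig, start, chunk))
    return blocks


def upper_bound(query, block_sig, n_bits=N_BITS):
    """Count query terms whose bit is set in the block signature.

    A term shared by the query and any record in the block must have its bit set,
    so no record in the block can share more terms than this count.
    """
    return sum(1 for term in query if (block_sig >> bit_for(term, n_bits)) & 1)


def nearest_neighbour(query, blocks, n_bits=N_BITS):
    """Serial scan that inspects a block only if its bound exceeds the best similarity so far."""
    best_sim, best_idx, inspected = -1, None, 0
    for sig, start, chunk in blocks:
        if upper_bound(query, sig, n_bits) <= best_sim:
            continue  # no record in this block can beat the current best
        inspected += 1
        for offset, rec in enumerate(chunk):
            sim = len(query & rec)  # assumed coefficient: simple term overlap
            if sim > best_sim:
                best_sim, best_idx = sim, start + offset
    return best_idx, best_sim, inspected


records = [
    {"bit", "string"},
    {"nearest", "neighbour", "search"},
    {"serial", "file"},
    {"text", "signature", "search"},
]
blocks = build_blocks(records, block_size=2)
best_idx, best_sim, inspected = nearest_neighbour({"nearest", "neighbour", "text"}, blocks)
```

Because the bound never underestimates the best overlap in a block, skipping a block cannot change the answer; how many blocks are actually skipped depends on hash collisions, block size and signature length, in line with the dependence on those factors that the abstract reports.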

This publication has 13 references indexed in Scilit:

Document retrieval using a serial bit string search
Information Processing & Management, 1983
A review of the use of inverted files for best match searching in information retrieval systems
Journal of Information Science, 1983
The practicality of text signatures for accelerating string searching
Software: Practice and Experience, 1982
A comparison of three string matching algorithms
Software: Practice and Experience, 1982
The nearest neighbour problem in information retrieval
ACM SIGIR Forum, 1981
Partial-match retrieval using indexed descriptor files
Communications of the ACM, 1980
Document retrieval experiments using indexing vocabularies of varying size. I. Variety generation symbols assigned to the fronts of index terms
Journal of Documentation, 1979
Searching linear files on‐line
Online Review, 1977
On the use of bit maps for multiple key retrieval
ACM SIGPLAN Notices, 1976
An information-theoretic approach to text searching in direct access systems
Communications of the ACM, 1974

Cited by 15 articles