Nearest neighbour searching in serial files using text signatures
- 1 July 1985
- journal article
- research article
- Published by SAGE Publications in Journal of Information Science
- Vol. 11 (1), 31-39
- https://doi.org/10.1177/016555158501100105
Abstract
A nearest neighbour search procedure is described for use with serial files of textual data. The procedure involves the grouping of records into blocks, each of which is characterised by a fixed-length bit string. A comparable query bit string may then be matched against each of these bit strings, and an upper bound calculation used to identify those blocks which need to be inspected in detail if the document that is most similar to the query is to be identified. Experiments with three small collections of documents and queries are used to test the efficiency of the approach. The experiments show that reductions in computation are possible, although the precise savings are crucially dependent upon a range of factors including the frequency characteristics of the documents and queries, the similarity coefficients, and the sizes of the bit strings and of the blocks.
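The block-level filtering described above can be illustrated with a small Python sketch. It is not the authors' implementation, and every name and parameter in it is hypothetical. Documents are modelled as sets of terms. Each term sets `BITS_PER_TERM` bits, chosen by a `blake2b` hash, in a `SIG_BITS`-bit string, and a block's bit string is the bitwise OR of its documents' term masks. The Dice coefficient stands in for whichever similarity coefficient is used. The bound relies on superimposed coding producing false drops but never false dismissals: the count c of query terms whose bits are all present in a block's bit string is at least the true overlap with any document in that block. No document in the block can therefore have a Dice score above 2c/(|Q|+c). Blocks are inspected in decreasing order of this bound, and the search stops once no remaining block can beat the best score found so far.

```python
import hashlib
from typing import List, Optional, Set, Tuple

SIG_BITS = 64       # hypothetical bit-string length per block
BITS_PER_TERM = 2   # hypothetical number of bits each term sets

def term_mask(term: str) -> int:
    """Superimposed-coding mask for one term (assumed hashing scheme)."""
    mask = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.blake2b(f"{i}:{term}".encode(), digest_size=8).digest()
        mask |= 1 << (int.from_bytes(digest, "big") % SIG_BITS)
    return mask

def block_signature(block: List[Set[str]]) -> int:
    """OR together the masks of every term in every document of a block."""
    sig = 0
    for doc in block:
        for term in doc:
            sig |= term_mask(term)
    return sig

def dice(query: Set[str], doc: Set[str]) -> float:
    """Dice coefficient on term sets (assumed similarity measure)."""
    total = len(query) + len(doc)
    return 2 * len(query & doc) / total if total else 0.0

def upper_bound(query: Set[str], sig: int) -> float:
    """Upper bound on dice(query, D) for any document D in the block."""
    # A term that really occurs in the block always has its bits set, so c
    # may overcount (false drops) but never undercounts the true overlap o.
    c = 0
    for term in query:
        mask = term_mask(term)
        if (sig & mask) == mask:
            c += 1
    # Dice = 2o/(|Q|+|D|) <= 2o/(|Q|+o) since |D| >= o; this grows with o <= c.
    return 2 * c / (len(query) + c) if c else 0.0

def nearest_neighbour(
    query: Set[str], blocks: List[List[Set[str]]]
) -> Tuple[Optional[Tuple[int, int]], float, int]:
    """Return ((block, doc) index of the best match, its score, blocks inspected)."""
    bounds = [upper_bound(query, block_signature(b)) for b in blocks]
    order = sorted(range(len(blocks)), key=lambda i: bounds[i], reverse=True)
    best_doc: Optional[Tuple[int, int]] = None
    best_score = -1.0
    inspected = 0
    for i in order:
        if bounds[i] <= best_score:
            break  # no remaining block can contain a better document
        inspected += 1
        for j, doc in enumerate(blocks[i]):
            score = dice(query, doc)
            if score > best_score:
                best_doc, best_score = (i, j), score
    return best_doc, best_score, inspected

if __name__ == "__main__":
    blocks = [
        [{"nearest", "neighbour", "search"}, {"serial", "file"}],
        [{"bit", "string", "signature"}, {"text", "signature", "search"}],
    ]
    best, score, inspected = nearest_neighbour({"text", "signature"}, blocks)
    print(best, score)  # (1, 1) 0.8
```

In a real serial file the block bit strings would be computed once and stored, not rebuilt per query as in this sketch. How many blocks are skipped depends on the bit-string length, the block size and the term frequencies, as the experiments summarised above indicate.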