Partial-match retrieval via the method of superimposed codes
- 1 January 1979
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in Proceedings of the IEEE
- Vol. 67 (12), 1624-1642
- https://doi.org/10.1109/proc.1979.11543
Abstract
This paper presents and analyzes an effective and practical method of accomplishing partial-match retrieval on a computer file containing a large number of information records. In partial-match retrieval a subset of the records in the file is selected and retrieved by specifying a query set consisting of a small number of key values; the records selected and retrieved are those having a match to all the key values in the query set. Partial-match retrieval can be a powerful capability when used in information retrieval systems, and a stored file augmented with such a capability is equivalent to an associative or content-addressable store. The method presented in this paper is based upon the use of superimposed codes, a technique used previously with some success in notched-edge card filing systems in which card selection was accomplished mechanically with long metal needles. A new algorithm is presented for generating superimposed codes without the use of a stored code dictionary. The new algorithm executes rapidly on a digital computer and employs a hash function and a pseudo-random number generator; it allows the generation of binary codes having any desired width and weight. It is pointed out that the use of variableweight binary codes gives an advantage on files in which the key values occur with substantially different frequencies. It is shown that organizing the bits of the superimposed code words properly in storage leads to the property that only a small fraction of these bits need be retrieved and processed on each query; this substantially reduces the time required to execute a query. Also, because of the simplicity of the bit operations required in the processing of superimposed code words, a very fast query processor could be designed and built with currently available digital hardware devices and techniques. With the query processor implemented in such digital hardware, partial-match query rates as high as 100/s or even higher appear to be feasible. In order to demonstrate experimentally the power of the method of superimposed codes, the method was implemented in software on a file consisting of the Suffolk County, NY, telephone-directory listings.Keywords
This publication has 25 references indexed in Scilit:
- Detection of combined occurrencesCommunications of the ACM, 1977
- Hashing and trie algorithms for partial match retrievalACM Transactions on Database Systems, 1976
- A practitioner's guide to addressing algorithmsCommunications of the ACM, 1976
- Associative and Parallel ProcessorsACM Computing Surveys, 1975
- Identifier Search Mechanisms: A Survey and Generalized ModelACM Computing Surveys, 1974
- The UNIX time-sharing systemCommunications of the ACM, 1974
- Implementation of the substring test by hashingCommunications of the ACM, 1971
- Scatter storage techniquesCommunications of the ACM, 1968
- Fourier Analysis of Uniform Random Number GeneratorsJournal of the ACM, 1967
- Secondary key retrieval using an IBM 7090-1301 systemCommunications of the ACM, 1965