Description and performance analysis of signature file methods for office filing
- 1 July 1987
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 5 (3), 237-257
- https://doi.org/10.1145/27641.28057
Abstract
Signature files have attracted a lot of interest as an access method for text and specifically for messages in the office environment. Messages are stored sequentially in the message file, whereas their hash-coded abstractions (signatures) are stored sequentially in the signature file. To answer a query, the signature file is examined first, and many nonqualifying messages are immediately rejected. In this paper we examine the problem of designing signature extraction methods and studying their performance. We describe two old methods, generalize another one, and propose a new method and its variation. We provide exact and approximate formulas for the dependency between the false drop probability and the signature size for all the methods, and we show that the proposed method (VBC) achieves approximately ten times smaller false drop probability than the old methods, whereas it is well suited for collections of documents with variable document sizes.Keywords
This publication has 21 references indexed in Scilit:
- Parallel free-text search on the connection machine systemCommunications of the ACM, 1986
- Signature filesACM Transactions on Information Systems, 1984
- Laser optical diskCommunications of the ACM, 1984
- A two level superimposed coding scheme for partial match retrievalInformation Systems, 1983
- Message filesACM Transactions on Information Systems, 1983
- Development of a Spelling ListIEEE Transactions on Communications, 1982
- Partial-match retrieval using indexed descriptor filesCommunications of the ACM, 1980
- Optimal source codes for geometrically distributed integer alphabets (Corresp.)IEEE Transactions on Information Theory, 1975
- Run-length encodings (Corresp.)IEEE Transactions on Information Theory, 1966
- Mathematical analysis of various superimposed coding methodsAmerican Documentation, 1960