A THEORETICAL BASIS FOR THE USE OF CO‐OCCURRENCE DATA IN INFORMATION RETRIEVAL

1 February 1977

journal article
Published by Emerald Publishing in Journal of Documentation

Vol. 33 (2), 106-119
https://doi.org/10.1108/eb026637

Abstract

This paper provides a foundation for a practical way of improving the effectiveness of an automatic retrieval system. Its main concern is the weighting of index terms as a device for increasing retrieval effectiveness. Previously, index terms have been assumed to be independent, for the good reason that a very simple weighting scheme can then be used. In reality, index terms are most unlikely to be independent. This paper explores one way of removing the independence assumption: the extent of the dependence between index terms is measured and used to construct a non-linear weighting function. In a practical situation, the values of some of the parameters of such a function must be estimated from small samples of documents, so a number of estimation rules are discussed and one in particular is recommended. Finally, the feasibility of the computations required for a non-linear weighting scheme is examined.
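
As an illustration of what measuring the dependence between index terms from co-occurrence data can look like, the sketch below scores each pair of terms by the mutual information between their presence and absence across a document sample. The abstract does not name the dependence measure used in the paper, so the choice of mutual information, the function name `mutual_information`, and the toy document sample are all assumptions made for this example.

```python
import math
from itertools import combinations


def mutual_information(docs, a, b):
    """Mutual information between presence/absence of terms a and b.

    `docs` is a list of sets of index terms. This measure is an
    assumption for illustration, not taken from the abstract.
    """
    n = len(docs)
    # 2x2 contingency table: counts[(x, y)] is the number of documents
    # where (a is present) == x and (b is present) == y.
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for d in docs:
        counts[(int(a in d), int(b in d))] += 1

    mi = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue  # a zero cell contributes nothing (0 * log 0 := 0)
        p_xy = c / n
        p_x = (counts[(x, 0)] + counts[(x, 1)]) / n
        p_y = (counts[(0, y)] + counts[(1, y)]) / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


# Hypothetical sample of indexed documents.
docs = [
    {"retrieval", "index"},
    {"retrieval", "index", "weight"},
    {"weight"},
    {"cluster"},
]

terms = sorted(set().union(*docs))
scores = {
    (a, b): mutual_information(docs, a, b)
    for a, b in combinations(terms, 2)
}

# List term pairs from most to least dependent.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A score of zero means the two terms occur independently in the sample, which is the case the simple weighting schemes assume. With samples this small the estimated probabilities are unreliable, which is the estimation problem the abstract raises.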

Keywords

INFORMATION RETRIEVAL

