Is pushing constraints deeply into the mining algorithms really what we want?

1 June 2002

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGKDD Explorations Newsletter

Vol. 4 (1), 50-55
https://doi.org/10.1145/568574.568582

Abstract

The common approach to exploiting mining constraints is to push them deeply into the mining algorithms. In this paper we argue that this approach rests on an understanding of KDD that is no longer up to date. Today, KDD is seen as a human-centered, highly interactive, and iterative process. Blindly enforcing constraints during the mining runs neglects the process character of KDD and is therefore no longer state of the art. Constraints can make a single algorithm run faster, but we are still far from response times that would allow true interactivity in KDD. In addition, we pay the price of repeated mining runs and risk reducing data mining to a kind of hypothesis testing. Taking all of this into consideration, we propose the exact opposite of constrained mining: we accept an initial (nearly) unconstrained and costly mining run, but instead of a sequence of subsequent and still expensive constrained mining runs, we answer all further mining queries from this initial result set. Whereas this is straightforward for constraints that can be implemented as filters on the result set, things get more complicated when we restrict the underlying mining data. In practice such constraints are very important, e.g. the generation of rules for certain days of the week, or for families, singles, male or female customers, etc. We show how to postpone such row-restriction constraints on the transactions from rule generation to rule retrieval from the initial result set.
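
The idea of answering later queries from one initial result set can be illustrated with a small sketch. It is not taken from the paper: the toy transactions, the function names, and in particular the choice to encode row attributes (here the weekday) as ordinary items are assumptions made for illustration, and brute-force enumeration stands in for a real mining algorithm. Under these assumptions, a filter constraint is a lookup in the stored itemset counts, and a row restriction is answered from the counts of itemsets that contain the restricting item.

```python
from itertools import combinations

# Hypothetical toy data. Row attributes (here the weekday) are encoded as
# ordinary items with a "day=" prefix; this encoding is an assumption.
TRANSACTIONS = [
    {"day=Mon", "beer", "chips"},
    {"day=Mon", "beer", "chips", "salsa"},
    {"day=Mon", "bread"},
    {"day=Sat", "beer", "bread"},
    {"day=Sat", "beer", "chips"},
    {"day=Sat", "bread", "salsa"},
]


def mine_itemsets(transactions):
    """Initial costly run: count every itemset occurring in any transaction.

    Brute-force enumeration stands in for a real frequent-itemset miner.
    The empty itemset is counted too, so its count is the number of rows.
    """
    counts = {}
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def filter_itemsets(counts, must_contain=(), min_count=1):
    """Filter constraint: answered directly from the stored result set."""
    required = frozenset(must_contain)
    return {s: c for s, c in counts.items() if required <= s and c >= min_count}


def rule_stats(counts, body, head, restriction=()):
    """Support and confidence of body -> head among rows matching `restriction`.

    No new mining run is needed: every count is looked up in `counts`.
    Returns None if no row matches the restriction or the rule body.
    """
    body, head = frozenset(body), frozenset(head)
    restriction = frozenset(restriction)
    rows = counts.get(restriction, 0)            # rows satisfying the restriction
    body_rows = counts.get(body | restriction, 0)
    rule_rows = counts.get(body | head | restriction, 0)
    if rows == 0 or body_rows == 0:
        return None
    return rule_rows / rows, rule_rows / body_rows


COUNTS = mine_itemsets(TRANSACTIONS)  # the single expensive run

if __name__ == "__main__":
    print(filter_itemsets(COUNTS, must_contain={"salsa"}, min_count=2))
    print(rule_stats(COUNTS, {"beer"}, {"chips"}))
    print(rule_stats(COUNTS, {"beer"}, {"chips"}, restriction={"day=Mon"}))
```

In this sketch the initial run must not prune itemsets that a later restriction needs, which is why no minimum support is applied; a realistic setting would use a low threshold instead and accept that very rare restricted rules cannot be recovered.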

Cited by 29 articles