Is pushing constraints deeply into the mining algorithms really what we want?
- 1 June 2002
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGKDD Explorations Newsletter
- Vol. 4 (1), 50-55
- https://doi.org/10.1145/568574.568582
Abstract
The common approach to exploiting mining constraints is to push them deeply into the mining algorithms. In this paper we argue that this approach rests on an understanding of KDD that is no longer up to date. Today KDD is seen as a human-centered, highly interactive and iterative process. Blindly enforcing constraints during the mining runs neglects this process character and is therefore no longer state of the art. Constraints can make a single algorithm run faster, but we are still far from response times that would allow true interactivity in KDD. In addition, we pay the price of repeated mining runs and risk reducing data mining to a kind of hypothesis testing. We therefore propose the exact opposite of constrained mining: we accept an initial, (nearly) unconstrained, and costly mining run. Instead of following it with a sequence of constrained but still expensive mining runs, we answer all further mining queries from this initial result set. This is straightforward for constraints that can be implemented as filters on the result set; it gets more complicated when a constraint restricts the underlying mining data. In practice such constraints are very important, e.g. generating rules only for certain days of the week, or only for families, singles, male or female customers. We show how to postpone such row-restriction constraints on the transactions from rule generation to rule retrieval from the initial result set.
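The abstract does not spell out the mechanism, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes (hypothetically) that each restricting attribute, such as the weekday, is encoded as an extra `name=value` item in every transaction before the single initial run. The confidence of a rule X ⇒ Y on only the rows carrying attribute value a then follows from stored counts as count(X ∪ Y ∪ {a}) / count(X ∪ {a}), so no further pass over the data is needed. The names `count_itemsets`, `query_rules`, `is_attribute`, and the toy data are hypothetical, and brute-force counting stands in for a real frequent-itemset miner.

```python
from itertools import combinations


def is_attribute(item):
    # Hypothetical convention: row attributes are encoded as "name=value" items.
    return "=" in item


def count_itemsets(transactions):
    """Initial, unconstrained run: count every itemset that occurs at least once.

    Brute force stands in for a real miner (e.g. Apriori or FP-growth)
    run with a very low minimum support.
    """
    counts = {}
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def query_rules(counts, n_rows, min_conf=0.0, restrict=None, rule_filter=None):
    """Answer a rule query from the stored counts without rescanning the data.

    restrict: optional attribute item such as "day=Sat"; rules are evaluated
      only on rows containing it, via count(X | Y | {a}) / count(X | {a}).
    rule_filter: optional predicate on (antecedent, consequent), i.e. a plain
      filter on the result set.
    Returns {(antecedent, consequent): (support, confidence)}.
    """
    extra = frozenset([restrict]) if restrict else frozenset()
    # Size of the (possibly restricted) sub-population of rows.
    base = counts.get(extra, 0) if restrict else n_rows
    rules = {}
    if base == 0:
        return rules
    for itemset in counts:
        # Rules are built only from ordinary items, never from attribute items.
        if len(itemset) < 2 or any(is_attribute(i) for i in itemset):
            continue
        n_xy = counts.get(itemset | extra, 0)
        if n_xy == 0:
            continue
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                x = frozenset(lhs)
                y = itemset - x
                if rule_filter and not rule_filter(x, y):
                    continue
                conf = n_xy / counts[x | extra]
                if conf >= min_conf:
                    rules[(x, y)] = (n_xy / base, conf)
    return rules


# Hypothetical toy baskets, each tagged with the weekday as an extra item.
transactions = [
    {"bread", "milk", "day=Sat"},
    {"bread", "butter", "day=Sat"},
    {"bread", "milk", "butter", "day=Mon"},
    {"milk", "day=Mon"},
    {"bread", "milk", "day=Sat"},
]
counts = count_itemsets(transactions)  # the one expensive run
all_rules = query_rules(counts, len(transactions))
sat_rules = query_rules(counts, len(transactions), restrict="day=Sat")
milk_rules = query_rules(counts, len(transactions),
                         rule_filter=lambda x, y: y == frozenset({"milk"}))

print(all_rules[(frozenset({"bread"}), frozenset({"milk"}))])  # (0.6, 0.75)
print(round(sat_rules[(frozenset({"bread"}), frozenset({"milk"}))][1], 3))  # 0.667
```

Both kinds of constraint are answered from the same stored counts: `rule_filter` is an ordinary filter on the result set, while `restrict` replaces a new mining run over the Saturday rows. The restricted figures are exact only if every itemset of the form X ∪ Y ∪ {a} survived the initial run's minimum-support threshold, which is one reason the initial run has to be (nearly) unconstrained.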