Weighted cluster ensembles
- 16 January 2009
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Knowledge Discovery From Data
- Vol. 2 (4), 1-40
- https://doi.org/10.1145/1460797.1460800
Abstract
Cluster ensembles offer a solution to challenges inherent to clustering arising from its ill-posed nature. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned. In this article, we address the problem of combining multiple weighted clusters that belong to different subspaces of the input space. We leverage the diversity of the input clusterings in order to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus functions make use of the weight vectors associated with the clusters. We demonstrate the effectiveness of our techniques by running experiments with several real datasets, including high-dimensional text data. Furthermore, we investigate in depth the issue of diversity and accuracy for our ensemble methods. Our analysis and experimental results show that the proposed techniques are capable of producing a partition that is as good as or better than the best individual clustering.
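To make the general idea of a consensus partition concrete, the following is a minimal sketch of a generic co-association (evidence-accumulation style) ensemble. It is not the weighted consensus functions proposed in the article, and it ignores the per-cluster weight vectors entirely. The function names `co_association` and `consensus_partition`, the 0.5 threshold, and the toy label vectors are all assumptions made for illustration.

```python
from itertools import combinations


def co_association(labelings):
    """Return an n x n matrix whose (i, j) entry is the fraction of input
    clusterings that place points i and j in the same cluster."""
    n = len(labelings[0])
    m = len(labelings)
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / m for c in row] for row in counts]


def consensus_partition(labelings, threshold=0.5):
    """Link every pair of points that is co-clustered in more than
    `threshold` of the inputs, then label the connected components."""
    co = co_association(labelings)
    n = len(co)
    parent = list(range(n))  # union-find forest over the points

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel the component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical toy input: three clusterings of six points. The second uses
# swapped label names; the third misassigns point 2.
toy_labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
print(consensus_partition(toy_labelings))  # prints [0, 0, 0, 1, 1, 1]
```

In this toy case the majority vote overrides the single clustering that misplaces point 2, which is the kind of averaging out of spurious structure the abstract describes. The sketch depends only on co-membership, so it is unaffected by how each input clustering happens to name its clusters.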
Funding Information
- Division of Information and Intelligent Systems (IIS-0447814)