The Effectiveness of Lloyd-Type Methods for the k-Means Problem
- 1 January 2006
- conference paper
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 02725428,p. 165-176
- https://doi.org/10.1109/focs.2006.75
Abstract
We investigate variants of Lloyd's heuristic for clustering high dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iterationKeywords
This publication has 30 references indexed in Scilit:
- How Fast Is the k-Means Method?Algorithmica, 2004
- A local search approximation algorithm for k-means clusteringComputational Geometry, 2004
- Polynomial-time approximation schemes for geometric min-sum median clusteringJournal of the ACM, 2002
- On Approximate Geometric k -ClusteringDiscrete & Computational Geometry, 2000
- Data clusteringACM Computing Surveys, 1999
- QuantizationIEEE Transactions on Information Theory, 1998
- Experimental Designs for Selecting Molecules from Large Chemical DatabasesJournal of Chemical Information and Computer Sciences, 1997
- Least squares quantization in PCMIEEE Transactions on Information Theory, 1982
- Quantizing for minimum distortionIEEE Transactions on Information Theory, 1960
- Note on GroupingJournal of the American Statistical Association, 1957