An efficient k-means clustering algorithm: analysis and implementation

Title	An efficient k-means clustering algorithm: analysis and implementation
Publication Type	Journal Articles
Year of Publication	2002
Authors	Kanungo T, Mount D, Netanyahu NS, Piatko CD, Silverman R, Wu AY
Journal	Pattern Analysis and Machine Intelligence, IEEE Transactions on
Volume	24
Issue	7
Pagination	881 - 892
Date Published	2002/07//
ISBN Number	0162-8828
Keywords	algorithm;color, algorithm;image, algorithm;kd-tree;mean, analysis;filtering, clustering, clustering;, compression;data, distance;covariance, Lloyd, matrices;filtering, quantization;data, segmentation;k-means, squared, structure;data-sensitive, theory;pattern
Abstract	In k-means clustering, we are given a set of n data points in d-dimensional space Rd and an integer k and the problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation
DOI	10.1109/TPAMI.2002.1017616

An efficient k-means clustering algorithm: analysis and implementation

Publications