The Impact of Initialization Strategies on the K-Means Convergence

Written by NEPTTP on September 8, 2020. Posted in Capstone Research Projects

Researchers: Tshauambea Murendeni, University of Venda
Supervisor: Ms Nothabo Ndebele, University of the Witwatersrand, Johannesburg

Clustering is a method in which data items are grouped so as to maximize the similarity within each cluster and the dissimilarity between different clusters [1]. The k-means algorithm is a commonly used unsupervised partitioning clustering algorithm that is simple and easy to implement. Whether and how quickly k-means converges to the optimal solution depends on the initialization strategy. This study applies three initialization strategies to the k-means algorithm: random initialization, k-means++ and farthest-first traversal. The experiments were conducted on several consumer segmentation data sets of different sizes and data structures. The strategies were compared on the number of iterations the k-means algorithm took to reach its optimal solution. The experiments show that all three initialization strategies lead to the same optimal solution of the k-means algorithm. However, k-means++ reaches the optimal solution in fewer iterations than the other initialization strategies used in this study. For the data sets used in this research, k-means initialized with k-means++ also reached its optimal solution faster than the k-medoids algorithm.
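The comparison described above can be illustrated with a minimal sketch. This is not the code or data used in the study: the function names (`init_random`, `init_kmeans_pp`, `init_farthest_first`, `kmeans`) are hypothetical, the data are three synthetic two-dimensional blobs, and the third strategy is assumed to be the standard farthest-first traversal. The sketch runs plain Lloyd-style k-means from each set of starting centers and reports how many iterations pass before the cluster assignments stop changing.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_random(points, k, rng):
    """Random initialization: pick k distinct data points uniformly at random."""
    return rng.sample(points, k)

def init_kmeans_pp(points, k, rng):
    """k-means++: each new center is drawn with probability proportional to
    its squared distance from the nearest center chosen so far.
    Assumes the data contain at least k distinct points."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def init_farthest_first(points, k, rng):
    """Farthest-first traversal (assumed meaning of the third strategy):
    each new center is the point farthest from all centers chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, centers, max_iter=100):
    """Lloyd-style k-means. Returns (centers, labels, iterations), where
    iterations counts assignment passes up to and including the first pass
    in which no label changed."""
    centers = [tuple(c) for c in centers]
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            return centers, labels, iteration
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels, max_iter

# Synthetic demo data (an assumption, not the study's data sets).
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (6.0, 6.0), (12.0, 0.0)]
points = [(rng.gauss(cx, 1.5), rng.gauss(cy, 1.5))
          for cx, cy in blob_centers for _ in range(50)]

strategies = [("random", init_random),
              ("k-means++", init_kmeans_pp),
              ("farthest-first", init_farthest_first)]
for name, init in strategies:
    _, _, iterations = kmeans(points, init(points, 3, rng))
    print(f"{name}: converged after {iterations} iterations")
```

A single run like this says little on its own, because random and k-means++ initialization are both stochastic; a fair comparison would repeat each strategy over many seeds and compare the average iteration counts.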