[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7472/jksii.2012.13.6.1

Comparison of Initial Seeds Methods for K-Means Clustering

Lee, Shinwon (Department of Computer System Engineering, Jungwon University)

Publication Information

Journal of Internet Computing and Services / v.13, no.6, 2012, pp. 1-8

Abstract

Clustering methods are commonly divided into hierarchical clustering, partitioning clustering, and others. The K-Means algorithm is a partitioning method that is well suited to clustering large numbers of documents quickly and easily. Its disadvantage is that randomly chosen initial centers lead to different results from run to run. A better choice is therefore to place the initial centers as far away from each other as possible. We propose a new method of selecting initial centers in K-Means clustering that uses the height of a triangle to choose the initial cluster centers. With this method the centers are distributed evenly, and the result is more accurate than with randomly selected initial centers. Selecting the seeds this way takes additional time, but it reduces total clustering time by minimizing the number of allocation and recalculation steps. Compared with the standard algorithm, the average time consumed is reduced by 38.4%.
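The abstract does not spell out the triangle-height procedure, so the sketch below does not attempt to reproduce it. As a stand-in, it illustrates only the general idea of placing initial centers far apart, using a plain farthest-first rule, followed by a standard K-Means loop that counts allocation and recalculation passes. The function names `farthest_first_seeds` and `kmeans` and the sample points are assumptions made for illustration.

```python
import math


def farthest_first_seeds(points, k, first_index=0):
    """Pick k seeds with a farthest-first rule (NOT the paper's
    triangle-height method): start from one point, then repeatedly add
    the point whose distance to its nearest chosen seed is largest."""
    seeds = [points[first_index]]
    # nearest[i] is the distance from points[i] to its closest seed so far.
    nearest = [math.dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        seeds.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return seeds


def kmeans(points, centers, max_iter=100):
    """Standard K-Means loop. Returns (centers, labels, passes), where
    passes counts allocation passes, including the final pass that
    confirms the labels no longer change."""
    centers = [tuple(c) for c in centers]
    labels = None
    for passes in range(1, max_iter + 1):
        # Allocation: assign every point to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            return centers, labels, passes
        labels = new_labels
        # Recalculation: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return centers, labels, max_iter


if __name__ == "__main__":
    # Hypothetical toy data: three small, well-separated groups.
    sample = [(0, 0), (0, 1), (1, 0),
              (10, 10), (10, 11), (11, 10),
              (0, 10), (1, 10), (0, 11)]
    seeds = farthest_first_seeds(sample, 3)
    centers, labels, passes = kmeans(sample, seeds)
    print(seeds, labels, passes)
```

Running `kmeans` on the same data with randomly drawn seeds and comparing the returned `passes` gives a rough feel for the trade-off the abstract describes: extra work at seeding time against fewer allocation and recalculation passes afterwards.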

Keywords

K-Means; Clustering; Initial Seeds

Citations & Related Records

Times Cited By KSCI: 1

Reference
Cited By KSCI

1	Shinwon Lee, Wonhee Lee, "Refining Initial Seeds using Max Average Distance for K-Means Clustering", Korean Society for Internet Information, pp.103-112, 2011.
2	Giordano Adami, Paolo Avesani, and Diego Sona, "Clustering documents in a web directory", Proceedings of the 5th ACM international workshop on Web information and data management, pp.66-73, 2003.
3	Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, "Introduction to Information Retrieval", Cambridge University Press, pp.331-338, 2008.
4	Jain, A. K. and Dubes, R. C., "Algorithms for Clustering Data", Prentice-Hall advanced reference series, Prentice-Hall, Inc., Upper Saddle River, NJ, 1988.
5	S. P. Lloyd, "Least squares quantization in PCM", Special issue on quantization, IEEE Trans. Inform. Theory, 28, pp.129-137, 1982.
6	MacQueen, J., "Some methods for classification and analysis of multivariate observations", In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp.281-297, 1967.
7	D.A.Meedeniya, and A.S.Perera, "Evaluation of Partition-Based Text Clustering Techniques to Categorize Indic Language Documents", IEEE International Advance Computing Conference(IACC 2009), pp.1497-1500, 2009.
8	Paul Bunn, and Rafail Ostrovsky, "Secure Two-Party k-Means Clustering", Proceedings of the 14th ACM conference on Computer and communications security, Alexandria, Virginia, USA, pp.486-497, 2007.
9	Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman and Chaitanya Swamy, "The Effectiveness of Lloyd-Type Methods for the k-Means Problem", Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pp.165-176, 2006.
10	Nachiketa Sahoo, Jamie Callan, Ramayya Krishnan, George Duncan, and Rema Padman, "Incremental hierarchical clustering of text documents", Proceedings of the 15th ACM international conference on Information and knowledge management, pp.357-366, 2006.
11	Yu Yonghong, and Bai Wenyang, "Text clustering based on term weights automatic partition", Computer and Automation Engineering (ICCAE), 2010 The 2nd International Conference, pp.373-377, 2010.
12	Shinwon Lee, "A Study on Hierarchical Clustering using Advanced K-Means Algorithm for Information Retrieval", Chonbuk University doctoral thesis, 2005.
13	Madhu Yedla et al., "Enhancing K-means Clustering Algorithm with Improved Initial Center", International Journal of Computer Science and Information Technologies, Vol. 1(2), pp.121-125, 2010.

1	Jong-Hwan Ko, "Baseline building energy modeling of cluster inverse model by using daily energy consumption in office buildings", Energy and Buildings, 140, p.317, 2017.
2	"Cluster Analysis of Snowfall Observatory Using K-means Algorithm", Journal of the Korean Society of Hazard Mitigation, 18(2), p.55, 2018.
3	"An Empirical Comparison and Verification Study on the Clustering Measurement of Container Ports Using the K-Means Clustering Model, Hierarchical Clustering Models (Average Linkage by Cross-Efficiency Matrix, Ward's Method), and a Mixed Model", 한국항만경제학회지, 34(3), p.17, 2012.
