Robust K-means for Global Optimization

  • Si-Hwan Jang (ETRI)
  • Joon Lee (Division of Energy Resource and Industrial Engineering, Kangwon National University)
  • Jae-Hyeon Eom (Division of Energy Resource and Industrial Engineering, Kangwon National University)
  • Sung-Soo Kim (Division of Energy Resource and Industrial Engineering, Kangwon National University)
  • Received: 2024.05.31
  • Accepted: 2024.11.18
  • Published: 2024.12.31

Abstract

K-means is a popular and efficient data clustering method and one of the most important techniques in data mining. However, K-means is sensitive to initialization and can become stuck in a local optimum because it is a hill-climbing clustering method. Therefore, we need a robust K-means (RK-means) that not only reduces this possibility but also increases the probability of finding the globally optimal clustering solution. The objective of this paper is to propose RK-means, which starts from the best initial solution chosen among good solutions that have good central data for each cluster. The central data of each cluster are selected by roulette wheel probabilistic selection using the sum of the relative distance rates of each data point. By contrast, approaches that deterministically select the central data for just one initial solution, as in K-medoid, have a problem with high-density data. Our proposed initial solution is a good starting point from which K-means can find a robust solution while reducing the possibility of being stuck in locally optimal solutions. The performance of the proposed RK-means data clustering is validated on machine learning repository datasets (Iris, Wine, Glass, Vowel, Cloud) by experimental comparison with the original K-means. Our simulation shows that RK-means using the probabilistic relative distance rate is better than K-means with random initialization: the minimum squared distance obtained by RK-means is lower and has a smaller deviation than that obtained by K-means. RK-means is also competitive with data clustering methods based on simulated annealing (SA) and hybrid K-means with SA (KSA & KSAK).
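
The initialization idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the relative-distance score used here (each point's distance to every other point, normalized by that other point's total distance), the inverse weighting on the roulette wheel, the number of candidate initial solutions, and the rule for choosing the "best" candidate (lowest sum of squared distances before any K-means iteration) are all assumptions. The function names are hypothetical.

```python
import math
import random


def relative_distance_scores(points):
    """Score each point by its summed relative distance to all points.

    Assumed form: score[j] = sum_i d(i, j) / sum_l d(i, l).
    A smaller score means the point is more central.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    row_sums = [sum(row) for row in dist]
    return [
        sum(dist[i][j] / row_sums[i] for i in range(n) if row_sums[i] > 0)
        for j in range(n)
    ]


def roulette_pick_centers(points, k, rng):
    """Draw k distinct points by roulette wheel selection.

    Assumption: selection odds are proportional to 1 / score, so central
    points are favored but never chosen deterministically.
    """
    scores = relative_distance_scores(points)
    weights = [1.0 / (s + 1e-12) for s in scores]
    available = list(range(len(points)))
    chosen = []
    for _ in range(k):
        total = sum(weights[i] for i in available)
        spin = rng.random() * total
        cumulative = 0.0
        for pos, i in enumerate(available):
            cumulative += weights[i]
            if spin <= cumulative:
                break
        chosen.append(available.pop(pos))  # remove so a point is picked once
    return [points[i] for i in chosen]


def sum_squared_distance(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations from the given starting centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[c]  # keep the old center if a cluster empties
            for c, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, sum_squared_distance(points, centers)


def rk_means_sketch(points, k, n_candidates=10, seed=0):
    """Build several roulette-wheel initial solutions, keep the best, run K-means."""
    rng = random.Random(seed)
    candidates = [roulette_pick_centers(points, k, rng)
                  for _ in range(n_candidates)]
    best_init = min(candidates, key=lambda c: sum_squared_distance(points, c))
    return kmeans(points, best_init)
```

Calling `rk_means_sketch(points, k)` on a list of coordinate tuples returns the final centers and the sum of squared distances. Because the wheel is probabilistic, repeated calls with different seeds give different initial solutions, which is the contrast with a deterministic choice that always yields one starting point.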

Keywords

Acknowledgement

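This research was carried out under the 2024 Culture, Sports and Tourism R&D Program of the Ministry of Culture, Sports and Tourism and the Korea Creative Content Agency (Project name: Development of AI-based large-scale automated game verification technology to improve the efficiency of game production verification for small and medium-sized game companies; Project number: RS-2024-00393500; Contribution rate: 100%).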