Group Search Optimization Data Clustering Using Silhouette

  • Kim, Sung-Soo (Department of System & Management Engineering, Kangwon National University) ;
  • Baek, Jun-Young (Department of System & Management Engineering, Kangwon National University) ;
  • Kang, Bum-Soo (Department of System & Management Engineering, Kangwon National University)
  • Received : 2017.05.10
  • Accepted : 2017.08.11
  • Published : 2017.08.31

Abstract

K-means is a popular and efficient data clustering method, but it uses only intra-cluster distance as its validity index and requires the number of clusters to be fixed in advance. For unsupervised data, where a suitable number of clusters is not known beforehand, K-means alone is therefore of little use. This paper proposes Group Search Optimization (GSO) using Silhouette to find the optimal data clustering solution together with the number of clusters for unsupervised data. Silhouette can serve as a validity index for deciding both the number of clusters and the optimal solution because it considers intra- and inter-cluster distances simultaneously. The performance of GSO using Silhouette is validated through several experiments and analyses of data sets.
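As an illustration of how Silhouette weighs intra-cluster against inter-cluster distance, the following is a minimal Python sketch of the mean silhouette width as defined by Rousseeuw (1987). It is not the authors' implementation and does not include the GSO search itself. The function name `silhouette`, the toy points, and the two hand-written labelings are assumptions made only for this example.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width of a hard clustering (Rousseeuw, 1987)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Average Euclidean distance from point i to the given members.
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: s(i) is defined as 0
        a = mean_dist(i, own)  # intra-cluster distance
        b = min(mean_dist(i, members)  # distance to the nearest other cluster
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
two_clusters = [0, 0, 0, 0, 1, 1, 1, 1]    # matches the visible grouping
three_clusters = [0, 0, 1, 1, 2, 2, 2, 2]  # splits the first group in two

score_two = silhouette(points, two_clusters)
score_three = silhouette(points, three_clusters)
best_k = 2 if score_two > score_three else 3
print(score_two, score_three, best_k)
```

Because the score rewards compact clusters that are far from their neighbours, candidate solutions with different numbers of clusters can be compared on one scale. That property is what lets an optimizer such as GSO search over the number of clusters as well as the assignment.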
