Combined Artificial Bee Colony for Data Clustering

  • Kang, Bum-Su (Department of System & Management Engineering, Kangwon National University)
  • Kim, Sung-Soo (Department of System & Management Engineering, Kangwon National University)
  • Received : 2017.10.12
  • Accepted : 2017.12.13
  • Published : 2017.12.31

Abstract

Data clustering is one of the most difficult and challenging problems and can be formally considered a particular kind of NP-hard grouping problem. The K-means algorithm is one of the most popular and widely used clustering methods because it is easy to implement and very efficient. However, it is prone to becoming trapped in local optima, and on large data sets its solutions vary widely with the initial solution. Efficient computational intelligence methods are therefore needed to find the global optimal solution of the data clustering problem within limited computational time. The objective of this paper is to propose a combined artificial bee colony (CABC) method that uses K-means for initialization and finalization to find effective solutions to the data clustering optimization problem. The artificial bee colony (ABC) is an algorithm motivated by the intelligent behavior exhibited by honeybees when searching for food. The performance of ABC is better than or similar to that of other population-based algorithms, with the added advantage of employing fewer control parameters. The proposed CABC method provides near-optimal solutions within reasonable time by balancing converged and diversified searches. Experiments and analysis on clustering problems demonstrate that CABC is competitive with previous partitioning approaches with respect to solution quality. We validate the performance of CABC on the Iris, Wine, Glass, Vowel, and Cloud datasets from the UCI machine learning repository, comparing the results with those of previous studies. In our simulations, the proposed KABCK (K-means+ABC+K-means) performs better than ABCK (ABC+K-means), KABC (K-means+ABC), ABC, and K-means.
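The K-means+ABC+K-means pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the objective is the sum of squared Euclidean distances to the nearest centroid, uses a basic textbook ABC (employed, onlooker, and scout phases), and all function names and parameter values (`n_sources`, `cycles`, `limit`) are hypothetical choices for illustration only.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sse(data, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in data)

def kmeans(data, centroids, iters=20):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in data:
            j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def abc_search(data, seed, k, rng, n_sources=10, cycles=50, limit=10):
    """Basic ABC over centroid sets; the seed solution is one food source."""
    sources = [seed] + [[list(p) for p in rng.sample(data, k)]
                        for _ in range(n_sources - 1)]
    costs = [sse(data, s) for s in sources]
    trials = [0] * n_sources
    i0 = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = [list(c) for c in sources[i0]], costs[i0]

    def try_neighbor(i):
        # Move one coordinate of one centroid relative to a random partner source.
        partner = rng.choice([j for j in range(n_sources) if j != i])
        c, d = rng.randrange(k), rng.randrange(len(data[0]))
        cand = [list(x) for x in sources[i]]
        cand[c][d] += rng.uniform(-1, 1) * (cand[c][d] - sources[partner][c][d])
        cost = sse(data, cand)
        if cost < costs[i]:  # greedy selection
            sources[i], costs[i], trials[i] = cand, cost, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(n_sources):  # employed bees
            try_neighbor(i)
        weights = [1.0 / (1.0 + c) for c in costs]
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_neighbor(i)  # onlooker bees favor low-cost sources
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:
            best, best_cost = [list(c) for c in sources[i_best]], costs[i_best]
        for i in range(n_sources):  # scouts replace exhausted sources
            if trials[i] > limit:
                sources[i] = [list(p) for p in rng.sample(data, k)]
                costs[i], trials[i] = sse(data, sources[i]), 0
    return best

def kabck(data, k, seed=0):
    """K-means initialization, ABC search, then K-means finalization."""
    rng = random.Random(seed)
    start = [list(p) for p in rng.sample(data, k)]
    init = kmeans(data, start)
    best = abc_search(data, init, k, rng)
    return kmeans(data, best)
```

In this sketch the K-means result is kept as one food source, so the ABC stage cannot return a worse solution than the one it was seeded with, and the final K-means pass only polishes the best centroid set found. Calling `kabck(points, 3)` on a list of numeric tuples returns three centroids.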

