Design and Implementation of the Ensemble-based Classification Model by Using k-means Clustering

  • Song, Sung-Yeol (Dept. of Computer Science and Engineering, Soongsil University)
  • Khil, A-Ra (Dept. of Computer Science and Engineering, Soongsil University)
  • Received : 2015.07.15
  • Accepted : 2015.09.22
  • Published : 2015.10.30

Abstract

In this paper, we propose an ensemble-based classification model that uses clustering to extract only new data patterns from streaming data and generates new classification models to add to the ensemble. The goal is to reduce the number of data items that must be labeled while preserving the accuracy of the existing system. The proposed technique clusters similarly patterned data from the stream and labels each cluster once a certain amount of data has been gathered. Each classification model unit in the ensemble applies the k-NN technique so that accuracy is maintained even with a small amount of data. Simulation results on benchmarks show that, by using clustering, the proposed technique uses about 3% less data than the existing technique.
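
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; the abstract does not give the algorithm's details, so several choices here are assumptions: the names `kmeans`, `KNNMember`, `build_member`, and `Ensemble` are hypothetical, a fixed `chunk_size` stands in for "a certain amount of data", only the point nearest each centroid is sent to a labeling function (`oracle`), each ensemble member is a k-NN classifier over those labeled representatives, and members vote by simple majority. The detection of "new" patterns is omitted, so every chunk produces a member, and each chunk is assumed to hold at least `n_clusters` points.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


class KNNMember:
    """One classification model unit: k-NN over a few labeled points."""

    def __init__(self, points, labels, k=1):
        self.points, self.labels, self.k = points, labels, k

    def predict(self, x):
        # Sort the stored points by distance to x and vote among the k nearest.
        ranked = sorted(zip(self.points, self.labels),
                        key=lambda pl: math.dist(x, pl[0]))
        nearest = ranked[:self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


def build_member(chunk, n_clusters, oracle, knn_k=1):
    """Cluster a chunk and label one representative per cluster (assumption)."""
    centroids, assign = kmeans(chunk, n_clusters)
    points, labels = [], []
    for c, centroid in enumerate(centroids):
        members = [p for p, a in zip(chunk, assign) if a == c]
        if not members:
            continue  # an empty cluster contributes nothing
        rep = min(members, key=lambda p: math.dist(p, centroid))
        points.append(rep)
        labels.append(oracle(rep))  # the only labeling cost for this cluster
    return KNNMember(points, labels, knn_k)


class Ensemble:
    """Buffers the stream and adds one k-NN member per full chunk."""

    def __init__(self, chunk_size, n_clusters, oracle):
        self.chunk_size = chunk_size
        self.n_clusters = n_clusters
        self.oracle = oracle
        self.buffer, self.members = [], []

    def observe(self, x):
        self.buffer.append(x)
        if len(self.buffer) >= self.chunk_size:
            member = build_member(self.buffer, self.n_clusters, self.oracle)
            self.members.append(member)
            self.buffer = []

    def predict(self, x):
        # Unweighted majority vote; requires at least one member.
        votes = Counter(m.predict(x) for m in self.members)
        return votes.most_common(1)[0][0]
```

In this sketch, a chunk of six points split into two clusters costs two calls to `oracle` instead of six, which is the kind of labeling reduction the abstract aims at. How representatives are chosen and how members are weighted in the actual model may differ.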
