A New Ensemble Machine Learning Technique with Multiple Stacking

  • Lee, Su-eun (School of Electrical and Computer Engineering, University of Seoul) ;
  • Kim, Han-joon (School of Electrical and Computer Engineering, University of Seoul)
  • Received : 2020.05.28
  • Accepted : 2020.07.28
  • Published : 2020.08.31

Abstract

Machine learning refers to techniques for generating a model that can solve a specific problem by generalizing from given data. To generate a high-performance model, high-quality training data and a learning algorithm for the generalization process must be prepared. As one way to improve the performance of the learned model, ensemble techniques generate multiple models rather than a single model; they include the bagging, boosting, and stacking learning techniques. This paper proposes a new ensemble technique with multiple stacking that outperforms the conventional stacking technique. The learning structure of the multiple stacking ensemble technique resembles that of deep learning: each layer is composed of a combination of stacking models, and the number of layers is increased so as to minimize the misclassification rate of each layer. Through experiments on four types of datasets, we show that the proposed method outperforms existing ones.
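The layered structure described above can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal example, assuming scikit-learn, in which one stacking layer combines heterogeneous base learners under a meta-learner, and a second layer stacks the first layer with further models, loosely mirroring the idea of adding stacking layers:

```python
# Minimal multi-layer stacking sketch (illustrative only, not the paper's
# proposed algorithm): layer 2 treats the layer-1 stack as a base learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for the paper's four benchmark datasets.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Layer 1: a conventional stacking model over heterogeneous base learners.
layer1 = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("nb", GaussianNB()),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression())

# Layer 2: stacks layer 1 together with an additional model, analogous to
# increasing the number of stacking layers.
layer2 = StackingClassifier(
    estimators=[("stack1", layer1),
                ("rf2", RandomForestClassifier(random_state=1))],
    final_estimator=LogisticRegression())

layer2.fit(X_tr, y_tr)
print("test accuracy:", round(layer2.score(X_te, y_te), 3))
```

In the paper, layers are added until the per-layer misclassification rate stops improving; the sketch fixes the depth at two for brevity.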

