A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles

  • Tama, Bayu Adhi (Dept. of IT Convergence and Application Engineering, Pukyong National University)
  • Rhee, Kyung-Hyune (Dept. of IT Convergence and Application Engineering, Pukyong National University)
  • Received : 2018.01.31
  • Accepted : 2018.05.14
  • Published : 2018.06.30

Abstract

Phishing websites have become a crucial concern in cyber security applications. Phishing works by fraudulently deceiving users in order to obtain their sensitive information, such as bank account details, credit card numbers, usernames, and passwords. The threat has caused huge losses to online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare classifier ensemble approaches, i.e., random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, i.e., decision tree, classification and regression tree, and credal decision tree, for website phishing detection. The area under the ROC curve (AUC) is employed as the performance metric, whilst statistical tests are used to assess whether the differences among classifiers are significant. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.
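The abstract names AUC as the metric and statistical tests as the significance check but does not spell out either procedure. As a minimal sketch, the Python below computes AUC from its rank-based (Mann-Whitney U) definition and then, assuming the Friedman test is the significance test (a common choice when several classifiers are compared, but an assumption here, not something the abstract states), computes the Friedman chi-square statistic over a table of AUC scores with one row per cross-validation fold or dataset. The helpers `average_ranks`, `auc`, and `friedman_statistic` are hypothetical names for illustration, not the authors' code.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float], descending: bool = False) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values (ties are adjacent after sorting).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for p in range(i, j + 1):
            ranks[order[p]] = mean_rank
        i = j + 1
    return ranks


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    ranks = average_ranks(scores)  # ascending: the lowest score gets rank 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U for the positives
    return u / (n_pos * n_neg)


def friedman_statistic(auc_table: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square for an N x k table (rows: folds/datasets, cols: classifiers).

    Assumption: the Friedman test is used here for illustration only;
    the paper's exact statistical test may differ.
    """
    n = len(auc_table)
    k = len(auc_table[0])
    rank_sums = [0.0] * k
    for row in auc_table:
        # Rank 1 = best classifier in this row, i.e. the highest AUC.
        for j, r in enumerate(average_ranks(row, descending=True)):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(friedman_statistic([[0.90, 0.80, 0.70],
                          [0.95, 0.85, 0.75],
                          [0.92, 0.81, 0.70]]))  # 6.0
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom to obtain a p-value. In practice, scikit-learn's `roc_auc_score` and SciPy's `scipy.stats.friedmanchisquare` cover the same two computations.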
