http://dx.doi.org/10.9717/JMIS.2018.5.2.99

A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles

Tama, Bayu Adhi (Dept. of IT Convergence and Application Engineering, Pukyong National University)
Rhee, Kyung-Hyune (Dept. of IT Convergence and Application Engineering, Pukyong National University)

Publication Information

Journal of Multimedia Information System / v.5, no.2, 2018, pp. 99-104

Abstract

Phishing websites have become a crucial concern in cyber security. Phishing is carried out by fraudulently deceiving users in order to obtain sensitive information such as bank account details, credit card numbers, usernames, and passwords. The threat has caused huge losses to online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e., random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, i.e., decision tree, classification and regression tree, and credal decision tree, for phishing website detection. The area under the ROC curve (AUC) is employed as the performance metric, whilst statistical tests are used as a baseline indicator of the significance of differences among classifiers. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.
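
As an illustration of the AUC metric named in the abstract, the following is a minimal sketch of how AUC can be computed from classifier scores using the rank (Mann-Whitney) formulation. It is not the authors' code: the function name, the 0/1 label encoding (1 = phishing), and the example scores are all hypothetical and chosen only for demonstration.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    labels: sequence of 0/1 values (1 = phishing; an assumed encoding)
    scores: classifier scores, where higher means "more likely phishing"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores from two classifiers on the same six websites.
y_true = [1, 1, 1, 0, 0, 0]
ensemble_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
single_tree_scores = [0.7, 0.3, 0.4, 0.5, 0.6, 0.1]

print("ensemble AUC:   ", auc(y_true, ensemble_scores))
print("single-tree AUC:", auc(y_true, single_tree_scores))
```

An AUC of 1.0 means every phishing site is scored above every legitimate one, and 0.5 corresponds to random ranking. In a benchmark of this kind, such values would typically be collected per classifier over repeated cross-validation runs before applying a significance test.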

Keywords

Phishing website; classifier ensembles; performance comparison; significance test

Citations & Related Records

References

1	Anti-Phishing Working Group, "White paper: Phishing response trends," tech. rep., 2017.
2	S. C. Jeeva and E. B. Rajsingh, "Intelligent phishing URL detection using association rule mining," Human-centric Computing and Information Sciences, vol. 6, no. 1, pp. 1-19, 2016.
3	B. A. Tama and K. H. Rhee, "Performance analysis of multiple classifier system in DoS attack detection," in International Workshop on Information Security Applications, pp. 339-347, Springer, 2015.
4	N. C. Oza and K. Tumer, "Classifier ensembles: Select real-world applications," Information Fusion, vol. 9, no. 1, pp. 4-20, 2008.
5	J. H. Friedman, "Greedy function approximation: a gradient boosting machine," Annals of Statistics, pp. 1189-1232, 2001.
6	D. H. Wolpert, "The lack of a priori distinctions between learning algorithms," Neural Computation, vol. 8, no. 7, pp. 1341-1390, 1996.
7	L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
8	J. J. Rodriguez, L. I. Kuncheva, and C. J. Alonso, "Rotation forest: A new classifier ensemble method," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1619-1630, 2006.
9	T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785-794, ACM, 2016.
10	J. R. Quinlan, C4.5: Programs for Machine Learning. Elsevier, 2014.
11	W.-Y. Loh, "Classification and regression trees," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, no. 1, pp. 14-23, 2011.
12	C. J. Mantas and J. Abellan, "Credal-C4.5: Decision tree based on imprecise probabilities to classify noisy data," Expert Systems with Applications, vol. 41, no. 10, pp. 4625-4637, 2014.
13	R. B. Basnet, S. Mukkamala, and A. H. Sung, "Detection of phishing attacks: A machine learning approach," Soft Computing Applications in Industry, vol. 226, pp. 373-383, 2008.
14	R. M. Mohammad, F. Thabtah, and L. McCluskey, "Intelligent rule-based phishing websites classification," IET Information Security, vol. 8, no. 3, pp. 153-160, 2014.
15	M. Aburrous, M. A. Hossain, K. Dahal, and F. Thabtah, "Intelligent phishing detection system for e-banking using fuzzy data mining," Expert Systems with Applications, vol. 37, no. 12, pp. 7913-7921, 2010.
16	M. Lichman, "UCI machine learning repository," 2013.
17	F. Thabtah, R. M. Mohammad, and L. McCluskey, "A dynamic self-structuring neural network model to combat phishing," in Neural Networks (IJCNN), 2016 International Joint Conference on, pp. 4221-4226, IEEE, 2016.
18	R. M. Mohammad, F. Thabtah, and L. McCluskey, "Predicting phishing websites based on self-structuring neural network," Neural Computing and Applications, vol. 25, no. 2, pp. 443-458, 2014.
19	M. Dadkhah, S. Shamshirband, and A. W. Abdul Wahab, "A hybrid approach for phishing web site detection," The Electronic Library, vol. 34, no. 6, pp. 927-944, 2016.
20	A. Hodzic, J. Kevric, and A. Karadag, "Comparison of machine learning techniques in phishing website classification," 2016.
21	F. Thabtah and N. Abdelhamid, "Deriving correlated sets of website features for phishing detection: A computational intelligence approach," Journal of Information & Knowledge Management, vol. 15, no. 04, p. 1650042, 2016.
22	E.-S. M. El-Alfy, "Detection of phishing websites based on probabilistic neural networks and k-medoids clustering," The Computer Journal, pp. 1-5, 2017.
23	K. D. Rajab, "New hybrid features selection method: A case study on websites phishing," Security and Communication Networks, vol. 2017, 2017.
24	J. Demsar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, no. Jan, pp. 1-30, 2006.
25	R. Quinlan, "Data mining tools See5 and C5.0," 2004.
26	J. Abellan and S. Moral, "Building classification trees using the total uncertainty criterion," International Journal of Intelligent Systems, vol. 18, no. 12, pp. 1215-1225, 2003.