A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles

Tama, Bayu Adhi;Rhee, Kyung-Hyune;

doi:10.9717/JMIS.2018.5.2.99

Journal of Multimedia Information System

Volume 5 Issue 2 / Pages 99-104 / 2018 / 2383-7632 (eISSN)

Korea Multimedia Society (한국멀티미디어학회)

Tama, Bayu Adhi (Dept. of IT Convergence and Application Engineering, Pukyong National University) ;
Rhee, Kyung-Hyune (Dept. of IT Convergence and Application Engineering, Pukyong National University)

Received : 2018.01.31
Accepted : 2018.05.14
Published : 2018.06.30

https://doi.org/10.9717/JMIS.2018.5.2.99

Abstract

Phishing websites have become a crucial concern in cyber security applications. Phishing is performed by fraudulently deceiving users with the aim of obtaining their sensitive information, such as bank account details, credit card numbers, usernames, and passwords. The threat has led to huge losses for online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e., random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, i.e., decision tree, classification and regression tree, and credal decision tree, in the case of website phishing. The area under the ROC curve (AUC) is employed as the performance metric, whilst statistical tests are used as a baseline indicator for evaluating the significance of differences among classifiers. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.
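To make the performance metric concrete, the following is a minimal sketch of how AUC can be computed from classifier scores using the rank-sum (Mann-Whitney U) identity. It is an illustration only, not the authors' implementation: the function name `roc_auc`, the label encoding (1 for phishing, 0 for legitimate), and the toy scores are all assumptions introduced here, and the numbers are made up rather than taken from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for phishing (positive), 0 for legitimate (negative).
    scores: classifier scores, higher meaning "more likely phishing".
    Tied scores receive their average rank.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sort indices by score, then assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positive class, normalised by the number of
    # positive/negative pairs, equals the AUC.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data (not from the paper): 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0]
ensemble_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]     # made-up ensemble output
single_tree_scores = [0.9, 0.4, 0.7, 0.6, 0.2, 0.1]  # made-up single-tree output

auc_ensemble = roc_auc(y_true, ensemble_scores)
auc_single = roc_auc(y_true, single_tree_scores)
print(f"ensemble AUC = {auc_ensemble:.3f}, single tree AUC = {auc_single:.3f}")
```

AUC can be read as the probability that a randomly chosen phishing site receives a higher score than a randomly chosen legitimate one, which makes it independent of any particular decision threshold. In a benchmark of this kind, one such value per classifier would typically be collected across cross-validation folds before any significance test is applied.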
