http://dx.doi.org/10.3837/tiis.2020.07.001

Robust URL Phishing Detection Based on Deep Learning  

Al-Alyan, Abdullah (Department of Computer Science, King Saud University)
Al-Ahmadi, Saad (Department of Computer Science, King Saud University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.14, no.7, 2020, pp. 2752-2768
Abstract
Phishing websites can have devastating effects on governmental, financial, and social services, as well as on individual privacy. Currently, many phishing detection solutions are evaluated using small datasets and are therefore prone to sampling issues, such as representing legitimate websites only by high-ranking websites, which could make their evaluation less relevant in practice. Phishing detection solutions that depend only on the URL are attractive, as they can be used in limited systems such as firewalls. In this paper, we present a URL-only phishing detection solution based on a convolutional neural network (CNN) model. The proposed CNN takes the URL itself as input, rather than predetermined features such as URL length. For training and evaluation, we collected over two million URLs in a massive URL phishing detection (MUPD) dataset, which we split into training, validation, and testing datasets. The proposed CNN achieves approximately 96% accuracy on the testing dataset, and this accuracy is achieved with URL schemes (such as HTTP and HTTPS) removed from the URLs. Our proposed solution also achieved higher accuracy than an existing state-of-the-art URL-only model on a published dataset. Finally, the results of our experiment suggest that keeping the CNN up to date yields better results in practice.
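The abstract describes feeding the raw URL, with its scheme removed, to a CNN instead of hand-crafted features such as URL length. What follows is a minimal sketch of one plausible preprocessing pipeline for such a model, under stated assumptions: the character alphabet, the padding and unknown-character indices, the fixed input length of 200, and the 80/10/10 split ratios are all hypothetical and are not taken from the paper, and the function names `strip_scheme`, `encode_url`, and `split_dataset` are illustrative only.

```python
import random
import re
import string
from typing import List, Sequence, Tuple

# Hypothetical alphabet (an assumption, not the paper's): ASCII letters, digits,
# and characters permitted in URLs.
ALPHABET = string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
PAD_INDEX = 0  # hypothetical index used to right-pad short URLs
UNK_INDEX = 1  # hypothetical index for characters outside ALPHABET
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 200  # hypothetical fixed input length for the CNN

# Matches a leading scheme such as "http://" or "https://" (any letter case).
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def strip_scheme(url: str) -> str:
    """Remove a leading URL scheme (e.g. http://, https://) if present."""
    return SCHEME_RE.sub("", url.strip(), count=1)


def encode_url(url: str, max_len: int = MAX_LEN) -> List[int]:
    """Map a scheme-less URL to a fixed-length list of character indices."""
    text = strip_scheme(url)[:max_len]  # truncate URLs longer than max_len
    indices = [CHAR_TO_INDEX.get(ch, UNK_INDEX) for ch in text]
    return indices + [PAD_INDEX] * (max_len - len(indices))  # right-pad


def split_dataset(
    samples: Sequence[Tuple[str, int]],
    train: float = 0.8,
    val: float = 0.1,
    seed: int = 0,
) -> Tuple[list, list, list]:
    """Shuffle (url, label) pairs with a fixed seed and split them into
    training, validation, and testing lists (hypothetical ratios)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


if __name__ == "__main__":
    print(strip_scheme("https://example.com/login"))  # example.com/login
    print(len(encode_url("http://example.com/login")))  # 200
```

Fixed-length integer sequences of this kind are a common input format for an embedding layer followed by 1-D convolutions in character-level text classifiers; the abstract does not specify how the authors' CNN embeds or convolves the URL characters, so that part is deliberately left out here.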
Keywords
Phishing Detection; Machine Learning; Deep Learning; Convolutional Neural Network (CNN); Cyber Security