[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3837/tiis.2019.11.017

MALICIOUS URL RECOGNITION AND DETECTION USING ATTENTION-BASED CNN-LSTM

Peng, Yongfang (School of Software, Xinjiang University)
Tian, Shengwei (School of Software, Xinjiang University)
Yu, Long (Network Center, Xinjiang University)
Lv, Yalong (College of Information Science and Engineering, Xinjiang University)
Wang, Ruijin (School of Computer Science and Engineering, University of Electronic Science and Technology of China)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.13, no.11, 2019, pp. 5580-5593

Abstract

A method for recognizing and detecting malicious Uniform Resource Locators (URLs) is proposed that combines an Attention mechanism with a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network (Attention-Based CNN-LSTM). First, the WHOIS check method is used to extract and filter features, including URL texture information, statistical attributes of the URL string, and WHOIS information. These features are then encoded, pre-processed, and fed to the convolution layer of the CNN, which extracts local features. Second, the local features are weighted by the Attention mechanism and input to the LSTM model, whose outputs are pooled to compute the global features of the URLs. Finally, the SoftMax function classifies the URLs from these global features. The results demonstrate that the Attention-Based CNN-LSTM method achieves higher accuracy in malicious URL detection than the existing methods it is compared with.
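The data flow described in the abstract (convolution, attention weighting, LSTM, pooling, SoftMax) can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not give layer sizes, the attention scoring function, or the pooling type, so the dimensions, the dot-product attention score, the max pooling, and all names (`conv1d`, `attention_weights`, `lstm`, `forward`, `init_params`) are hypothetical. The weights are random, so the output probabilities carry no meaning beyond showing the shapes.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d(x, kernels, bias):
    """Valid 1-D convolution with ReLU. x: (T, D), kernels: (K, D, F) -> (T-K+1, F)."""
    k = kernels.shape[0]
    windows = np.stack([x[t:t + k] for t in range(x.shape[0] - k + 1)])
    return np.maximum(np.einsum("tkd,kdf->tf", windows, kernels) + bias, 0.0)

def attention_weights(local, v):
    """Assumed scoring: dot product with a learned vector v, normalised over positions."""
    return softmax(local @ v)

def lstm(x, W, U, b):
    """Plain LSTM over a sequence. x: (T, F), W: (F, 4H), U: (H, 4H) -> (T, H)."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    states = []
    for x_t in x:
        z = x_t @ W + h @ U + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)

def forward(x, p):
    local = conv1d(x, p["kernels"], p["conv_bias"])    # local features
    alpha = attention_weights(local, p["attn_v"])      # one weight per position
    states = lstm(local * alpha[:, None], p["W"], p["U"], p["b"])
    global_feat = states.max(axis=0)                   # assumed: max pooling over time
    return softmax(global_feat @ p["W_out"] + p["b_out"])

def init_params(rng, d=8, k=3, f=16, h=12, classes=2):
    """Random weights with hypothetical sizes, for illustration only."""
    s = 0.1
    return {
        "kernels": rng.normal(scale=s, size=(k, d, f)),
        "conv_bias": np.zeros(f),
        "attn_v": rng.normal(scale=s, size=f),
        "W": rng.normal(scale=s, size=(f, 4 * h)),
        "U": rng.normal(scale=s, size=(h, 4 * h)),
        "b": np.zeros(4 * h),
        "W_out": rng.normal(scale=s, size=(h, classes)),
        "b_out": np.zeros(classes),
    }

rng = np.random.default_rng(0)
params = init_params(rng)
x = rng.normal(size=(20, 8))   # stand-in for 20 encoded feature positions of width 8
probs = forward(x, params)     # probabilities for two classes, e.g. benign / malicious
print(probs)
```

In a trained model, the input `x` would be the encoded URL and WHOIS features rather than random values, and the parameters would be learned from labelled URLs.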

Keywords

Malicious URL; Recognition and Detection; Attention-Based CNN-LSTM; Deep Learning
