http://dx.doi.org/10.9717/kmms.2020.23.11.1396

Design and Implementation of Malicious URL Prediction System based on Multiple Machine Learning Algorithms  

Kang, Hong Koo (Security Threat Response R&D Team, Korea Internet & Security Agency)
Shin, Sam Shin (Security Threat Response R&D Team, Korea Internet & Security Agency)
Kim, Dae Yeob (Security Threat Response R&D Team, Korea Internet & Security Agency)
Park, Soon Tai (Security Threat Response R&D Team, Korea Internet & Security Agency)
Abstract
Cyber threats that use malicious URLs, such as forced collection of personal information and distribution of malicious code, continue to occur. Coping with these threats requires security technology that quickly detects malicious URLs and prevents damage. In a web environment, malicious URLs take various forms and are created and deleted frequently, so detection or filtering by signature matching is limited in how well it can respond. Recently, research on detecting and predicting malicious URLs with machine learning techniques has been actively conducted. Existing studies have proposed various features and machine learning algorithms for predicting malicious URLs, but most of them only propose a specialized algorithm with supplemented features and preprocessing, so they do not sufficiently reflect the strengths of different machine learning algorithms. This paper proposes a system for predicting malicious URLs using multiple machine learning algorithms and reports an experiment that combines the prediction results of multiple machine learning models to increase prediction accuracy. The experiments showed that combining multiple models improves prediction performance compared to a single model.
Keywords
Malicious URL; Prediction; Multiple Machine Learning; Web Threat
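
The abstract does not state how the outputs of the individual models are combined. As one illustration only, assuming each model emits a probability that a URL is malicious, the sketch below compares two common combination rules: soft voting (average the probabilities) and majority voting (count per-model decisions). The function names, the example scores, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
from statistics import mean


def soft_vote(probabilities, threshold=0.5):
    """Average per-model malicious probabilities and apply a threshold.

    Returns (combined_score, is_malicious). Assumes every model reports
    a probability in [0, 1]; this is an illustrative assumption.
    """
    if not probabilities:
        raise ValueError("need at least one model probability")
    score = mean(probabilities)
    return score, score >= threshold


def majority_vote(probabilities, threshold=0.5):
    """Flag the URL only if more than half of the models flag it."""
    if not probabilities:
        raise ValueError("need at least one model probability")
    votes = sum(p >= threshold for p in probabilities)
    return votes * 2 > len(probabilities)


# Hypothetical scores from three independently trained models for one URL.
model_scores = {"model_a": 0.95, "model_b": 0.45, "model_c": 0.45}
scores = list(model_scores.values())

combined_score, soft_decision = soft_vote(scores)
hard_decision = majority_vote(scores)

print(f"soft voting: score={combined_score:.3f}, malicious={soft_decision}")
print(f"majority voting: malicious={hard_decision}")
```

With these example scores the two rules disagree: one very confident model pulls the average above the threshold, while only one of three models votes malicious on its own. Which rule suits a deployed system depends on how well calibrated the individual models are.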