http://dx.doi.org/10.6109/jkiice.2022.26.12.1786

Development of a Malicious URL Machine Learning Detection Model Reflecting the Main Feature of URLs  

Kim, Youngjun (Department of Convergence Security, Chung-Ang University)
Lee, Jaewoo (Department of Industrial Security, Chung-Ang University)
Abstract
Cyber-attacks such as smishing and hacking emails that exploit COVID-19 and political and social issues have continued in recent years. Machine learning and deep learning techniques are being studied to prevent damage from attacks that lure victims to malicious links in order to steal personal data. In previous studies, however, the features in the data sets were overly simple, leaving an insufficient basis for judging a URL to be malicious. In addition to the lexical URL features used in previous research, this paper proposes nine main features of three types: "URL Days", "URL Word", and "URL Abnormal". F1-score and accuracy were measured with four machine learning algorithms. In a comparison with an existing study, the results improved by 0.9%, and the highest F1-score and accuracy reached 98.5%. These outcomes indicate that the proposed main features contribute to higher accuracy and detection performance.
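As an illustration of the workflow the abstract describes, the following minimal Python sketch extracts a few lexical features from a URL and computes accuracy and F1-score from binary predictions. The abstract does not list the lexical features or define the nine proposed ones, so every feature name here (`url_length`, `dot_count`, `has_ip_host`, and so on) is a hypothetical stand-in, and the helper functions are assumptions rather than the authors' implementation.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Return a few illustrative lexical features of a URL.

    These are hypothetical examples, not the feature set used in the paper.
    """
    host = urlsplit(url).hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "dot_count": host.count("."),
        "digit_count": sum(ch.isdigit() for ch in url),
        # Crude check for a dotted-quad IPv4 address used as the host.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "has_at_sign": "@" in url,
    }


def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1-score for binary labels (1 = malicious)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login"))
    print(accuracy_and_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

In practice, feature dictionaries like these would be turned into vectors and passed to a classifier; the sketch stops at feature extraction and scoring because the abstract does not name the four algorithms.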
Keywords
Malicious URL; Phishing URL; Machine learning; Detection model