Hybrid phishing site detection system with GRU-based shortened URL determination technique

  • Hae-Soo Kim (School of Computer Engineering & Applied Mathematics, Computer System Institute, Hankyong National University)
  • Mi-Hui Kim (School of Computer Engineering & Applied Mathematics, Computer System Institute, Hankyong National University)
  • Received : 2023.06.03
  • Accepted : 2023.07.20
  • Published : 2023.09.30

Abstract

According to statistics from the Korean National Police Agency, smishing crimes carried out through text messages and messengers have increased sharply since COVID-19. Most of the reported cases of public-institution impersonation were related to vaccination and compensation, and many of them lured victims into clicking fake URLs (Uniform Resource Locators), usually shortened URLs that hide the destination. URL-based detection methods cannot detect such links reliably when the URL's information is hidden, while content-based detection methods are slow and resource-intensive. In this paper, we propose a system that first uses a GRU (Gated Recurrent Unit) to determine whether a URL is shortened, then applies URL-based detection with a Transformer to regular URLs and content-based detection with XGBoost to shortened URLs. The proposed detection system achieved an F1-score of 94.86 with an average processing time of 5.4 seconds.
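The routing described in the abstract can be sketched as a small dispatcher. This is a minimal sketch, not the authors' implementation: the names `detect`, `Verdict`, and the `toy_*` functions are hypothetical, the three models are passed in as plain callables, and the toy stand-ins (a shortener-domain lookup in place of the GRU, keyword checks in place of the Transformer and XGBoost) exist only so the example runs.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlsplit


@dataclass
class Verdict:
    url: str
    route: str             # "url-based" or "content-based"
    phishing_score: float  # classifier output in [0, 1]
    is_phishing: bool


def detect(
    url: str,
    is_shortened: Callable[[str], bool],    # stands in for the GRU determiner
    url_model: Callable[[str], float],      # stands in for the Transformer
    content_model: Callable[[str], float],  # stands in for XGBoost
    fetch_content: Callable[[str], str],    # follows the link, returns page HTML
    threshold: float = 0.5,
) -> Verdict:
    """Send each URL down the cheap URL-based path or the costlier content path."""
    if is_shortened(url):
        # A short link hides its destination, so classify the landing page instead.
        score = content_model(fetch_content(url))
        route = "content-based"
    else:
        # A regular URL exposes its lexical features, so a URL-only model suffices.
        score = url_model(url)
        route = "url-based"
    return Verdict(url, route, score, score >= threshold)


# --- Hypothetical toy stand-ins, NOT the paper's trained models ---
TOY_SHORTENERS = {"bit.ly", "t.co", "tinyurl.com"}


def toy_is_shortened(url: str) -> bool:
    return urlsplit(url).hostname in TOY_SHORTENERS  # domain lookup, not a GRU


def toy_url_model(url: str) -> float:
    return 0.9 if "login" in url else 0.1


def toy_fetch(url: str) -> str:
    return '<form><input type="password"></form>'  # no network access in this sketch


def toy_content_model(html: str) -> float:
    return 0.8 if 'type="password"' in html else 0.2


if __name__ == "__main__":
    for u in ["https://bit.ly/abc123", "https://example.com/login", "https://example.com/"]:
        print(detect(u, toy_is_shortened, toy_url_model, toy_content_model, toy_fetch))
```

Keeping the models as injected callables makes the design point visible: only shortened URLs pay the cost of fetching and parsing a page, which is where most of the processing time of a content-based detector goes.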

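The shortened-URL determiner reads a URL one character at a time, which is what a GRU is suited for. The sketch below is a forward pass only, assuming a single GRU layer over ASCII character indices followed by a logistic output. The hidden size, vocabulary, clamping rule, and class name `CharGRU` are illustrative assumptions, and the weights are random and untrained, so the output probability is meaningless until the model is fitted to labeled shortened and regular URLs. It follows the gating convention of Chung et al.: $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def column(M, j):
    # W @ one_hot(j) is just column j of W, so skip building the one-hot vector.
    return [row[j] for row in M]


class CharGRU:
    """Single-layer character-level GRU with a logistic output (untrained sketch)."""

    def __init__(self, hidden: int = 8, vocab: int = 128, seed: int = 0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden, self.vocab = hidden, vocab
        # W*: input-to-hidden, U*: hidden-to-hidden, b*: bias; z=update, r=reset, h=candidate.
        self.Wz, self.Uz, self.bz = init(hidden, vocab), init(hidden, hidden), [0.0] * hidden
        self.Wr, self.Ur, self.br = init(hidden, vocab), init(hidden, hidden), [0.0] * hidden
        self.Wh, self.Uh, self.bh = init(hidden, vocab), init(hidden, hidden), [0.0] * hidden
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.b_out = 0.0

    def step(self, h, idx):
        z = [sigmoid(a + b + c) for a, b, c in zip(column(self.Wz, idx), matvec(self.Uz, h), self.bz)]
        r = [sigmoid(a + b + c) for a, b, c in zip(column(self.Wr, idx), matvec(self.Ur, h), self.br)]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
        cand = [math.tanh(a + b + c) for a, b, c in zip(column(self.Wh, idx), matvec(self.Uh, rh), self.bh)]
        # Interpolate between the previous state and the candidate, gated by z.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def prob_shortened(self, url: str) -> float:
        h = [0.0] * self.hidden
        for ch in url:
            h = self.step(h, min(ord(ch), self.vocab - 1))  # clamp non-ASCII to last slot
        return sigmoid(sum(w * x for w, x in zip(self.w_out, h)) + self.b_out)
```

Some libraries swap the role of the update gate (PyTorch's `nn.GRU` computes $(1 - z_t) \odot n_t + z_t \odot h_{t-1}$); the two forms are equivalent up to how $z_t$ is learned.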
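For shortened URLs, the content-based path needs a numeric feature vector that a gradient-boosted tree model such as XGBoost can consume. The abstract does not list the paper's features, so the set below (form count, password inputs, iframes, link count, external-link ratio, null-link ratio) is a hypothetical example of common HTML phishing cues, extracted with Python's standard `html.parser`. The function name `content_features` is illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _FeatureParser(HTMLParser):
    """Counts a few HTML cues often associated with phishing pages."""

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host
        self.n_forms = 0
        self.n_password = 0
        self.n_iframes = 0
        self.n_links = 0
        self.n_external_links = 0
        self.n_null_links = 0  # href="", href="#", or javascript: links

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # boolean attributes map to None, hence the "or" guards below
        if tag == "form":
            self.n_forms += 1
        elif tag == "input" and (a.get("type") or "").lower() == "password":
            self.n_password += 1
        elif tag == "iframe":
            self.n_iframes += 1
        elif tag == "a":
            self.n_links += 1
            href = (a.get("href") or "").strip()
            if href in ("", "#") or href.lower().startswith("javascript:"):
                self.n_null_links += 1
            else:
                host = urlsplit(href).hostname  # None for relative links
                if host and host != self.page_host:
                    self.n_external_links += 1


def content_features(html: str, page_url: str) -> list:
    """Return a fixed-length numeric feature vector for one landing page."""
    p = _FeatureParser(urlsplit(page_url).hostname)
    p.feed(html)
    p.close()
    links = max(p.n_links, 1)  # avoid dividing by zero on link-free pages
    return [p.n_forms, p.n_password, p.n_iframes, p.n_links,
            p.n_external_links / links, p.n_null_links / links]
```

Rows produced this way could be stacked into a matrix and passed to `xgboost.XGBClassifier` through its usual `fit` and `predict_proba` methods; which features and hyperparameters the authors actually used is not stated here.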

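The reported F1-score of 94.86 appears to be expressed on a 0 to 100 scale. The F1-score is the harmonic mean of precision $P = TP/(TP+FP)$ and recall $R = TP/(TP+FN)$, which simplifies to $2TP/(2TP+FP+FN)$. The helper below is a small sketch of that arithmetic; the counts in the example are made up for illustration and are not the paper's results.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) for the phishing class from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall, equivalently 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts for illustration; these are not the paper's results.
    p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
    print(f"precision={p:.4f} recall={r:.4f} F1={100 * f1:.2f}")
    # prints: precision=0.9000 recall=0.9000 F1=90.00
```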
Acknowledgement

This work was supported by a research grant from Hankyong National University for an academic exchange program in 2023.
