DGA-based Botnet Detection Technology using N-gram

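  • 정일옥 (Korea University, Department of Information Security)
  • 신덕하 (Kyung Hee University, Department of Applied Mathematics)
  • 김수철 (Soongsil University, Department of IT Policy and Management)
  • 이록석 (Chonnam National University, Interdisciplinary Program of Information Security)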
  • Received : 2022.09.30
  • Accepted : 2022.12.22
  • Published : 2022.12.31

Abstract

The widespread proliferation and growing sophistication of botnets now have serious consequences not only for enterprises and users but also for cyber warfare between countries, and research on botnet detection has progressed steadily as a result. For botnets based on domain generation algorithms (DGA), however, existing signature-based and statistics-based techniques achieve a high detection rate but also suffer from a high false positive rate. In this paper, we propose a detection model that uses character-based n-grams to detect DGA-based botnets. The proposed model raises the detection rate, a limitation of existing detection techniques, while minimizing the false positive rate. Experiments on a large-scale dataset of domains used by various DGA botnets, together with normal domains, confirmed that the proposed model outperforms existing models: its false positive rate is below 2 to 4%, and its overall detection accuracy and F1 score are both 97.5%. We expect the proposed model to improve the ability to detect and respond to DGA-based botnets.
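
To make the idea of character-based n-grams concrete, the following is a minimal sketch and not the model proposed in the paper. It splits a domain label into overlapping character n-grams and scores how many of them also occur in a small benign sample. The function names (`char_ngrams`, `ngram_profile`, `familiarity`), the choice of n = 2, and the toy domain list are all assumptions made for illustration only.

```python
from collections import Counter


def char_ngrams(label: str, n: int) -> list[str]:
    """Return the overlapping character n-grams of a domain label."""
    label = label.lower()
    return [label[i:i + n] for i in range(len(label) - n + 1)]


def ngram_profile(labels: list[str], n: int = 2) -> Counter:
    """Count the n-grams seen across a list of benign domain labels."""
    counts: Counter = Counter()
    for label in labels:
        counts.update(char_ngrams(label, n))
    return counts


def familiarity(label: str, profile: Counter, n: int = 2) -> float:
    """Fraction of the label's n-grams that also occur in the benign profile."""
    grams = char_ngrams(label, n)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in profile) / len(grams)


# Hypothetical toy sample; a real study would use a large benign domain list.
benign = ["google", "youtube", "wikipedia", "amazon"]
profile = ngram_profile(benign, n=2)

print(char_ngrams("google", 2))        # ['go', 'oo', 'og', 'gl', 'le']
print(familiarity("googol", profile))  # word-like label, shares most bigrams
print(familiarity("xkqzvbw", profile)) # random-looking label, shares none
```

Algorithmically generated names tend to contain character sequences that are rare in human-chosen names, so a low score hints at a DGA domain. A practical detector would feed n-gram features into a trained classifier rather than rely on a single overlap ratio.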

Acknowledgement

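This research was supported in 2022 by the Korea Agency for Infrastructure Technology Advancement (KAIA), with funding from the Korean government (Ministry of Land, Infrastructure and Transport) (22TLRP-B152767-04, Development of technology and institutional frameworks for operating an integrated security system for cooperative automated driving road traffic systems).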
