A Study on Improving Ransomware Detection Performance Using Combined Sampling Methods

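  • 김수철 (Soongsil University, Department of IT Policy and Management);
  • 이형동 (Soongsil University, Department of IT Policy and Management);
  • 변경근 (Soongsil University, Department of IT Policy and Management);
  • 신용태 (Soongsil University, Department of IT Policy and Management)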
  • Received : 2023.02.28
  • Accepted : 2023.03.31
  • Published : 2023.03.31

Abstract

Ransomware damage has recently been increasing rapidly around the world, with victims including Irish health authorities and U.S. oil pipelines, and it is affecting every sector of society. In response, research on ransomware detection and response increasingly applies machine learning alongside existing detection methods. However, traditional machine learning models tend to predict in favor of the class with more data, which makes accurate prediction of the smaller class difficult. For an imbalanced class distribution consisting of a majority of non-ransomware samples (benign code or other malware) and a minority of ransomware samples, we therefore propose using sampling techniques to resolve the imbalance and improve ransomware detection performance. In experiments with two scenarios (binary and multi-class classification), we confirm that sampling improves detection performance on the minority class while maintaining detection performance on the majority class. In particular, the proposed combined sampling technique (SMOTE+ENN) improved performance (G-mean, F1-score) by more than 10%.
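The abstract reports results in terms of G-mean and F1-score rather than plain accuracy. The following is a minimal sketch of why those metrics suit imbalanced data, using their standard definitions. The helper names (`confusion_counts`, `g_mean`, `f1_score`) and the toy labels are illustrative assumptions, not the authors' code or data.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and recall for the positive class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(tp, fp, tn, fn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


if __name__ == "__main__":
    # Made-up labels: 95 non-ransomware (0) and 5 ransomware (1) samples.
    y_true = [0] * 95 + [1] * 5
    # A model that always predicts the majority class.
    always_negative = [0] * 100
    # A model that raises 5 false alarms but catches 4 of 5 ransomware samples.
    balanced_model = [1] * 5 + [0] * 90 + [1] * 4 + [0]

    for name, y_pred in [("always negative", always_negative),
                         ("balanced model", balanced_model)]:
        counts = confusion_counts(y_true, y_pred)
        print(name, accuracy(*counts), g_mean(*counts), f1_score(*counts))
```

In this toy setup the always-negative model reaches 95% accuracy while its G-mean and F1-score are both zero, because it never detects the minority class. This is the failure mode that resampling methods such as SMOTE+ENN are meant to counter.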
