Comparative Analysis of Dimensionality Reduction Techniques for Advanced Ransomware Detection with Machine Learning

Korean title: Comparative Analysis of Effective Feature Extraction Techniques for Detecting Ransomware Attacks with Machine Learning

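  • Hanseok Kim (Department of Defense Science, Korea National Defense University) ;
  • Soojin Lee (Department of Defense Science, Korea National Defense University)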
  • Received : 2023.02.28
  • Accepted : 2023.03.30
  • Published : 2023.03.31

Abstract

Detecting increasingly advanced ransomware attacks with machine learning-based models requires training the classification model on data with a high-dimensional feature space, and in this setting the 'curse of dimensionality' is likely to occur. Dimensionality reduction of the features must therefore precede training, both to avoid the curse of dimensionality and to improve the accuracy and execution speed of the learning model. In this paper, we classify ransomware by applying three machine learning models and two feature extraction techniques to two datasets whose feature-space dimensions differ greatly. The experiments show that the dimensionality reduction techniques did not significantly improve performance in binary classification, and the same held in multi-class classification when the feature space was low-dimensional. However, in multi-class classification on the dataset with a high-dimensional feature space, LDA (Linear Discriminant Analysis) showed excellent performance.
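The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' experimental setup: the data are synthetic (generated with scikit-learn's `make_classification`), the number of classes and features is hypothetical, PCA is assumed as the second feature extraction technique (the abstract names only LDA), and a random forest stands in for the three unnamed machine learning models.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

N_CLASSES = 4  # hypothetical number of ransomware families (assumption)

# Synthetic stand-in for a high-dimensional multi-class dataset (assumption):
# 200 features, of which only 10 carry class information.
X, y = make_classification(
    n_samples=600,
    n_features=200,
    n_informative=10,
    n_redundant=0,
    n_classes=N_CLASSES,
    n_clusters_per_class=1,
    random_state=0,
)


def make_classifier():
    # Placeholder model; the paper's three classifiers are not named here.
    return RandomForestClassifier(n_estimators=50, random_state=0)


# LDA can produce at most (n_classes - 1) components, so PCA is given
# the same target dimension to keep the comparison even.
n_components = N_CLASSES - 1

pipelines = {
    "no reduction": make_pipeline(StandardScaler(), make_classifier()),
    "PCA": make_pipeline(
        StandardScaler(), PCA(n_components=n_components), make_classifier()
    ),
    "LDA": make_pipeline(
        StandardScaler(),
        LinearDiscriminantAnalysis(n_components=n_components),
        make_classifier(),
    ),
}

# Mean 5-fold cross-validated accuracy for each variant. The reducers are
# fitted inside each fold, so no test-fold information leaks into them.
scores = {
    name: cross_val_score(pipe, X, y, cv=5).mean()
    for name, pipe in pipelines.items()
}

for name, acc in scores.items():
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```

Placing the reducer inside the pipeline matters for LDA in particular: it is supervised, so fitting it on the full dataset before splitting would leak label information into the evaluation. Results on synthetic data say nothing about the paper's findings; the sketch only shows the shape of the comparison.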
