IoT Malware Detection and Family Classification Using Entropy Time Series Data Extraction and Recurrent Neural Networks

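  • 김영호 (Department of Computer Science, Dankook University) ;
  • 이현종 (Security Technology Research Institute, KSign Co., Ltd.) ;
  • 황두성 (Department of Software, Dankook University)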
  • Received : 2021.09.07
  • Accepted : 2021.11.17
  • Published : 2022.05.31

Abstract

IoT (Internet of Things) devices are frequent malware targets because of security vulnerabilities such as weak IDs/passwords and unauthenticated firmware updates. However, the diversity of CPU architectures makes it difficult to set up a malware analysis environment and to design features. In this paper, we design time series features from the byte sequence of executable files so that the representation is independent of the CPU architecture, and we analyze them with recurrent neural networks. The proposed feature is a fixed-length time series pattern obtained from the byte sequence by computing partial entropy and applying linear interpolation. Temporal changes in the extracted feature are learned by an RNN and an LSTM. In the experiments, IoT malware detection achieved high performance, whereas malware family classification performance was comparatively low. A visual comparison of the entropy patterns of each malware family showed that the Tsunami and Gafgyt families have similar patterns, which explains the lower classification performance. LSTM is more suitable than RNN for learning temporal changes in the proposed malware features.
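
The feature extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the non-overlapping 256-byte window, and the output length of 128 are all assumptions chosen for illustration, since the abstract does not state these parameters. The sketch computes the Shannon entropy of each window of the byte sequence and then linearly interpolates the resulting series to a fixed length.

```python
import math
from collections import Counter
from typing import List


def window_entropy(chunk: bytes) -> float:
    """Shannon entropy of one window, in bits per byte (range 0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    # sum over byte values of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_series(data: bytes, window: int = 256) -> List[float]:
    """Entropy of consecutive non-overlapping windows.

    The window size and the non-overlapping layout are assumptions.
    """
    return [window_entropy(data[i:i + window])
            for i in range(0, len(data), window)]


def resample_linear(series: List[float], length: int) -> List[float]:
    """Linearly interpolate a series to exactly `length` points."""
    if not series:
        return [0.0] * length
    if len(series) == 1 or length == 1:
        return [series[0]] * length
    scale = (len(series) - 1) / (length - 1)
    out = []
    for k in range(length):
        pos = k * scale                      # fractional index into series
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1.0 - frac) + series[hi] * frac)
    return out


def entropy_feature(data: bytes, window: int = 256,
                    length: int = 128) -> List[float]:
    """Fixed-length entropy time series for one file (hypothetical helper)."""
    return resample_linear(entropy_series(data, window), length)
```

Because the sketch reads only raw bytes, it needs no disassembler or emulator for a particular CPU architecture. The fixed output length means files of different sizes yield sequences of the same shape, which is the form a recurrent model expects as input.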

Acknowledgement

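This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (Ministry of Science and ICT) in 2021 (No. 2019-0-00197, Trust-based cyber security platform for protecting smart facility environments).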
