Malware detection methodology through on pre-training and transfer learning for AutoEncoder based deobfuscation

Jang, Jae-Seok; Ku, Bon-Jae; Eom, Sung-Jun; Han, Ji-Hyeong

doi:10.3745/PKIPS.y2022m11a.905

Annual Conference of KIPS (한국정보처리학회:학술대회논문집)

2022.11a / Pages 905-907 / 2022 / 2005-0011 (pISSN) / 2671-7298 (eISSN)

Korea Information Processing Society (한국정보처리학회)

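Original Korean title: AutoEncoder 기반 역난독화 사전학습 및 전이학습을 통한 악성코드 탐지 방법론 (Malware detection methodology through pre-training and transfer learning for AutoEncoder-based deobfuscation)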

Jang, Jae-Seok (Dept. of Computer Science and Engineering, Seoul National University of Science and Technology) ;
Ku, Bon-Jae (Dept. of Computer Science and Engineering, Seoul National University of Science and Technology) ;
Eom, Sung-Jun (School of Electrical and Computer Engineering, University of Seoul) ;
Han, Ji-Hyeong (Dept. of Computer Science and Engineering, Seoul National University of Science and Technology)

Published : 2022.11.21

https://doi.org/10.3745/PKIPS.y2022m11a.905

Abstract

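Static analysis, a conventional technique for analyzing malware, detects malware quickly and efficiently but is vulnerable to obfuscated files, whereas dynamic analysis handles obfuscated files well but is slow and costly. To address the shortcomings of both techniques, this study proposes a deep-learning-based static analysis model that is robust to obfuscation. The proposed method converts original code and obfuscated files into grayscale images to build a dataset, then pre-trains an AutoEncoder so that its encoder can extract the features of the original file from both original and obfuscated files. The encoder's output is then fed into a fully connected layer, which is trained by transfer learning to detect malware. The proposed methodology improved malware detection performance on obfuscated files by 14.17 percentage points in F1 score, and on the combined dataset of obfuscated and original files by 7.22 percentage points in F1 score.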
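The abstract does not say how files are mapped to grayscale images. A common approach, assumed here purely for illustration, is to treat each byte of the file as one pixel intensity (0 to 255) and wrap the byte stream into rows of a fixed width. The function name `bytes_to_grayscale`, the `width` parameter, and the zero-padding of the last row are hypothetical choices, not details taken from the paper.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte (0-255) to one pixel intensity, row by row.

    Assumption: one byte per pixel and a fixed row width; the paper's
    actual conversion procedure is not described in the abstract.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Number of rows needed to hold every byte (at least one row).
    height = max(1, math.ceil(len(data) / width))
    # Zero-pad the tail so the final row is complete (an assumption).
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Five bytes wrapped into rows of four pixels; the second row is zero-padded.
image = bytes_to_grayscale(b"MZ\x90\x00\x03", width=4)
```

In a pipeline like the one described, the same conversion would be applied to both the original and the obfuscated version of each file, so that the AutoEncoder can be pre-trained on image pairs before the encoder is reused for classification.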

Acknowledgement

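This project is the result of an ICT mentoring project carried out with support from the Ministry of Science and ICT's program for fostering creative ICT talent (정보통신창의인재양성사업).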