• Title/Summary/Keyword: TLSH

Search Result 1, Processing Time 0.027 seconds

Method of Similarity Hash-Based Malware Family Classification (유사성 해시 기반 악성코드 유형 분류 기법)

  • Kim, Yun-jeong;Kim, Moon-sun;Lee, Man-hee
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.32 no.5
    • /
    • pp.945-954
    • /
    • 2022
  • Billions of malicious codes are detected every year, of which only 0.01% are new types of malware. In this situation, an effective malware type classification tool is needed, but previous studies have limitations in quickly analyzing a large amount of malicious code because it requires a complex and massive amount of data pre-processing. To solve this problem, this paper proposes a method to classify the types of malicious code based on the similarity hash without complex data preprocessing. This approach trains the XGBoost model based on the similarity hash information of the malware. To evaluate this approach, we used the BIG-15 dataset, which is widely used in the field of malware classification. As a result, the malicious code was classified with an accuracy of 98.9% also, identified 3,432 benign files with 100% accuracy. This result is superior to most recent studies using complex preprocessing and deep learning models. Therefore, it is expected that more efficient malware classification is possible using the proposed approach.