• Title/Summary/Keyword: malware family classification

Search Result 14, Processing Time 0.03 seconds

A Cross-Platform Malware Variant Classification based on Image Representation

  • Naeem, Hamad;Guo, Bing;Ullah, Farhan;Naeem, Muhammad Rashid
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.7
    • /
    • pp.3756-3777
    • /
    • 2019
  • Recent internet development is helping malware researchers to generate malicious code variants through automated tools. Due to this reason, the number of malicious variants is increasing day by day. Consequently, the performance improvement in malware analysis is the critical requirement to stop the rapid expansion of malware. The existing research proved that the similarities among malware variants could be used for detection and family classification. In this paper, a Cross-Platform Malware Variant Classification System (CP-MVCS) proposed that converted malware binary into a grayscale image. Further, malicious features extracted from the grayscale image through Combined SIFT-GIST Malware (CSGM) description. Later, these features used to identify the relevant family of malware variant. CP-MVCS reduced computational time and improved classification accuracy by using CSGM feature description along machine learning classification. The experiment performed on four publically available datasets of Windows OS and Android OS. The experimental results showed that the computation time and malware classification accuracy of CP-MVCS was higher than traditional methods. The evaluation also showed that CP-MVCS was not only differentiated families of malware variants but also identified both malware and benign samples in mix fashion efficiently.

API Feature Based Ensemble Model for Malware Family Classification (악성코드 패밀리 분류를 위한 API 특징 기반 앙상블 모델 학습)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.29 no.3
    • /
    • pp.531-539
    • /
    • 2019
  • This paper proposes the training features for malware family analysis and analyzes the multi-classification performance of ensemble models. We construct training data by extracting API and DLL information from malware executables and use Random Forest and XGBoost algorithms which are based on decision tree. API, API-DLL, and DLL-CM features for malware detection and family classification are proposed by analyzing frequently used API and DLL information from malware and converting high-dimensional features to low-dimensional features. The proposed feature selection method provides the advantages of data dimension reduction and fast learning. In performance comparison, the malware detection rate is 93.0% for Random Forest, the accuracy of malware family dataset is 92.0% for XGBoost, and the false positive rate of malware family dataset including benign is about 3.5% for Random Forest and XGBoost.

IoT Malware Detection and Family Classification Using Entropy Time Series Data Extraction and Recurrent Neural Networks (엔트로피 시계열 데이터 추출과 순환 신경망을 이용한 IoT 악성코드 탐지와 패밀리 분류)

  • Kim, Youngho;Lee, Hyunjong;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.5
    • /
    • pp.197-202
    • /
    • 2022
  • IoT (Internet of Things) devices are being attacked by malware due to many security vulnerabilities, such as the use of weak IDs/passwords and unauthenticated firmware updates. However, due to the diversity of CPU architectures, it is difficult to set up a malware analysis environment and design features. In this paper, we design time series features using the byte sequence of executable files to represent independent features of CPU architectures, and analyze them using recurrent neural networks. The proposed feature is a fixed-length time series pattern extracted from the byte sequence by calculating partial entropy and applying linear interpolation. Temporary changes in the extracted feature are analyzed by RNN and LSTM. In the experiment, the IoT malware detection showed high performance, while low performance was analyzed in the malware family classification. When the entropy patterns for each malware family were compared visually, the Tsunami and Gafgyt families showed similar patterns, resulting in low performance. LSTM is more suitable than RNN for learning temporal changes in the proposed malware features.

Malware Family Recommendation using Multiple Sequence Alignment (다중 서열 정렬 기법을 이용한 악성코드 패밀리 추천)

  • Cho, In Kyeom;Im, Eul Gyu
    • Journal of KIISE
    • /
    • v.43 no.3
    • /
    • pp.289-295
    • /
    • 2016
  • Malware authors spread malware variants in order to evade detection. It's hard to detect malware variants using static analysis. Therefore dynamic analysis based on API call information is necessary. In this paper, we proposed a malware family recommendation method to assist malware analysts in classifying malware variants. Our proposed method extract API call information of malware families by dynamic analysis. Then the multiple sequence alignment technique was applied to the extracted API call information. A signature of each family was extracted from the alignment results. By the similarity of the extracted signatures, our proposed method recommends three family candidates for unknown malware. We also measured the accuracy of our proposed method in an experiment using real malware samples.

Malware Classification Method using Malware Visualization and Transfer Learning (악성코드 이미지화와 전이학습을 이용한 악성코드 분류 기법)

  • Lee, Jong-Kwan;Lee, Minwoo
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.05a
    • /
    • pp.555-556
    • /
    • 2021
  • In this paper, we propose a malware family classification scheme using malware visualization and transfer learning. The malware can be easily reused or modified. However, traditional malware detection techniques are vulnerable to detecting variants of malware. Malware belonging to the same class are converted into images that are similar to each other. Therefore, the proposed method can classify malware with a deep learning model that has been verified in the field of image classification. As a result of an experiment using the VGG-16 model on the Malimg dataset, the classification accuracy was over 98%.

  • PDF

Malware Family Detection and Classification Method Using API Call Frequency (API 호출 빈도를 이용한 악성코드 패밀리 탐지 및 분류 방법)

  • Joe, Woo-Jin;Kim, Hyong-Shik
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.31 no.4
    • /
    • pp.605-616
    • /
    • 2021
  • While malwares must be accurately identifiable from arbitrary programs, existing studies using classification techniques have limitations that they can only be applied to limited samples. In this work, we propose a method to utilize API call frequency to detect and classify malware families from arbitrary programs. Our proposed method defines a rule that checks whether the call frequency of a particular API exceeds the threshold, and identifies a specific family by utilizing the rate information on the corresponding rules. In this paper, decision tree algorithm is applied to define the optimal threshold that can accurately identify a particular family from the training set. The performance measurements using 4,443 samples showed 85.1% precision and 91.3% recall rate for family detection, 97.7% precision and 98.1% reproduction rate for classification, which confirms that our method works to distinguish malware families effectively.

Multi-Modal Based Malware Similarity Estimation Method (멀티모달 기반 악성코드 유사도 계산 기법)

  • Yoo, Jeong Do;Kim, Taekyu;Kim, In-sung;Kim, Huy Kang
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.29 no.2
    • /
    • pp.347-363
    • /
    • 2019
  • Malware has its own unique behavior characteristics, like DNA for living things. To respond APT (Advanced Persistent Threat) attacks in advance, it needs to extract behavioral characteristics from malware. To this end, it needs to do classification for each malware based on its behavioral similarity. In this paper, various similarity of Windows malware is estimated; and based on these similarity values, malware's family is predicted. The similarity measures used in this paper are as follows: 'TF-IDF cosine similarity', 'Nilsimsa similarity', 'malware function cosine similarity' and 'Jaccard similarity'. As a result, we find the prediction rate for each similarity measure is widely different. Although, there is no similarity measure which can be applied to malware classification with high accuracy, this result can be helpful to select a similarity measure to classify specific malware family.

Method of Similarity Hash-Based Malware Family Classification (유사성 해시 기반 악성코드 유형 분류 기법)

  • Kim, Yun-jeong;Kim, Moon-sun;Lee, Man-hee
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.32 no.5
    • /
    • pp.945-954
    • /
    • 2022
  • Billions of malicious codes are detected every year, of which only 0.01% are new types of malware. In this situation, an effective malware type classification tool is needed, but previous studies have limitations in quickly analyzing a large amount of malicious code because it requires a complex and massive amount of data pre-processing. To solve this problem, this paper proposes a method to classify the types of malicious code based on the similarity hash without complex data preprocessing. This approach trains the XGBoost model based on the similarity hash information of the malware. To evaluate this approach, we used the BIG-15 dataset, which is widely used in the field of malware classification. As a result, the malicious code was classified with an accuracy of 98.9% also, identified 3,432 benign files with 100% accuracy. This result is superior to most recent studies using complex preprocessing and deep learning models. Therefore, it is expected that more efficient malware classification is possible using the proposed approach.

Andro-profiler: Anti-malware system based on behavior profiling of mobile malware (행위기반의 프로파일링 기법을 활용한 모바일 악성코드 분류 기법)

  • Yun, Jae-Sung;Jang, Jae-Wook;Kim, Huy Kang
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.24 no.1
    • /
    • pp.145-154
    • /
    • 2014
  • In this paper, we propose a novel anti-malware system based on behavior profiling, called Andro-profiler. Andro-profiler consists of mobile devices and a remote server, and is implemented in Droidbox. Our aim is to detect and classify malware using an automatic classifier based on behavior profiling. First, we propose the representative behavior profiling for each malware family represented by system calls coupled with Droidbox system logs. This is done by executing the malicious application on an emulator and extracting integrated system logs. By comparing the behavior profiling of malicious applications with representative behavior profiling for each malware family, we can detect and classify them into malware families. Andro-profiler shows over 99% of classification accuracy in classifying malware families.

De-cloaking Malicious Activities in Smartphones Using HTTP Flow Mining

  • Su, Xin;Liu, Xuchong;Lin, Jiuchuang;He, Shiming;Fu, Zhangjie;Li, Wenjia
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.6
    • /
    • pp.3230-3253
    • /
    • 2017
  • Android malware steals users' private information, and embedded unsafe advertisement (ad) libraries, which execute unsafe code causing damage to users. The majority of such traffic is HTTP and is mixed with other normal traffic, which makes the detection of malware and unsafe ad libraries a challenging problem. To address this problem, this work describes a novel HTTP traffic flow mining approach to detect and categorize Android malware and unsafe ad library. This work designed AndroCollector, which can automatically execute the Android application (app) and collect the network traffic traces. From these traces, this work extracts HTTP traffic features along three important dimensions: quantitative, timing, and semantic and use these features for characterizing malware and unsafe ad libraries. Based on these HTTP traffic features, this work describes a supervised classification scheme for detecting malware and unsafe ad libraries. In addition, to help network operators, this work describes a fine-grained categorization method by generating fingerprints from HTTP request methods for each malware family and unsafe ad libraries. This work evaluated the scheme using HTTP traffic traces collected from 10778 Android apps. The experimental results show that the scheme can detect malware with 97% accuracy and unsafe ad libraries with 95% accuracy when tested on the popular third-party Android markets.