• Title/Summary/Keyword: malware classification

Search Result 102, Processing Time 0.024 seconds

Generate Optimal Number of Features in Mobile Malware Classification using Venn Diagram Intersection

  • Ismail, Najiahtul Syafiqah;Yusof, Robiah Binti;MA, Faiza
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.7
    • /
    • pp.389-396
    • /
    • 2022
  • Smartphones are growing more susceptible as technology develops because they contain sensitive data that offers a severe security risk if it falls into the wrong hands. The Android OS includes permissions as a crucial component for safeguarding user privacy and confidentiality. On the other hand, mobile malware continues to struggle with permission misuse. Although permission-based detection is frequently utilized, the significant false alarm rates brought on by the permission-based issue are thought to make it inadequate. The present detection method has a high incidence of false alarms, which reduces its ability to identify permission-based attacks. By using permission features with intent, this research attempted to improve permission-based detection. However, it creates an excessive number of features and increases the likelihood of false alarms. In order to generate the optimal number of features created and boost the quality of features chosen, this research developed an intersection feature approach. Performance was assessed using metrics including accuracy, TPR, TNR, and FPR. The most important characteristics were chosen using the Correlation Feature Selection, and the malicious program was categorized using SVM and naive Bayes. The Intersection Feature Technique, according to the findings, reduces characteristics from 486 to 17, has a 97 percent accuracy rate, and produces 0.1 percent false alarms.

Study of Static Analysis and Ensemble-Based Linux Malware Classification (정적 분석과 앙상블 기반의 리눅스 악성코드 분류 연구)

  • Hwang, Jun-ho;Lee, Tae-jin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.29 no.6
    • /
    • pp.1327-1337
    • /
    • 2019
  • With the growth of the IoT market, malware security threats are steadily increasing for devices that use the linux architecture. However, except for the major malware causing serious security damage such as Mirai, there is no related technology or research of security community about linux malware. In addition, the diversity of devices, vendors, and architectures in the IoT environment is further intensifying, and the difficulty in handling linux malware is also increasing. Therefore, in this paper, we propose an analysis system based on ELF which is the main format of linux architecture, and a binary based analysis system considering IoT environment. The ELF-based analysis system can be pre-classified for a large number of malicious codes at a relatively high speed and a relatively low-speed binary-based analysis system can classify all the data that are not preprocessed. These two processes are supposed to complement each other and effectively classify linux-based malware.

Performance Enhancement of Android Malware Classification using PCA (주성분 분석을 활용한 안드로이드 악성코드 분류 성능 향상 방안)

  • Jeon, Dong-Ha;Lee, Soo-Jin
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2022.07a
    • /
    • pp.249-250
    • /
    • 2022
  • 최근 API Call을 기반으로 하는 악성코드 탐지 및 분류에 대한 연구가 활발히 진행되고 있다. 그러나 API Call 기반의 데이터는 방대한 양과 다양한 차원의 특성으로 인해 분석과 학습 모델 구축 측면에서 비효율적인 한계가 있다. 이에 본 연구에서는 방대한 API Call 정보를 포함하고 있는 CICAndMal2020 데이터 세트를 대상으로 기존의 특성 선택 기법이 아닌 주성분 분석(Principal Component Analysis)을 사용하여 차원을 대폭 축소 시킨 후 머신러닝 기법을 적용하여 분류를 시도하였다. 실험 결과 전체 9,503개의 특성을 25개의 주성분(전체 대비 약 0.26% 수준)으로 축소시키고 다중 분류 기준 약 84%의 정확도를 나타냈다. 결과적으로 기존 연구에서의 탐지 모델 대비 정확도, F1-score 등의 성능 향상은 물론 차원 축소 측면에서 매우 향상된 결과를 달성하였다.

  • PDF

API Grouping Based Flow Analysis and Frequency Analysis Technique for Android Malware Classification (안드로이드 악성코드 분류를 위한 Flow Analysis 기반의 API 그룹화 및 빈도 분석 기법)

  • Shim, Hyunseok;Park, Jungsoo;Doan, Thien-Phuc;Jung, Souhwan
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.29 no.6
    • /
    • pp.1235-1242
    • /
    • 2019
  • While several machine learning technique has been implemented for Android malware categorization, there is still difficulty in analyzing due to overfitting problem and including of un-executable code, etc. In this paper, we introduce our implemented tool to address these problems. Tool is consists of approximately 1,500 lines of Java code, and perform Flow analysis on set of APIs, or on control flow graph. Our tool groups all the API by its relationship and only perform analysis on actually executing code. Using our tool, we grouped 39032 APIs into 4972 groups, and 12123 groups with result of including class names. We collected 7,000 APKs from 7 families and evaluated our feature reduction technique, and we also reduced features again with selecting APIs that have frequency more than 20%. We finally reduced features to 263-numbers of feature for our collected APKs.

Study on Machine Learning Techniques for Malware Classification and Detection

  • Moon, Jaewoong;Kim, Subin;Song, Jaeseung;Kim, Kyungshin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.12
    • /
    • pp.4308-4325
    • /
    • 2021
  • The importance and necessity of artificial intelligence, particularly machine learning, has recently been emphasized. In fact, artificial intelligence, such as intelligent surveillance cameras and other security systems, is used to solve various problems or provide convenience, providing solutions to problems that humans traditionally had to manually deal with one at a time. Among them, information security is one of the domains where the use of artificial intelligence is especially needed because the frequency of occurrence and processing capacity of dangerous codes exceeds the capabilities of humans. Therefore, this study intends to examine the definition of artificial intelligence and machine learning, its execution method, process, learning algorithm, and cases of utilization in various domains, particularly the cases and contents of artificial intelligence technology used in the field of information security. Based on this, this study proposes a method to apply machine learning technology to the method of classifying and detecting malware that has rapidly increased in recent years. The proposed methodology converts software programs containing malicious codes into images and creates training data suitable for machine learning by preparing data and augmenting the dataset. The model trained using the images created in this manner is expected to be effective in classifying and detecting malware.

A Study on Classification of CNN-based Linux Malware using Image Processing Techniques (영상처리기법을 이용한 CNN 기반 리눅스 악성코드 분류 연구)

  • Kim, Se-Jin;Kim, Do-Yeon;Lee, Hoo-Ki;Lee, Tae-Jin
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.9
    • /
    • pp.634-642
    • /
    • 2020
  • With the proliferation of Internet of Things (IoT) devices, using the Linux operating system in various architectures has increased. Also, security threats against Linux-based IoT devices are increasing, and malware variants based on existing malware are constantly appearing. In this paper, we propose a system where the binary data of a visualized Executable and Linkable Format (ELF) file is applied to Local Binary Pattern (LBP) image processing techniques and a median filter to classify malware in a Convolutional Neural Network (CNN). As a result, the original image showed the highest accuracy and F1-score at 98.77%, and reproducibility also showed the highest score at 98.55%. For the median filter, the highest precision was 99.19%, and the lowest false positive rate was 0.008%. Using the LBP technique confirmed that the overall result was lower than putting the original ELF file through the median filter. When the results of putting the original file through image processing techniques were classified by majority, it was confirmed that the accuracy, precision, F1-score, and false positive rate were better than putting the original file through the median filter. In the future, the proposed system will be used to classify malware families or add other image processing techniques to improve the accuracy of majority vote classification. Or maybe we mean "the use of Linux O/S distributions for various architectures has increased" instead? If not, please rephrase as intended.

PowerShell-based Malware Detection Method Using Command Execution Monitoring and Deep Learning (명령 실행 모니터링과 딥 러닝을 이용한 파워셸 기반 악성코드 탐지 방법)

  • Lee, Seung-Hyeon;Moon, Jong-Sub
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.28 no.5
    • /
    • pp.1197-1207
    • /
    • 2018
  • PowerShell is command line shell and scripting language, built on the .NET framework, and it has several advantages as an attack tool, including built-in support for Windows, easy code concealment and persistence, and various pen-test frameworks. Accordingly, malwares using PowerShell are increasing rapidly, however, there is a limit to cope with the conventional malware detection technique. In this paper, we propose an improved monitoring method to observe commands executed in the PowerShell and a deep learning based malware classification model that extract features from commands using Convolutional Neural Network(CNN) and send them to Recurrent Neural Network(RNN) according to the order of execution. As a result of testing the proposed model with 5-fold cross validation using 1,916 PowerShell-based malwares collected at malware sharing site and 38,148 benign scripts disclosed by an obfuscation detection study, it shows that the model effectively detects malwares with about 97% True Positive Rate(TPR) and 1% False Positive Rate(FPR).

A Study on Performance of ML Algorithms and Feature Extraction to detect Malware (멀웨어 검출을 위한 기계학습 알고리즘과 특징 추출에 대한 성능연구)

  • Ahn, Tae-Hyun;Park, Jae-Gyun;Kwon, Young-Man
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.18 no.1
    • /
    • pp.211-216
    • /
    • 2018
  • In this paper, we studied the way that classify whether unknown PE file is malware or not. In the classification problem of malware detection domain, feature extraction and classifier are important. For that purpose, we studied what the feature is good for classifier and the which classifier is good for the selected feature. So, we try to find the good combination of feature and classifier for detecting malware. For it, we did experiments at two step. In step one, we compared the accuracy of features using Opcode only, Win. API only, the one with both. We founded that the feature, Opcode and Win. API, is better than others. In step two, we compared AUC value of classifiers, Bernoulli Naïve Bayes, K-nearest neighbor, Support Vector Machine and Decision Tree. We founded that Decision Tree is better than others.

Malware Analysis Based on Section, DLL (Section, DLL feature 기반 악성코드 분석 기술 연구)

  • Hwang, Jun-ho;Hwang, Seon-bin;Kim, Ho-gyeong;Ha, Ji-hee;Lee, Tae-jin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.27 no.5
    • /
    • pp.1077-1086
    • /
    • 2017
  • Malware mutants based on existing malware is widely used because it can easily avoid the existing security system even with a slight pattern change. These malware appear on average more than 1.6 million times a day, and they are gradually expanding to IoT / ICS as well as cyber space, which has a large scale of damage. In this paper, we propose an analytical method based on features of PE Section and DLL that do not give much significance, rather than pattern-based analysis, Sandbox-based analysis, and CFG, Strings-based analysis. It is expected that the proposed model will be able to cope with effective malicious code in case of combined operation of various existing analysis technologies.

Distributed Processing System Design and Implementation for Feature Extraction from Large-Scale Malicious Code (대용량 악성코드의 특징 추출 가속화를 위한 분산 처리 시스템 설계 및 구현)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.8 no.2
    • /
    • pp.35-40
    • /
    • 2019
  • Traditional Malware Detection is susceptible for detecting malware which is modified by polymorphism or obfuscation technology. By learning patterns that are embedded in malware code, machine learning algorithms can detect similar behaviors and replace the current detection methods. Data must collected continuously in order to learn malicious code patterns that change over time. However, the process of storing and processing a large amount of malware files is accompanied by high space and time complexity. In this paper, an HDFS-based distributed processing system is designed to reduce space complexity and accelerate feature extraction time. Using a distributed processing system, we extract two API features based on filtering basis, 2-gram feature and APICFG feature and the generalization performance of ensemble learning models is compared. In experiments, the time complexity of the feature extraction was improved about 3.75 times faster than the processing time of a single computer, and the space complexity was about 5 times more efficient. The 2-gram feature was the best when comparing the classification performance by feature, but the learning time was long due to high dimensionality.