• Title/Summary/Keyword: Obfuscated Feature

Search Result 4, Processing Time 0.019 seconds

Spam Image Detection Model based on Deep Learning for Improving Spam Filter

  • Seong-Guk Nam;Dong-Gun Lee;Yeong-Seok Seo
    • Journal of Information Processing Systems
    • /
    • v.19 no.3
    • /
    • pp.289-301
    • /
    • 2023
  • Due to the development and dissemination of modern technology, anyone can easily communicate using services such as social network service (SNS) through a personal computer (PC) or smartphone. The development of these technologies has caused many beneficial effects. At the same time, bad effects also occurred, one of which was the spam problem. Spam refers to unwanted or rejected information received by unspecified users. The continuous exposure of such information to service users creates inconvenience in the user's use of the service, and if filtering is not performed correctly, the quality of service deteriorates. Recently, spammers are creating more malicious spam by distorting the image of spam text so that optical character recognition (OCR)-based spam filters cannot easily detect it. Fortunately, the level of transformation of image spam circulated on social media is not serious yet. However, in the mail system, spammers (the person who sends spam) showed various modifications to the spam image for neutralizing OCR, and therefore, the same situation can happen with spam images on social media. Spammers have been shown to interfere with OCR reading through geometric transformations such as image distortion, noise addition, and blurring. Various techniques have been studied to filter image spam, but at the same time, methods of interfering with image spam identification using obfuscated images are also continuously developing. In this paper, we propose a deep learning-based spam image detection model to improve the existing OCR-based spam image detection performance and compensate for vulnerabilities. The proposed model extracts text features and image features from the image using four sub-models. First, the OCR-based text model extracts the text-related features, whether the image contains spam words, and the word embedding vector from the input image. Then, the convolution neural network-based image model extracts image obfuscation and image feature vectors from the input image. The extracted feature is determined whether it is a spam image by the final spam image classifier. As a result of evaluating the F1-score of the proposed model, the performance was about 14 points higher than the OCR-based spam image detection performance.

Android App Birthmarking Technique Resilient to Code Obfuscation (난독화에 강인한 안드로이드 앱 버스마킹 기법)

  • Kim, Dongjin;Cho, Seong-Je;Chung, Youngki;Woo, Jinwoon;Ko, Jeonguk;Yang, Soo-Mi
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.40 no.4
    • /
    • pp.700-708
    • /
    • 2015
  • A software birthmark is the set of characteristics of a program which can be used to identify the program. Many researchers have studied on detecting theft of java programs using some birthmarks. In case of Android apps, code obfuscation techniques are used to protect the apps against reverse-engineering and tampering. However, attackers can also use the obfuscation techniques in order to conceal a stolen program. A birthmark (feature) of an app can be alterable by code obfuscations. Therefore, it is necessary to detect Android app theft based on the birthmark which is resilient to code obfuscation. In this paper, we propose an effective Android app birthmark and app theft detection through the proposed birthmark. By analyzing some obfuscation tools, we have first selected parameter and the return types of methods as an adequate birthmark. Then, we have measured similarity of target apps using the birthmarks extracted from the apps, where some target apps are not obfuscated and the others obfuscated. The measurement results show that our proposed birthmark is effective for detecting Android app theft even though the apps are obfuscated.

A Study on Feature Extraction for Detection for Obfuscated Javascript Based on Executable Code Units (난독화 된 자바스크립트 탐지를 위한 실행 단위 코드 기반 특징 추출에 관한 연구)

  • Kang, Ik-Seon;Cho, Ho-Mook
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2014.07a
    • /
    • pp.79-80
    • /
    • 2014
  • 악성코드 유포를 위해 가장 많이 사용되는 Drive-by-download 공격에는 주로 자바스크립트가 사용되며, 공격자는 탐지 시스템의 우회하고 행위 분석을 어렵게 하기 위해 공격에 사용되는 스크립트를 난독화 한다. 난독화 된 자바스크립트를 탐지하기 위한 여러 기존의 연구들이 있었지만 최소한의 코드가 난독화 되어 있거나 정상 코드와 혼재할 경우 난독화 여부를 판단하기 어려운 한계가 있다. 본 논문에서는 난독화 된 자바 스크립트를 효과적으로 탐지하기 위해 전체 스크립트를 실행 단위 코드로 나눠 분석에 필요한 특징을 효과적으로 추출하는 방법을 제안한다.

  • PDF

Classification of Malware Families Using Hybrid Datasets (하이브리드 데이터셋을 이용한 악성코드 패밀리 분류)

  • Seo-Woo Choi;Myeong-Jin Han;Yeon-Ji Lee;Il-Gu Lee
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.33 no.6
    • /
    • pp.1067-1076
    • /
    • 2023
  • Recently, as variant malware has increased, the scale of cyber hacking incidents is expanding. To respond to intelligent cyberhacking attack, machine learning-based research is actively underway to effectively classify malware families. However, existing classification models have problems where performance deteriorates when the dataset is obfuscated or sparse. In this paper, we propose a hybrid dataset that combines features extracted from ASM files and BYTES files, and evaluate classification performance using FNN. As a result of the experiment, the proposed method showed performance improvement of about 4% compared to a single dataset, and in particular, performance improvement of about 30% for rare families.