• Title/Abstract/Keywords: Feature select

A Self-selection of Adaptive Feature using DCT

  • Lim, Seung-in
    • 한국컴퓨터정보학회논문지 / Vol. 5, No. 3 / pp.215-219 / 2000
  • The purpose of this paper is to propose a method that maximizes the efficiency of content-based image retrieval for various kinds of images. The paper discusses self-adaptivity to changes in the image domain and self-selection of the optimal features for a query image, and presents an efficient method to maximize content-based retrieval performance for various kinds of images. In this method, a content-based retrieval system automatically selects the distinctive feature patterns that give the highest retrieval efficiency across various kinds of images. Experimental results show that the proposed method improves on the method using individual features by 3%.

Suggestion of Selecting Features and Learning Models for Android-based App Malware Detection

  • 배세진;이정수;백남균
    • 한국정보통신학회:학술대회논문집 / 한국정보통신학회 2022 Spring Conference / pp.377-380 / 2022
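  • Applications, commonly called apps, can be downloaded and used on mobile devices. Android-based apps are built on open source, which has the drawback that anyone can misuse them. Unlike iOS, which discloses only a very small part of its source code, Android is implemented as open source, so it also has the advantage that the code can be analyzed. However, because anyone can take part in modifying the source code of open-source Android apps, malware inevitably grows in both number and variety. Malware that increases exponentially over a short period is difficult for people to detect one sample at a time, so it is efficient to use AI-based malware detection techniques. Most existing approaches detect malicious apps by extracting features. We therefore propose three methods for selecting the optimal features to use for training after feature extraction. Finally, in the modeling stage with the optimal features, we use ensemble techniques in addition to single models. As several studies have already shown, ensemble techniques deliver results that exceed the performance of single models. Accordingly, this paper presents a way to select optimal features and implement learning models for detecting Android app-based malware.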

Feature-based Similarity Assessment for Re-using CAD Models

  • 박병건;김재정
    • 한국CDE학회논문집 / Vol. 16, No. 1 / pp.21-30 / 2011
  • Similarity assessment of CAD models is one of the important issues in model reuse. In practice, many new mechanical parts are designed by modifying existing ones. Reusing parts saves designers time and effort. Design time would be reduced further if there were an efficient way to search for existing similar designs. This paper proposes an efficient similarity assessment algorithm for mechanical part models whose design history is embedded within the CAD model. Since the design history and detailed feature information can be retrieved through the CAD API, we can obtain an accurate and reliable assessment result. Our assessment algorithm is divided into two steps: (1) we select suitable parts by comparing the MSG (Model Signature Graph) extracted from a base feature of the required model; (2) the similarities of detailed features are assessed using their own attributes and reference structures. In addition, we propose an indexing method for managing a model database in the last part of this article.

Hybrid Feature Selection Method Based on a Naïve Bayes Algorithm that Enhances the Learning Speed while Maintaining a Similar Error Rate in Cyber ISR

  • Shin, GyeongIl;Yooun, Hosang;Shin, DongIl;Shin, DongKyoo
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 12 / pp.5685-5700 / 2018
  • Cyber intelligence, surveillance, and reconnaissance (ISR) has become more important than traditional military ISR. An agent used in cyber ISR resides in an enemy's networks and continually collects valuable information. Thus, this agent should be able to determine what is, and is not, useful in a short amount of time. Moreover, the agent should maintain a classification rate that is high enough to select useful data from the enemy's network. Traditional feature selection algorithms cannot comply with these requirements. Consequently, in this paper, we propose an effective hybrid feature selection method derived from the filter and wrapper methods. We illustrate the design of the proposed model and the experimental results of the performance comparison between the proposed model and the existing model.

Feature Selection via Embedded Learning Based on Tangent Space Alignment for Microarray Data

  • Ye, Xiucai;Sakurai, Tetsuya
    • Journal of Computing Science and Engineering / Vol. 11, No. 4 / pp.121-129 / 2017
  • Feature selection has been widely established as an efficient technique for microarray data analysis. Feature selection aims to search for the most important feature/gene subset of a given dataset according to its relevance to the current target. Unsupervised feature selection is considered challenging due to the lack of label information. In this paper, we propose a novel method for unsupervised feature selection, which incorporates embedded learning and $l_{2,1}$-norm sparse regression into a single framework to select genes in microarray data analysis. Local tangent space alignment is applied during embedded learning to preserve the local data structure. The $l_{2,1}$-norm sparse regression acts as a constraint that helps learn the gene weights jointly, so that the proposed method selects the informative genes that better capture the natural classes of samples. We provide an effective algorithm to solve the optimization problem in our method. Finally, to validate its efficacy, we evaluate the proposed method on real microarray gene expression datasets. The experimental results demonstrate that the proposed method obtains quite promising performance.
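
For readers unfamiliar with the notation, the $l_{2,1}$-norm mentioned in the abstract above is commonly defined as follows. This is the standard textbook definition, not a formula quoted from the paper; the symbols $W$ (a $d \times c$ weight matrix with one row per gene), $w_i$ (its $i$-th row), $d$, and $c$ are assumptions introduced here for illustration.

```latex
\|W\|_{2,1} \;=\; \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{c} W_{ij}^{2}} \;=\; \sum_{i=1}^{d} \|w_i\|_{2}
```

Penalizing this quantity tends to drive entire rows of $W$ to zero, so the surviving row norms $\|w_i\|_2$ can be read as per-feature importance weights.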

Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety

  • Yeom, Ha-Neul;Hwang, Myunggwon;Hwang, Mi-Nyeong;Jung, Hanmin
    • Journal of Information Science Theory and Practice / Vol. 2, No. 3 / pp.29-39 / 2014
  • In recent years, several studies have proposed making use of the Twitter micro-blogging service to track various trends in online media and discussion. In this study, we specifically examine the use of Twitter to track discussions of food safety in the Korean language. Given the irregularity of keyword use in most tweets, we focus on optimistic machine-learning and feature set selection to classify collected tweets. We build the classifier model using Naive Bayes & Naive Bayes Multinomial, Support Vector Machine, and Decision Tree Algorithms, all of which show good performance. To select an optimum feature set, we construct a basic feature set as a standard for performance comparison, so that further test feature sets can be evaluated. Experiments show that precision and F-measure performance are best when using a Naive Bayes Multinomial classifier model with a test feature set defined by extracting Substantive, Predicate, Modifier, and Interjection parts of speech.

Iris Recognition Based on a Shift-Invariant Wavelet Transform

  • Cho, Seongwon;Kim, Jaemin
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 4, No. 3 / pp.322-326 / 2004
  • This paper describes a new iris recognition method based on shift-invariant wavelet sub-images. For the feature representation, we first preprocess an iris image to compensate for variation of the iris and to simplify the implementation of the wavelet transform. Then, we decompose the preprocessed iris image into multiple subband images using a shift-invariant wavelet transform. We select a set of subband images that carry rich information for classifying various iris patterns and are robust to noise. To reduce the size of the feature vector, we quantize each pixel of the subband images using the Lloyd-Max quantization method. Each feature element is represented by one of the quantization levels, and the set of these feature elements is the feature vector. When the quantization is very coarse, the quantized level does not carry much information about the image pixel value. Therefore, we define a new similarity measure based on the mutual information between two features. With this similarity measure, the size of the feature vector can be reduced without much degradation of performance. Experimentally, we show that the proposed method produces superb performance in iris recognition.
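
The idea of comparing two quantized feature vectors through mutual information can be sketched as below. This is a minimal illustration of plain mutual information between two sequences of quantization levels, not the paper's exact similarity measure, which the abstract does not spell out. The function name `mutual_information` and the toy vectors are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length sequences of quantization levels."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))   # joint counts of (level in x, level in y)
    px = Counter(x)              # marginal counts for x
    py = Counter(y)              # marginal counts for y
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p(a,b) / (p(a) * p(b)), written with counts to avoid extra divisions
        mi += p_ab * math.log2(p_ab * n * n / (px[a] * py[b]))
    return mi

# Hypothetical feature vectors quantized to 4 levels (0..3).
enrolled = [0, 1, 2, 3, 0, 1, 2, 3]
same_iris = [0, 1, 2, 3, 0, 1, 2, 3]
other_iris = [0, 0, 1, 1, 2, 2, 3, 3]

mi_same = mutual_information(enrolled, same_iris)
mi_other = mutual_information(enrolled, other_iris)
```

A higher value means the two vectors share more information, so `mi_same` comes out larger than `mi_other` for this toy data.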

CNN-based Android Malware Detection Using Reduced Feature Set

  • Kim, Dong-Min;Lee, Soo-jin
    • 한국컴퓨터정보학회논문지 / Vol. 26, No. 10 / pp.19-26 / 2021
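  • The performance of deep learning-based malware detection and classification models depends heavily on how the feature set is constructed. In this paper, we propose a method for selecting the optimal feature set that maximizes detection performance in CNN-based Android malware detection. The features to include in the feature set were selected using the Chi-Square test algorithm, which is widely used for feature extraction in machine learning and deep learning. After training a CNN model on the CICANDMAL2017 dataset using the 36 selected features, we measured malware detection performance and achieved an accuracy of 99.99% in binary classification and 98.55% in multi-class classification.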
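
Chi-square feature ranking of the kind named in the abstract above can be sketched in a few lines. This is a minimal illustration on toy categorical data, assuming a plain Pearson chi-square statistic per feature; it is not the authors' pipeline, and the feature names, values, and helper names (`chi_square_score`, `select_top_k`) are hypothetical.

```python
from collections import Counter

def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n   # count expected under independence
            diff = observed[(f_value, l_value)] - expected
            score += diff * diff / expected
    return score

def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical binary features for eight samples.
columns = {
    "feature_a": [1, 1, 1, 0, 0, 0, 1, 0],
    "feature_b": [1, 1, 1, 1, 1, 1, 1, 0],
    "feature_c": [1, 0, 1, 0, 0, 1, 1, 0],
}
labels = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = malware, 0 = benign (toy labels)

selected = select_top_k(columns, labels, k=2)
```

Features whose values are nearly independent of the label, such as `feature_b` here, receive a low score and are dropped first.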

A Novel Network Anomaly Detection Method based on Data Balancing and Recursive Feature Addition

  • Liu, Xinqian;Ren, Jiadong;He, Haitao;Wang, Qian;Sun, Shengting
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 7 / pp.3093-3115 / 2020
  • Network anomaly detection systems play an essential role in detecting network anomalies and ensuring network security. Machine learning-based anomaly detection has become an increasingly popular solution. However, because network traffic is imbalanced and high-dimensional, existing methods are unable to achieve both high accuracy and a low false alarm rate. To address this problem, a new network anomaly detection method based on data balancing and recursive feature addition is proposed. First, a data balancing algorithm based on improved KNN outlier detection is designed to select a subset of the data in each category. The parameters of the improved KNN outlier detection are jointly optimized by a genetic algorithm. Next, a recursive feature addition algorithm based on correlation analysis is proposed to select effective features, in which a cross contingency test is used to analyze correlation and obtain a feature subset with strong correlation. Then, a random forest model is used as the classification model to detect anomalies. Finally, the proposed algorithm is evaluated on the benchmark datasets KDD Cup 1999 and UNSW_NB15. The results show that the proposed strategies improve accuracy and recall and decrease the false alarm rate. Compared with other algorithms, this algorithm still achieves significant improvements, especially in recall for the small categories.
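
The general shape of recursive feature addition can be sketched as a greedy loop: walk through features in ranked order and keep each one only if it improves an evaluation score. This is a generic sketch under stated assumptions, not the paper's algorithm. The ranking is supplied by hand rather than by a cross contingency test, and a toy rule stands in for the random forest. All names and data below are hypothetical.

```python
def recursive_feature_addition(ranked_features, evaluate, min_gain=0.0):
    """Greedily add features in ranked order, keeping each one only if the score improves."""
    selected = []
    best_score = float("-inf")  # the first feature is always accepted
    for feature in ranked_features:
        candidate = selected + [feature]
        score = evaluate(candidate)
        if score > best_score + min_gain:
            selected, best_score = candidate, score
    return selected, best_score

# Hypothetical binary features for eight traffic samples (1 = anomaly in labels).
columns = {
    "a": [1, 1, 0, 0, 0, 0, 0, 0],
    "b": [0, 0, 1, 1, 0, 0, 0, 0],
    "c": [0, 0, 0, 0, 1, 0, 0, 0],
}
labels = [1, 1, 1, 1, 0, 0, 0, 0]

def toy_accuracy(features):
    """Accuracy of the toy rule 'flag a sample if any chosen feature is 1'."""
    predictions = [int(any(columns[f][i] for f in features)) for i in range(len(labels))]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

selected, score = recursive_feature_addition(["a", "b", "c"], toy_accuracy)
```

Here feature `c` is rejected because adding it lowers the toy accuracy; in a real setting `evaluate` would typically be a cross-validated classifier score.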

Feature Selection for Creative People Based on Big 5 Personality Traits and Machine Learning Algorithms

  • 김용준
    • 한국인터넷방송통신학회논문지 / Vol. 19, No. 1 / pp.97-102 / 2019
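  • Because there has been no systematic classification and analysis method based on precise criteria or quantification for creative people, defining them is difficult. To address this problem, this study analyzes how creative people can be distinguished and which personality traits they share. We first conducted a survey using the Big 5 personality traits method, then classified and analyzed the resulting dataset with WEKA, a data mining tool, with the goal of analyzing the personality features associated with creative people using various machine learning techniques. We applied seven feature selection algorithms, fed the feature groups chosen by those algorithms into machine learning algorithms to measure accuracy, and found that the features obtained through the support vector machine yielded the highest classification result.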