• Title/Summary/Keywords: Feature selection

Search results: 1,084 items (processing time: 0.032 seconds)

상호정보량과 Binary Particle Swarm Optimization을 이용한 속성선택 기법 (Feature Selection Method by Information Theory and Particle Swarm Optimization)

  • 조재훈;이대종;송창규;전명근
    • 한국지능시스템학회논문지 / Vol. 19, No. 2 / pp. 191-196 / 2009
  • In this paper, we propose a feature selection method that combines BPSO (Binary Particle Swarm Optimization) with mutual information. The proposed method consists of two stages: selecting a candidate feature subset using mutual information, and selecting the optimal feature subset using BPSO. In the candidate-selection stage, the mutual information of each feature is evaluated independently and a preset number of top-ranked features are chosen as candidates. In the optimal-subset stage, BPSO searches the candidate subset for the optimal feature subset. The objective of BPSO is a multi-objective function that combines classifier accuracy with the number of selected features. Gene data were used to evaluate the proposed method, and the experimental results show that it outperforms existing methods.
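
A minimal sketch of the two-stage idea described in this abstract, assuming scikit-learn and a synthetic dataset in place of the gene data; the classifier (k-NN), the candidate size k, the penalty weight in the multi-objective fitness, and the BPSO hyperparameters are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=50, n_informative=8, random_state=0)

# Stage 1: keep the top-k features ranked by mutual information with the label.
k = 20
candidates = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1][:k]
Xc = X[:, candidates]

def fitness(mask):
    # Multi-objective score: classification accuracy minus a penalty on subset size.
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(), Xc[:, mask.astype(bool)], y, cv=3).mean()
    return acc - 0.01 * mask.sum() / k

# Stage 2: binary PSO over the candidate subset.
n_particles, n_iter = 20, 30
pos = rng.integers(0, 2, size=(n_particles, k))
vel = rng.normal(size=(n_particles, k))
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((n_particles, k)), rng.random((n_particles, k))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = (rng.random((n_particles, k)) < 1 / (1 + np.exp(-vel))).astype(int)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("selected features:", candidates[gbest.astype(bool)])
```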

A Novel Feature Selection Method in the Categorization of Imbalanced Textual Data

  • Pouramini, Jafar;Minaei-Bidgoli, Behrouze;Esmaeili, Mahdi
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 8 / pp. 3725-3748 / 2018
  • Text data distribution is often imbalanced. Imbalanced data is one of the challenges in text classification, as it degrades classifier performance. Many studies have addressed this problem, and the proposed solutions fall into several general categories, including sampling-based and algorithm-based methods. In recent studies, feature selection has also been considered as a solution to the imbalance problem. In this paper, a novel one-sided feature selection method, known as probabilistic feature selection (PFS), is presented for imbalanced text classification. PFS is a probabilistic method computed from the feature distribution. Compared to similar methods, PFS has more parameters. To evaluate the proposed method, the feature selection methods Gini, MI, FAST, and DFS were implemented, and classifiers such as the C4.5 decision tree and Naive Bayes were used. The results of tests on Reuters-21875 and WebKB, measured by F-measure, suggest that the proposed feature selection significantly improves classifier performance.
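
The abstract does not specify the PFS formula, so the sketch below only illustrates the general idea of a one-sided, distribution-based term score for an imbalanced two-class problem; the toy counts and the scoring rule are assumptions for illustration, not the authors' PFS.

```python
import numpy as np

# Toy document-term counts: rows = documents, columns = terms.
counts = np.array([[3, 0, 1],
                   [2, 1, 0],
                   [0, 4, 1],
                   [0, 3, 2]])
labels = np.array([1, 1, 0, 0])            # 1 = minority (positive) class

term_in_pos = counts[labels == 1].sum(axis=0)
term_total = counts.sum(axis=0)
score = term_in_pos / term_total           # fraction of each term's occurrences in the positive class
print("one-sided scores per term:", np.round(score, 2))
```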

Biological Feature Selection and Disease Gene Identification using New Stepwise Random Forests

  • Hwang, Wook-Yeon
    • Industrial Engineering and Management Systems / Vol. 16, No. 1 / pp. 64-79 / 2017
  • Identifying disease genes from the human genome is a critical task in biomedical research. Important biological features for distinguishing disease genes from non-disease genes have mainly been selected with traditional feature selection approaches. However, the traditional approaches unnecessarily consider many unimportant biological features. As a result, although some existing classification techniques have been applied to disease gene identification, the prediction performance was not satisfactory. A small set of the most important biological features can enhance the accuracy of disease gene identification and provide potentially useful knowledge for biologists or clinicians, who can further investigate the selected biological features as well as the potential disease genes. In this paper, we propose a new stepwise random forests (SRF) approach for biological feature selection and disease gene identification. The SRF approach consists of two stages. In the first stage, important biological features are iteratively selected in a forward selection manner based on one-dimensional random forest regression, where the updated residual vector is treated as the current response vector; this yields a small set of important biological features. In the second stage, random forests classification on the selected biological features is applied to identify disease genes. Our extensive experiments show that the proposed SRF approach outperforms existing feature selection and classification techniques in terms of biological feature selection and disease gene identification.
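
A hedged sketch of the two-stage SRF idea, assuming scikit-learn and synthetic data in place of the biological features; the in-sample R² selection criterion, the stopping threshold, and the tree counts are illustrative choices rather than the authors' settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=40, n_informative=6, random_state=1)

# Stage 1: forward selection with one-dimensional RF regressions on the residual.
residual = y.astype(float)
selected = []
for _ in range(10):                        # at most 10 features, for illustration
    best_j, best_r2, best_pred = None, 0.0, None
    for j in range(X.shape[1]):
        if j in selected:
            continue
        rf = RandomForestRegressor(n_estimators=30, random_state=0)
        pred = rf.fit(X[:, [j]], residual).predict(X[:, [j]])
        r2 = 1 - ((residual - pred) ** 2).sum() / ((residual - residual.mean()) ** 2).sum()
        if r2 > best_r2:
            best_j, best_r2, best_pred = j, r2, pred
    if best_j is None or best_r2 < 0.01:   # stop when no feature explains the residual
        break
    selected.append(best_j)
    residual = residual - best_pred        # updated residual becomes the new response

# Stage 2: random forests classification on the selected features only.
acc = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0),
                      X[:, selected], y, cv=5).mean()
print("selected:", selected, "cv accuracy:", round(acc, 3))
```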

A Feature Selection-based Ensemble Method for Arrhythmia Classification

  • Namsrai, Erdenetuya;Munkhdalai, Tsendsuren;Li, Meijing;Shin, Jung-Hoon;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems / Vol. 9, No. 1 / pp. 31-40 / 2013
  • In this paper, a novel method is proposed to build an ensemble of classifiers by using a feature selection schema. The feature selection schema identifies the best feature sets that affect arrhythmia classification. First, a number of feature subsets are extracted by applying the feature selection schema to the original dataset. Then, classification models are built using each feature subset. Finally, we combine the classification models through a voting approach to form a classification ensemble. The voting approach in our method uses both the classification error rate and the feature selection rate to calculate the score of each classifier in the ensemble. In our method, the feature selection rate depends on the extraction order of the feature subsets. In the experiment, we applied our method to an arrhythmia dataset and generated the top three disjoint feature sets. We then built three classifiers based on these feature subsets and formed the classifier ensemble using the voting approach. Our method can improve classification accuracy on high-dimensional datasets. The performance of each classifier, and of their ensemble, was higher than that of a classifier based on the whole feature space of the dataset. The classification performance was improved, and a more stable classification model could be constructed with the proposed approach.
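
A rough sketch of the ensemble construction described above, assuming scikit-learn; the mutual-information ranking used to form disjoint subsets and the exact weighting formula combining error rate and a rank-based selection rate are stand-ins for the paper's schema.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=400, n_features=30, n_informative=9, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)

# Three disjoint feature subsets taken in ranked order (best first).
ranked = np.argsort(mutual_info_classif(X_tr, y_tr, random_state=0))[::-1]
subsets = [ranked[:5], ranked[5:10], ranked[10:15]]

members, weights = [], []
for order, cols in enumerate(subsets):
    err = 1 - cross_val_score(DecisionTreeClassifier(random_state=0), X_tr[:, cols], y_tr, cv=3).mean()
    sel_rate = 1.0 / (order + 1)           # earlier-extracted subsets weigh more
    members.append((DecisionTreeClassifier(random_state=0).fit(X_tr[:, cols], y_tr), cols))
    weights.append((1 - err) * sel_rate)   # score combines accuracy and selection rate

# Weighted vote over the ensemble members.
votes = np.zeros((len(y_te), 2))
for (clf, cols), w in zip(members, weights):
    votes[np.arange(len(y_te)), clf.predict(X_te[:, cols])] += w
print("ensemble accuracy:", round(accuracy_score(y_te, votes.argmax(axis=1)), 3))
```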

Feature Selection Based on Bi-objective Differential Evolution

  • Das, Sunanda;Chang, Chi-Chang;Das, Asit Kumar;Ghosh, Arka
    • Journal of Computing Science and Engineering / Vol. 11, No. 4 / pp. 130-141 / 2017
  • Feature selection is one of the most challenging problems in pattern recognition and data mining. In this paper, a feature selection algorithm based on an improved version of binary differential evolution is proposed. The method simultaneously optimizes two feature selection criteria, namely, the set approximation accuracy of rough set theory and a score derived from relational algebra, in order to select the most relevant feature subset from the entire feature set. The superiority of the proposed method over other state-of-the-art methods is confirmed by experimental results, which were obtained on seven publicly available benchmark datasets with different characteristics, such as a low number of objects with a high number of features and a high number of objects with a low number of features.
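
A minimal binary differential-evolution sketch, assuming scikit-learn; the paper's two criteria (rough-set approximation accuracy and a relational-algebra score) are replaced here by two illustrative stand-ins, cross-validated accuracy and subset compactness, combined by a simple weighted sum.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X, y = load_breast_cancer(return_X_y=True)
d = X.shape[1]

def objectives(mask):
    if mask.sum() == 0:
        return 0.0, 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask.astype(bool)], y, cv=3).mean()
    compact = 1 - mask.sum() / d           # prefer smaller subsets
    return acc, compact

def scalarized(mask, w=0.8):               # weighted sum of the two objectives
    a, c = objectives(mask)
    return w * a + (1 - w) * c

NP, F, CR, iters = 15, 0.8, 0.5, 15
pop = rng.random((NP, d))                  # continuous genotype, thresholded to binary
fit = np.array([scalarized((p > 0.5).astype(int)) for p in pop])

for _ in range(iters):
    for i in range(NP):
        a, b, c = pop[rng.choice([j for j in range(NP) if j != i], 3, replace=False)]
        mutant = np.clip(a + F * (b - c), 0, 1)
        trial = np.where(rng.random(d) < CR, mutant, pop[i])
        f_trial = scalarized((trial > 0.5).astype(int))
        if f_trial >= fit[i]:
            pop[i], fit[i] = trial, f_trial

best = (pop[fit.argmax()] > 0.5)
print("selected", best.sum(), "of", d, "features; score =", round(fit.max(), 3))
```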

엔트로피를 기반으로 한 특징 집합 선택 알고리즘 (Feature Subset Selection Algorithm based on Entropy)

  • 홍석미;안종일;정태충
    • 전자공학회논문지CI / Vol. 41, No. 2 / pp. 87-94 / 2004
  • Feature subset selection is often used as a preprocessing step for learning algorithms. When the collected data are irrelevant to the problem or contain redundant information, removing them before building the learning model improves learning performance; it also reduces the search space and the storage required. In this paper, we propose a new feature selection algorithm that uses an entropy-based heuristic function both to extract feature subsets and to evaluate the extracted subsets, with the ACS algorithm used as the search method. As a result, reducing the dimensionality of the features used for learning reduced the size of the learning model and eliminated unnecessary computation time.
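
A small sketch of the kind of entropy-based heuristic this abstract mentions, scoring a candidate feature subset by the information gain of the class given the discretized features; the ACS search itself is omitted, and the binning scheme and score form are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def subset_score(X, y, cols, bins=5):
    # H(Y) - H(Y | discretized feature tuple): higher means a more informative subset.
    binned = np.stack([np.digitize(X[:, c], np.histogram(X[:, c], bins)[1][1:-1]) for c in cols], axis=1)
    keys = [tuple(row) for row in binned]
    cond = 0.0
    for k in set(keys):
        idx = [i for i, kk in enumerate(keys) if kk == k]
        cond += len(idx) / len(y) * entropy(y[idx])
    return entropy(y) - cond

X, y = load_iris(return_X_y=True)
print({cols: round(subset_score(X, y, list(cols)), 3) for cols in [(0, 1), (2, 3)]})
```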

초분광 영상 특징선택과 밴드비 기법을 이용한 유사색상의 특이재질 검출기법 (Specific Material Detection with Similar Colors using Feature Selection and Band Ratio in Hyperspectral Image)

  • 심민섭;김성호
    • 제어로봇시스템학회논문지 / Vol. 19, No. 12 / pp. 1081-1088 / 2013
  • Hyperspectral cameras acquire reflectance values at many different wavelength bands. Because spectral information is stored in every pixel, the dimensionality of the data tends to be high. Several attempts have been made to mitigate this dimensionality problem, such as feature selection using AdaBoost and dimension reduction using the Simulated Annealing technique. We propose a novel material detection method that consists of four steps: feature band selection, feature extraction, SVM (Support Vector Machine) learning, and target and specific region detection. It combines the band ratio method with a Simulated Annealing algorithm driven by detection rate. The experimental results validate the effectiveness of the proposed feature selection and band ratio method.
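
A hedged sketch of band-ratio features feeding an SVM, using a synthetic reflectance cube; the band pairs stand in for bands that would be chosen by the detection-rate-driven Simulated Annealing step, which is not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(4)
n_pixels, n_bands = 2000, 64
cube = rng.random((n_pixels, n_bands))     # stand-in for per-pixel reflectance spectra
# Synthetic "specific material" label, defined for illustration by a spectral ratio.
target = (cube[:, 40] / (cube[:, 12] + 1e-6) > 1.5).astype(int)

def band_ratios(spectra, pairs):
    # Ratios of reflectance at selected band pairs, a simple illumination-robust feature.
    return np.stack([spectra[:, i] / (spectra[:, j] + 1e-6) for i, j in pairs], axis=1)

pairs = [(40, 12), (33, 20), (50, 5)]      # e.g. pairs chosen by a detection-rate-driven search
X = band_ratios(cube, pairs)
X_tr, X_te, y_tr, y_te = train_test_split(X, target, test_size=0.3, random_state=4)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```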

Exploring an Optimal Feature Selection Method for Effective Opinion Mining Tasks

  • Eo, Kyun Sun;Lee, Kun Chang
    • 한국컴퓨터정보학회논문지 / Vol. 24, No. 2 / pp. 171-177 / 2019
  • This paper aims to find the most effective feature selection method for opinion mining tasks. Opinion mining tasks belong to sentiment analysis, which, from a text mining point of view, categorizes the opinions in online texts into positive and negative. Using a dataset of five product groups (apparel, books, DVDs, electronics, and kitchen), TF-IDF and Bag-of-Words (BOW) features are calculated to form the product review feature sets. Next, we apply the feature selection methods to see which one yields the most robust results. The results show that a stacking classifier trained on the features chosen by the Information Gain feature selection method yields the best result.
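
A compact sketch of the pipeline this abstract describes (TF-IDF features, Information-Gain-style selection, a stacking classifier), assuming scikit-learn and the 20 newsgroups corpus as a stand-in for the product-review data; the base learners, vocabulary size, and k are illustrative choices.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

data = fetch_20newsgroups(subset="train", categories=["rec.autos", "sci.electronics"])
pipe = make_pipeline(
    TfidfVectorizer(max_features=2000),                   # TF-IDF features
    SelectKBest(mutual_info_classif, k=300),              # Information-Gain-style selection
    StackingClassifier(
        estimators=[("nb", MultinomialNB()),
                    ("rf", RandomForestClassifier(n_estimators=100))],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
)
print("cv accuracy:", cross_val_score(pipe, data.data, data.target, cv=3).mean().round(3))
```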

신뢰성 높은 서브밴드 특징벡터 선택을 이용한 잡음에 강인한 화자검증 (Noise Robust Speaker Verification Using Subband-Based Reliable Feature Selection)

  • 김성탁;지미경;김회린
    • 대한음성학회지:말소리 / No. 63 / pp. 125-137 / 2007
  • Recently, many techniques have been proposed to improve noise robustness for speaker verification. In this paper, we consider the feature recombination technique in the multi-band approach. In conventional feature recombination for speaker verification, the whole set of feature components is used to compute the likelihoods of the speaker models or the universal background model. This computation method is not effective from the viewpoint of the multi-band approach. To deal with this ineffectiveness of the conventional feature recombination technique, we introduce a subband likelihood computation and propose a modified feature recombination using subband likelihoods. In the decision step of a speaker verification system in noisy environments, a few very low likelihood scores of the speaker model or universal background model cause the system to make wrong decisions. To overcome this problem, a reliable feature selection method is proposed: the low likelihood scores of unreliable features are substituted with the likelihood scores of an adaptive noise model. Here, the adaptive noise model is estimated by the maximum a posteriori adaptation technique using noise features obtained directly from the noisy test speech. The proposed method using subband-based reliable feature selection performs better than the conventional feature recombination system. The error reduction rate is more than 31% compared with the feature-recombination-based speaker verification system.
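
A hedged sketch of per-subband likelihood scoring with reliable-feature substitution, using scikit-learn GMMs on synthetic features; the subband split, the reliability threshold, and a noise model trained directly on noisy features are simplified stand-ins for the paper's MAP-adapted speaker and noise models.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
d, n_sub = 12, 4                                   # 12-dim features split into 4 subbands of 3 dims
train = rng.normal(0, 1, (500, d))                 # enrollment features for one speaker
test = rng.normal(0, 1, (200, d))                  # test-utterance features
noise = rng.normal(0, 2, (200, d))                 # stand-in for features from the noisy test speech

bands = np.split(np.arange(d), n_sub)
spk = [GaussianMixture(4, random_state=0).fit(train[:, b]) for b in bands]
noise_m = [GaussianMixture(2, random_state=0).fit(noise[:, b]) for b in bands]

scores = np.stack([m.score_samples(test[:, b]) for m, b in zip(spk, bands)], axis=1)
noise_scores = np.stack([m.score_samples(test[:, b]) for m, b in zip(noise_m, bands)], axis=1)

# Reliable-feature selection: very low subband scores are treated as unreliable
# and replaced by the noise model's scores before recombination.
threshold = np.percentile(scores, 5)
recombined = np.where(scores < threshold, noise_scores, scores).sum(axis=1)
print("frame-level recombined log-likelihoods:", recombined[:5].round(2))
```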


퍼지 클러스터 분석 기반 특징 선택 방법 (A Feature Selection Method Based on Fuzzy Cluster Analysis)

  • 이현숙
    • 정보처리학회논문지B / Vol. 14B, No. 2 / pp. 135-140 / 2007
  • Feature selection is a data preparation step that constructs effective experimental data by choosing, from the multi-dimensional data observed in a problem domain, the attributes that best reflect the structure the data describe. As a key component for improving the performance of classification systems in areas such as document classification, image recognition, and gene selection, it has been studied mainly through statistical and information-theoretic approaches such as correlation techniques, dimensionality reduction, and mutual information. Research in this area has become ever more important as the data to be handled grow larger and more complex. In this paper, we propose a feature selection method that reflects the characteristics of the data and generalizes to new data. For each attribute of the prepared data, fuzzy cluster analysis is used to obtain optimal cluster information, the degrees of compactness and separability are measured from it, and features are selected according to these values. The proposed method is applied to real-world computer virus classification and compared with classification based on data selected by an existing contrast-based heuristic method. This confirms that the given features can be ranked and that effective feature selection improves system performance.
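
A rough sketch of per-feature fuzzy clustering followed by a compactness/separation score, with a small hand-rolled fuzzy c-means; the number of clusters, the score definition, and the iris data are assumptions for illustration, not the paper's exact measure or its virus dataset.

```python
import numpy as np
from sklearn.datasets import load_iris

def fuzzy_cmeans_1d(x, c=2, m=2.0, iters=50):
    rng = np.random.default_rng(0)
    u = rng.dirichlet(np.ones(c), size=len(x))          # membership matrix, rows sum to 1
    for _ in range(iters):
        w = u ** m
        centers = (w * x[:, None]).sum(0) / w.sum(0)
        dist = np.abs(x[:, None] - centers[None, :]) + 1e-9
        u = 1.0 / (dist ** (2 / (m - 1)) * (1.0 / dist ** (2 / (m - 1))).sum(1, keepdims=True))
    return u, centers

def feature_score(x):
    u, centers = fuzzy_cmeans_1d(x)
    compact = ((u ** 2) * (x[:, None] - centers[None, :]) ** 2).sum() / len(x)
    separation = np.ptp(centers) ** 2 + 1e-9
    return separation / compact                          # higher = well-separated, tight clusters

X, y = load_iris(return_X_y=True)
scores = [feature_score(X[:, j]) for j in range(X.shape[1])]
print("feature ranking (best first):", np.argsort(scores)[::-1])
```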