• Title/Summary/Keyword: 특징선택 (feature selection)

An Enhanced Feature Selection Method Based on the Impurity of Words Considering Unbalanced Distribution of Documents (문서의 불균등 분포를 고려한 단어 불순도 기반 특징 선택 방법)

  • Kang, Jin-Beom;Yang, Jae-Young;Choi, Joong-Min
    • Journal of KIISE:Software and Applications / v.34 no.9 / pp.804-816 / 2007
  • Sample training data for machine learning often contain irrelevant information or redundant concepts, and the original data may also include noise. If the information collected for constructing a learning model is not reliable, it is difficult to obtain accurate information. The system therefore attempts to find relations or rules between features and categories in the learning phase. Feature selection removes irrelevant or redundant information before the learning model is constructed, in order to improve its performance. Existing feature selection methods assume that the distribution of documents is balanced in terms of the number of documents in each class and the length of each document. In practice, however, it is difficult both to prepare a set of documents of almost equal length and to define classes with a fixed number of documents. In this paper, we propose a new feature selection method that considers the impurity of words and the unbalanced distribution of documents across categories. Feature candidates are obtained using word impurity, and the final features are selected by taking the unbalanced distribution of documents into account. Experiments show that our method performs better than existing methods.
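
The abstract does not define its impurity measure, so the following is only a minimal sketch of the general idea, assuming Gini impurity over class-size-normalized document frequencies. The function name `word_gini_impurity` and the toy corpus are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def word_gini_impurity(docs):
    """Return {word: Gini impurity of the word's class distribution}.

    docs is a list of (label, tokens) pairs. Gini impurity is an assumed
    stand-in; the paper's own impurity definition is not given in the abstract.
    """
    class_sizes = Counter(label for label, _ in docs)
    doc_freq = defaultdict(Counter)  # word -> class -> number of docs containing it
    for label, tokens in docs:
        for word in set(tokens):
            doc_freq[word][label] += 1

    impurity = {}
    for word, per_class in doc_freq.items():
        # Divide by class size so that a large class does not dominate
        # simply because it has more documents (unbalanced classes).
        rates = [per_class[c] / class_sizes[c] for c in class_sizes]
        total = sum(rates)
        impurity[word] = 1.0 - sum((r / total) ** 2 for r in rates)
    return impurity


# Hypothetical toy corpus: three "sports" documents, one "politics" document.
toy_docs = [
    ("sports", ["goal", "team"]),
    ("sports", ["team", "match"]),
    ("sports", ["goal", "the"]),
    ("politics", ["vote", "the"]),
]
scores = word_gini_impurity(toy_docs)
# Words with low impurity are concentrated in one class and make better candidates.
candidates = sorted(scores, key=scores.get)
```

In this sketch a word that occurs in only one class gets impurity 0, while a word spread across classes scores higher and would be ranked later as a feature candidate.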

A Feature Selection Method Based on Fuzzy Cluster Analysis (퍼지 클러스터 분석 기반 특징 선택 방법)

  • Rhee, Hyun-Sook
    • The KIPS Transactions:PartB / v.14B no.2 / pp.135-140 / 2007
  • Feature selection is a preprocessing technique commonly used on high-dimensional data. It studies how to select a subset or list of attributes that are used to construct models describing the data. Feature selection methods attempt to explore the intrinsic properties of data by employing statistics or information theory. Recent developments include approaches such as correlation methods, dimensionality reduction, and mutual information techniques. Feature selection has become the focus of much research in application areas with massive and complex data sets. In this paper, we provide a feature selection method that considers data characteristics and generalization capability. It provides a computational approach to feature selection based on fuzzy cluster analysis of attribute values and on performance measures. We apply it to a system for classifying computer viruses and compare it with a heuristic method that uses the contrast concept. Experimental results show that the proposed approach can produce a feature ranking, select features, and improve system performance.

Classification of Gene Expression Profiles Using Common Features Selected (공통 선택된 특징을 이용한 유전 발현 데이터의 분류)

  • Park, Chan-Ho;Cho, Sung-Bae
    • Proceedings of the Korea Information Processing Society Conference / 2002.11a / pp.351-354 / 2002
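  • Recent advances in biotechnology and analytical chemistry have made it possible to obtain biological genetic data in large quantities, and a variety of methods for processing and analyzing such data have been introduced. In this paper, to classify DNA microarray data, we take genes selected by several feature selection methods on three datasets and apply them to a neural network classifier. In the experiments, classification using the Pearson correlation coefficient achieved the highest recognition rate, 97.1%, on the leukemia data. We expected that classifying with genes commonly selected by several feature selection methods would yield a higher recognition rate, but the results fell short of expectations. Therefore, rather than simply choosing features that are selected many times, a method is needed that selects features by considering the correlations among them.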
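
As a rough illustration of Pearson-correlation-based ranking, the sketch below scores each feature (gene) by the absolute Pearson correlation between its values and a binary class label and keeps the top k. The function names and the toy data are hypothetical, and the paper's exact scoring and preprocessing may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the label
    return cov / math.sqrt(var_x * var_y)


def rank_features(samples, labels, top_k):
    """Return indices of the top_k features by |Pearson correlation| with labels."""
    scores = []
    for j in range(len(samples[0])):
        column = [row[j] for row in samples]
        scores.append((abs(pearson(column, labels)), j))
    # Highest score first; ties broken by the lower feature index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [j for _, j in scores[:top_k]]


# Hypothetical toy expression matrix: 4 samples x 3 genes, binary class labels.
toy_samples = [
    [1.0, 5.0, 0.3],
    [2.0, 5.0, 0.9],
    [3.0, 5.0, 0.1],
    [4.0, 5.0, 0.8],
]
toy_labels = [0, 0, 1, 1]
selected = rank_features(toy_samples, toy_labels, top_k=2)
```

Because each feature is scored independently, this kind of ranking can pick several highly correlated genes at once, which is the limitation the abstract's closing sentence points to.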

Face Feature Selection and Face Recognition using GroupMutual-Boost (GroupMutual-Boost를 이용한 얼굴특징 선택 및 얼굴 인식)

  • Choi, Hak-Jin;Lee, Jong-Sik
    • Journal of the Korea Society for Simulation / v.20 no.4 / pp.13-20 / 2011
  • Face recognition is used in a variety of fields, such as identification and security. The procedure is as follows: face features are extracted from face images, the extracted features are learned, and a subset of the extracted features is selected. The selected features are discriminative and are used for face recognition. However, a very large number of features can be extracted from face images. If a face recognition system uses all of them, learning the features takes a long time and computing resources are used inefficiently. To address this problem, many researchers have proposed Boosting methods, which improve the performance of learning algorithms. Mutual-Boost is a typical Boosting method that selects face features efficiently by using the mutual information between two features. In this paper, we propose GroupMutual-Boost, a method that improves on Mutual-Boost. Because it learns feature groups rather than individual features, the proposed method can shorten the time required for learning and recognizing face features and use computing resources more effectively.

Feature Selection for Multi-Class Genre Classification using Gaussian Mixture Model (Gaussian Mixture Model을 이용한 다중 범주 분류를 위한 특징벡터 선택 알고리즘)

  • Moon, Sun-Kuk;Choi, Tack-Sung;Park, Young-Cheol;Youn, Dae-Hee
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.10C / pp.965-974 / 2007
  • In this paper, we propose a feature selection algorithm for multi-class genre classification. We develop a GMM separation score, based on the Gaussian mixture model, for measuring the separability between two genres. We also improve a feature subset selection algorithm based on sequential forward selection for multi-class genre classification: instead of using the separability over all genres as the criterion, we use the worst genre separability measure as the criterion at each sequential selection step. To assess the performance of the proposed algorithm, we extracted various features that represent characteristics such as timbre, rhythm, and pitch. We then investigated classification performance with a GMM classifier and a k-NN classifier on features selected by the conventional algorithm and by the proposed algorithm. The proposed algorithm improved classification accuracy by up to 10 percent, especially in classification experiments with low-dimensional feature vectors.
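
The worst-case criterion can be sketched as sequential forward selection that, at each step, adds the feature maximizing the minimum pairwise class separability. This is only an illustration: the Fisher-style score below is an assumed stand-in for the paper's GMM separation score, and all names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance


def pair_separability(data, class_a, class_b, features):
    """Stand-in separability score between two classes (NOT the GMM score).

    Sums, over the given features, the squared difference of class means
    divided by the sum of the class variances.
    """
    score = 0.0
    for f in features:
        xa = [row[f] for row in data[class_a]]
        xb = [row[f] for row in data[class_b]]
        spread = pvariance(xa) + pvariance(xb) + 1e-9  # avoid division by zero
        score += (mean(xa) - mean(xb)) ** 2 / spread
    return score


def worst_case_sfs(data, n_features, k):
    """Greedy forward selection maximizing the worst pairwise separability."""
    selected = []
    remaining = list(range(n_features))
    pairs = list(combinations(sorted(data), 2))
    while remaining and len(selected) < k:
        def worst_pair_score(f):
            trial = selected + [f]
            return min(pair_separability(data, a, b, trial) for a, b in pairs)

        best = max(remaining, key=worst_pair_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: feature 0 separates "rock" from the rest very well,
# feature 1 separates every pair moderately, feature 2 is uninformative.
toy_genres = {
    "rock": [[10.0, 1.0, 0.5], [10.2, 1.1, 0.4]],
    "jazz": [[0.0, 2.0, 0.5], [0.2, 2.1, 0.4]],
    "classical": [[1.0, 3.0, 0.5], [1.2, 3.1, 0.4]],
}
chosen = worst_case_sfs(toy_genres, n_features=3, k=2)
```

On this toy data a criterion that sums separability over all pairs would pick feature 0 first, because of its large rock-versus-others scores, whereas the worst-case criterion picks feature 1 first because it keeps the hardest pair (jazz versus classical) apart.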

False Minutiae Filtering Algorithm for Fingerprint Identification System (자동 지문 인식을 위한 의사 특징점 제거 알고리즘)

  • Yang, Ji-Sung;Ahn, Do-Sung;Kim, Hak-Il
    • Proceedings of the KIEE Conference / 1999.11c / pp.807-811 / 1999
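  • The minutiae obtained during minutiae extraction for automatic fingerprint identification include a considerable number of false minutiae, caused by noise introduced at fingerprint acquisition and by information loss during preprocessing. In this paper, we propose an algorithm that removes false minutiae, which amount to noise in the fingerprint feature set composed of minutiae. The proposed algorithm considers the structural characteristics of the thinned fingerprint image and selects, from the candidate minutiae list, those minutiae that lie in a recoverable region and are regarded as false. Because the region of the thinned image where such a minutia lies was thinned incorrectly due to noise, the algorithm reconstructs that region correctly and deletes the selected minutia from the candidate list. In the reconstructed thinned image, the pattern of the region around each candidate minutia is examined based on the sub-region orientations of the original fingerprint image and the structural characteristics of fingerprints, and only true minutiae are selected, thereby removing the false ones. Results of applying the algorithm to fingerprint images from NIST sdb 14 show a large improvement in the false extraction rate relative to the loss in the correct extraction rate.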

A Diagnostic Feature Subset Selection of Breast Tumor Based on Neighborhood Rough Set Model (Neighborhood 러프집합 모델을 활용한 유방 종양의 진단적 특징 선택)

  • Son, Chang-Sik;Choi, Rock-Hyun;Kang, Won-Seok;Lee, Jong-Ha
    • Journal of Korea Society of Industrial Information Systems / v.21 no.6 / pp.13-21 / 2016
  • Feature selection is one of the important issues in the field of data mining and machine learning. It is a technique for finding, from the source data, a subset of features that provides the best classification performance. We propose a feature subset selection method using the neighborhood rough set model based on information granularity. To demonstrate the effectiveness of the proposed method, it was applied to select useful features associated with breast tumor diagnosis from 298 shape features extracted from 5,252 breast ultrasound images, which include 2,745 benign and 2,507 malignant cases. Experimental results showed that 19 diagnostic features were strong predictors of breast cancer diagnosis, with an average classification accuracy of 97.6%.

A Feature Selection for the Recognition of Handwritten Characters based on Two-Dimensional Wavelet Packet (2차원 웨이브렛 패킷에 기반한 필기체 문자인식의 특징선택방법)

  • Kim, Min-Soo;Back, Jang-Sun;Lee, Guee-Sang;Kim, Soo-Hyung
    • Journal of KIISE:Software and Applications / v.29 no.8 / pp.521-528 / 2002
  • We propose a new approach to feature selection for the classification of handwritten characters using two-dimensional (2D) wavelet packet bases. Principal Component Analysis (PCA) has been the most frequently used method for dimension reduction when extracting key features from image data. However, because PCA relies on an eigenvalue system, it is not only sensitive to outliers and perturbations but also tends to select only global features. Since the important features of image data are often characterized by local information such as edges and spikes, PCA does not provide good solutions to such problems. Solving an eigenvalue system also usually has a high computational cost. In this paper, the original data is transformed with 2D wavelet packet bases and the best discriminant basis is searched, from which relevant features are selected. In contrast to PCA solutions, fast selection of detailed features as well as global features is possible by virtue of the good properties of wavelets. The recognition rates of PCA and of our approach are compared experimentally to show the performance of the proposed method.

An Enhanced Feature Selection Method using the Impurity of Words (단어의 불순도를 고려한 특징 선택 방법 연구)

  • Kang, Jin-Beom;Yang, Jae-Young;Choi, Joong-Min
    • Proceedings of the Korean Information Science Society Conference / 2005.11b / pp.679-681 / 2005
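  • Effective document classification requires many features related to the classes to be learned. However, the collected information includes information that is unrelated to the concept to be learned or that is redundant. A feature selection method is used to acquire accurate knowledge in the learning process. In this paper, we propose a feature selection method that uses the impurity of words with respect to classes. Through comparative analysis with existing feature selection methods, we identify their problems and present an improved technique.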

An Experimental Comparison of Feature Subset Selection Methods using Bio-Inspired Algorithms (생태계 모방 알고리즘을 이용한 특징 선택 방법들의 성능 비교 분석에 대한 연구)

  • Yun, Chulmin;Yang, Jihoon
    • Proceedings of the Korea Information Processing Society Conference / 2007.11a / pp.27-29 / 2007
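  • In solving pattern recognition problems, feature selection is one of the important steps for improving recognition performance. In this study, we chose two representative bio-inspired algorithms, applied them to the feature selection problem, and compared and analyzed their performance. The evaluation focused on the ability to reduce the number of features in the data and on whether pattern recognition performance improved. On this basis, we discuss whether bio-inspired algorithms can be used effectively for feature selection and consider the strengths, weaknesses, and characteristics of the two methods.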