• Title/Abstract/Keywords: Feature Evaluation and Selection

Search results: 87 (processing time: 0.023 s)

Computer Aided Diagnosis System based on Performance Evaluation Agent Model

  • Rhee, Hyun-Sook
    • 한국컴퓨터정보학회논문지 / Vol. 21, No. 1 / pp.9-16 / 2016
  • In this paper, we present a performance evaluation agent based on fuzzy cluster analysis and validity measures. The proposed agent consists of three modules: a fuzzy cluster analyzer, performance evaluation measures, and a feature ranking algorithm for the feature selection step in a CAD system. Feature selection is an important step commonly used to create a more accurate system to help human experts. Through this agent, we obtain a feature ranking on the dataset of mass and calcification lesions extracted from the public real-world mammogram database DDSM. We also design a CAD system incorporating the agent and apply five different feature combinations to the system. Experimental results show that the proposed approach has higher classification accuracy and demonstrate its feasibility as a diagnosis support tool.

다차원 데이터 평가가 가능한 개선된 FSDD 연구 (An Improvement of FSDD for Evaluating Multi-Dimensional Data)

  • 오세종
    • 디지털융복합연구 / Vol. 15, No. 1 / pp.247-253 / 2017
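  • Feature selection, or variable selection, is the process of choosing the features most relevant to a given topic from high-dimensional data with a very large number of features. It is an important technique because it lowers the dimensionality of the data and thereby makes cluster analysis and classification easier. Selecting a subset from a large number of features requires a tool for evaluating features. Most tools proposed so far are based on probability theory or information theory, so they can evaluate only a single feature, that is, one-dimensional data. However, because features interact with one another, effective feature selection requires evaluating a set of features, that is, multi-dimensional data, rather than a single feature. This study proposes a method that uses an extended distance function to improve the FSDD evaluation function, originally proposed for one-dimensional data, so that it can evaluate multi-dimensional data. The proposed approach is expected to be applicable to other evaluation functions for one-dimensional data as well.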

Performance Evaluation of a Feature-Importance-based Feature Selection Method for Time Series Prediction

  • Hyun, Ahn
    • Journal of information and communication convergence engineering / Vol. 21, No. 1 / pp.82-89 / 2023
  • Various machine-learning models can yield high predictive power on massive time series. However, these models are prone to instability in terms of computational cost because of the high dimensionality of the feature space and non-optimized hyperparameter settings. Because training a model on a high-dimensional feature set can be time-consuming, we evaluate a feature-importance-based feature selection method to find a tradeoff between predictive power and computational cost for time series prediction. For the performance evaluation, we used two machine learning techniques to generate prediction models from a retail sales dataset. First, we ranked the features using impurity-based and Local Interpretable Model-agnostic Explanations (LIME)-based feature importance measures in the prediction models. Then, the recursive feature elimination method was applied to eliminate unimportant features sequentially. Consequently, we obtained a subset of features that reduces model training time while preserving acceptable model performance.
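
The recursive feature elimination step described in this abstract can be sketched as a loop that re-scores the remaining features and drops the weakest one each round. This is a minimal illustration, not the authors' implementation: the importance measure here is the absolute Pearson correlation with the target, a simple stand-in for the impurity- and LIME-based measures in the paper, and the column names and toy data are hypothetical.

```python
from math import sqrt


def abs_pearson(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return abs(cov) / sqrt(vx * vy)


def correlation_importance(columns, target, features):
    """Stand-in importance (assumption): |corr(feature, target)| per feature."""
    return {f: abs_pearson(columns[f], target) for f in features}


def recursive_feature_elimination(columns, target, n_keep,
                                  importance=correlation_importance):
    """Drop the least important feature one at a time until n_keep remain.

    Returns (kept_features, eliminated_in_order).
    """
    remaining = list(columns)
    eliminated = []
    while len(remaining) > n_keep:
        # Importance is recomputed on the surviving features every round.
        scores = importance(columns, target, remaining)
        worst = min(remaining, key=lambda f: scores[f])
        remaining.remove(worst)
        eliminated.append(worst)
    return remaining, eliminated


# Hypothetical toy data: "promo" and "price" track sales, "noise" does not.
columns = {
    "promo": [0, 1, 0, 1, 1, 0, 1, 0],
    "price": [9, 5, 8, 4, 5, 9, 4, 8],
    "noise": [3, 3, 1, 1, 2, 2, 3, 1],
}
sales = [10, 20, 11, 22, 19, 9, 21, 12]

kept, dropped = recursive_feature_elimination(columns, sales, n_keep=2)
print(kept, dropped)
```

With a model-based importance (for example, one obtained by refitting a tree ensemble on the surviving features), the scores would change from round to round, which is what distinguishes recursive elimination from a one-shot ranking. The `importance` parameter is the place to plug such a measure in.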

Feature Impact Evaluation Based Pattern Classification System

  • Rhee, Hyun-Sook
    • 한국컴퓨터정보학회논문지 / Vol. 23, No. 11 / pp.25-30 / 2018
  • A pattern classification system is often an important component of intelligent systems. In this paper, we present a pattern classification system consisting of a feature selection module, a knowledge base construction module, and a decision module. We introduce a feature selection method based on feature impact evaluation using fuzzy cluster analysis, which considers the computational approach and the generalization capability of the given data characteristics. OFUN-NET, a fuzzy neural network based on an unsupervised learning data mining technique, produces the knowledge base for representative clusters. We prepared 240 blemish pattern images and applied them to the proposed system. Experimental results show the feasibility of the proposed classification system as an automated defect inspection tool.

Biological Feature Selection and Disease Gene Identification using New Stepwise Random Forests

  • Hwang, Wook-Yeon
    • Industrial Engineering and Management Systems / Vol. 16, No. 1 / pp.64-79 / 2017
  • Identifying disease genes from the human genome is a critical task in biomedical research. Important biological features that distinguish disease genes from non-disease genes have mainly been selected with traditional feature selection approaches. However, these traditional approaches unnecessarily consider many unimportant biological features. As a result, although some existing classification techniques have been applied to disease gene identification, the prediction performance has not been satisfactory. A small set of the most important biological features can enhance the accuracy of disease gene identification and provide potentially useful knowledge for biologists or clinicians, who can further investigate the selected biological features as well as the potential disease genes. In this paper, we propose a new stepwise random forests (SRF) approach for biological feature selection and disease gene identification. The SRF approach consists of two stages. In the first stage, only important biological features are iteratively selected in a forward selection manner based on one-dimensional random forest regression, where the updated residual vector is considered as the current response vector. We can then determine a small set of important biological features. In the second stage, random forests classification with regard to the selected biological features is applied to identify disease genes. Our extensive experiments show that the proposed SRF approach outperforms the existing feature selection and classification techniques in terms of biological feature selection and disease gene identification.
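
The first stage described in this abstract, forward selection in which each round fits a one-dimensional model to the current residual vector, can be sketched as follows. This is only an illustration of the residual-update idea under a loud assumption: a one-dimensional least-squares line replaces the paper's one-dimensional random forest regression, and the feature names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of ys on a single feature xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # constant feature: predict the mean
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def stepwise_residual_selection(columns, response, n_select):
    """Forward selection where each round explains the current residual.

    Stand-in (assumption): a 1-D linear fit is used instead of a 1-D
    random forest regression.
    """
    residual = list(response)
    selected = []
    for _ in range(n_select):
        best = None  # (sum of squared errors, feature name, fitted values)
        for name, xs in columns.items():
            if name in selected:
                continue
            slope, intercept = fit_line(xs, residual)
            fitted = [slope * x + intercept for x in xs]
            sse = sum((r - f) ** 2 for r, f in zip(residual, fitted))
            if best is None or sse < best[0]:
                best = (sse, name, fitted)
        if best is None:
            break  # no candidate features left
        _, name, fitted = best
        selected.append(name)
        # The updated residual becomes the response for the next round.
        residual = [r - f for r, f in zip(residual, fitted)]
    return selected


# Hypothetical toy data: response = 3 * feature_a + feature_b.
columns = {
    "feature_a": [1, 2, 3, 4, 1, 2, 3, 4],
    "feature_b": [0, 0, 0, 0, 1, 1, 1, 1],
    "feature_c": [1, 0, 1, 0, 0, 1, 0, 1],
}
response = [3, 6, 9, 12, 4, 7, 10, 13]

selected_features = stepwise_residual_selection(columns, response, n_select=2)
print(selected_features)
```

Because each round fits only what earlier features left unexplained, a feature that merely duplicates an already selected one gains little, which keeps the selected set small.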

패턴 인식에서 특징 선택을 위한 개미 군락 최적화 (Ant Colony Optimization for Feature Selection in Pattern Recognition)

  • 오일석;이진선
    • 한국콘텐츠학회논문지 / Vol. 10, No. 5 / pp.1-9 / 2010
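  • This paper proposes a new technique called selective evaluation to improve the convergence characteristics of ant colony optimization used for feature selection. The method reduces the amount of computation by excluding candidate solutions that are unnecessary or less promising. It can be implemented because pheromone information can be used to identify such solutions. To judge the applicability of the algorithms as the problem size grows, we analyze the computation time of three algorithms used for feature selection: the greedy algorithm, the genetic algorithm, and ant colony optimization. For a rigorous analysis, we use the concept of an atomic operation. Experimental results show that ant colony optimization with selective evaluation is superior in both computation time and recognition performance.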

Feature-Based Relation Classification Using Quantified Relatedness Information

  • Huang, Jin-Xia;Choi, Key-Sun;Kim, Chang-Hyun;Kim, Young-Kil
    • ETRI Journal / Vol. 32, No. 3 / pp.482-485 / 2010
  • Feature selection is very important for feature-based relation classification tasks. While most existing work on feature selection relies on linguistic information acquired using parsers, this letter proposes new features, including probabilistic and semantic relatedness features, to manifest the relatedness between patterns and certain relation types in an explicit way. The impact of each feature set is evaluated using both a chi-square estimator and a performance evaluation. The experiments show that the impact of relatedness features is superior to that of existing well-known linguistic features, and that the contribution of relatedness features cannot be substituted by other commonly used linguistic feature sets.
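
As a rough illustration of evaluating a feature with a chi-square estimator, the sketch below computes the Pearson chi-square statistic of the contingency table between a discrete feature and the class labels. The exact estimator used in the letter is not given in the abstract, so this is the textbook statistic under that assumption; the feature and label values are hypothetical.

```python
from collections import Counter


def chi_square(feature_values, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_total in feature_totals.items():
        for c, c_total in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_total * c_total / n
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat


# Hypothetical relation labels and two binary features.
labels = ["cause"] * 4 + ["other"] * 4
aligned = [1, 1, 1, 1, 0, 0, 0, 0]      # fires exactly on "cause"
independent = [1, 1, 0, 0, 1, 1, 0, 0]  # fires equally often in both classes

print(chi_square(aligned, labels))      # large: strongly associated
print(chi_square(independent, labels))  # zero: no association
```

A larger statistic indicates a stronger association between the feature and the relation type, so features can be ranked by this value before the performance evaluation mentioned in the abstract.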

기계학습 기반의 실시간 악성코드 탐지를 위한 최적 특징 선택 방법 (An Optimal Feature Selection Method to Detect Malwares in Real Time Using Machine Learning)

  • 주진걸;정인선;강승호
    • 한국멀티미디어학회논문지 / Vol. 22, No. 2 / pp.203-209 / 2019
  • The performance of a machine-learning-based intelligent classifier for detecting malware embedded in multimedia content depends heavily on the properties of the feature set. In particular, to detect malicious code in real time, the feature set should be as small as possible without reducing accuracy. In this paper, we introduce an optimal feature selection method that achieves both a high detection rate and a minimal feature set size, starting from the feature set provided by PEFeatureExtractor, a well-known feature extraction tool. To evaluate the proposed method, we perform experiments on 32-bit Windows Portable Executable files.

Conditional Mutual Information-Based Feature Selection Analyzing for Synergy and Redundancy

  • Cheng, Hongrong;Qin, Zhiguang;Feng, Chaosheng;Wang, Yong;Li, Fagen
    • ETRI Journal / Vol. 33, No. 2 / pp.210-218 / 2011
  • Battiti's mutual information feature selector (MIFS) and its variants are used in many classification applications. Because they ignore feature synergy, MIFS and its variants may introduce a large bias when features cooperate with one another. In addition, MIFS and its variants estimate feature redundancy without regard to the corresponding classification task. In this paper, we propose an automated greedy feature selection algorithm called conditional mutual information-based feature selection (CMIFS). Based on the link between interaction information and conditional mutual information, CMIFS takes account of both redundancy and synergy interactions among features and identifies discriminative features. In addition, CMIFS combines feature redundancy evaluation with the classification task, which decreases the probability of mistaking important features for redundant ones during the search. The experimental results show that CMIFS can achieve a higher best classification accuracy than MIFS and its variants, with the same number of features or fewer (nearly 50% fewer).
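
The link between interaction information and conditional mutual information mentioned in this abstract can be made concrete with a small worked example. The sketch below is not the CMIFS algorithm; it only computes I(X; C), I(X; C | Y), and their difference for discrete data. The sign convention (positive difference means synergy, negative means redundancy) is an assumption chosen for this illustration, and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((k / n) * log2(k / n) for k in counts.values())


def mutual_information(x, c):
    """I(X; C) = H(X) + H(C) - H(X, C)."""
    return entropy(x) + entropy(c) - entropy(x, c)


def conditional_mutual_information(x, c, y):
    """I(X; C | Y) = H(X, Y) + H(C, Y) - H(X, C, Y) - H(Y)."""
    return entropy(x, y) + entropy(c, y) - entropy(x, c, y) - entropy(y)


def interaction(x, y, c):
    """I(X; C | Y) - I(X; C): > 0 suggests synergy, < 0 suggests redundancy."""
    return conditional_mutual_information(x, c, y) - mutual_information(x, c)


# Synergy: the class is XOR of two features; neither is informative alone.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
c_xor = [0, 1, 1, 0]
print(mutual_information(x, c_xor), interaction(x, y, c_xor))

# Redundancy: the second feature is a copy of the first.
c_copy = [0, 0, 1, 1]
print(mutual_information(x, c_copy), interaction(x, x, c_copy))
```

In the XOR case a criterion that looks only at I(X; C) would discard both features, while conditioning on the other feature reveals one full bit of information. In the copy case the conditional term drops to zero, which flags the second feature as redundant for this class.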

특징 선택과 융합 방법을 이용한 음성 감정 인식 (Speech Emotion Recognition using Feature Selection and Fusion Method)

  • 김원구
    • 전기학회논문지 / Vol. 66, No. 8 / pp.1265-1271 / 2017
  • In this paper, a speech parameter fusion method is studied to improve the performance of a conventional emotion recognition system. For this purpose, the combination of cepstrum parameters and various pitch parameters used in conventional emotion recognition systems that shows the best performance is selected. Various pitch parameters were generated from the pitch of speech using numerical and statistical methods. Performance evaluation was carried out on an emotion recognition system using a Gaussian mixture model (GMM) to select the pitch parameters that showed the best performance in combination with cepstrum parameters. Sequential feature selection was used as the parameter selection method. In the experiment to distinguish four emotions (normal, joy, sadness, and anger), fifteen of the total 56 pitch parameters were selected and showed the best recognition performance when fused with cepstrum and delta cepstrum coefficients. This is a 48.9% reduction in error relative to the emotion recognition system using only pitch parameters.
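
Sequential feature selection, the parameter selection method named in this abstract, can be sketched as a greedy loop that adds one feature at a time according to a scoring function. The sketch below is a forward variant under stated assumptions: the scorer is the training accuracy of a nearest-class-mean classifier, a lightweight stand-in for the GMM-based recognizer in the paper, and the feature values and emotion labels are hypothetical.

```python
def nearest_mean_accuracy(rows, labels, subset):
    """Training accuracy of a nearest-class-mean classifier on chosen columns."""
    if not subset:
        return 0.0
    classes = sorted(set(labels))
    means = {}
    for c in classes:
        members = [r for r, label in zip(rows, labels) if label == c]
        means[c] = [sum(r[j] for r in members) / len(members) for j in subset]
    correct = 0
    for r, label in zip(rows, labels):
        point = [r[j] for j in subset]
        predicted = min(
            classes,
            key=lambda c: sum((p - m) ** 2 for p, m in zip(point, means[c])),
        )
        correct += predicted == label
    return correct / len(rows)


def sequential_forward_selection(rows, labels, n_select,
                                 score=nearest_mean_accuracy):
    """Greedily add the feature that gives the best score with those chosen."""
    selected = []
    candidates = list(range(len(rows[0])))
    while candidates and len(selected) < n_select:
        best = max(candidates, key=lambda j: score(rows, labels, selected + [j]))
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical pitch-derived features; column 0 separates the two classes.
rows = [
    [1.0, 5.0, 0.2],
    [1.2, 1.0, 0.9],
    [0.9, 4.0, 0.1],
    [3.0, 4.5, 0.8],
    [3.2, 1.5, 1.0],
    [2.9, 5.0, 0.3],
]
labels = ["normal", "normal", "normal", "anger", "anger", "anger"]

chosen = sequential_forward_selection(rows, labels, n_select=2)
print(chosen)
```

In practice the score would come from held-out recognition accuracy rather than training accuracy, and the loop would stop when adding a feature no longer improves it. The fixed `n_select` here just keeps the example short.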