• Title/Summary/Keyword: Optimal Feature Selection

The Optimal Bispectral Feature Vectors and the Fuzzy Classifier for 2D Shape Classification

  • Youngwoon Woo;Soowhan Han;Park, Choong-Shik
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.421-427 / 2001
  • This paper proposes a method for selecting optimal feature vectors for classifying closed 2D shapes using the bispectrum of a contour sequence. The bispectrum, based on third-order cumulants, is applied to the contour sequence of each planar image to extract its feature vectors. These bispectral feature vectors are invariant to shape translation, rotation, and scale transformation and can therefore represent two-dimensional planar images, but there has been no definite criterion for selecting the feature vectors that give optimal classification of closed 2D images. The new method selects the optimal bispectral feature vectors based on the variances of the feature vectors. Experimental results are presented for eight different aircraft shapes, using from five to fifteen bispectral feature vectors and a weighted mean fuzzy classifier.

ModifiedFAST: A New Optimal Feature Subset Selection Algorithm

  • Nagpal, Arpita;Gaur, Deepti
    • Journal of information and communication convergence engineering / v.13 no.2 / pp.113-122 / 2015
  • Feature subset selection is used as a pre-processing step in learning algorithms. In this paper, we propose an efficient algorithm, ModifiedFAST, for feature subset selection. The algorithm is suitable for text datasets and uses the concept of information gain to remove irrelevant and redundant features. A new optimal value of the threshold for symmetric uncertainty, which is used to identify relevant features, is found. The thresholds used by previous feature selection algorithms such as FAST, Relief, and CFS were not optimal. It has been proven that the threshold value greatly affects the percentage of selected features and the classification accuracy. A new unified performance metric that combines accuracy and the number of selected features is proposed and applied in the algorithm. Experiments showed that, on most of the datasets, the proposed algorithm selected a lower percentage of features than existing algorithms. The effectiveness of the algorithm at the optimal threshold was statistically validated against other algorithms.
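
The relevance filter mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard definition of symmetric uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and keeps features whose SU with the class label exceeds a threshold. This is not the ModifiedFAST implementation: the function names, the toy data, and the threshold value 0.1 are assumptions made for illustration, and the abstract does not state the optimal threshold it reports.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); ranges from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def relevant_features(columns, labels, threshold):
    """Names of features whose SU with the class label exceeds the threshold."""
    return [name for name, col in columns.items()
            if symmetric_uncertainty(col, labels) > threshold]


# Toy data (hypothetical): one feature duplicates the label, one is independent of it.
labels = [0, 0, 1, 1]
columns = {
    "copy_of_label": [0, 0, 1, 1],
    "independent": [0, 1, 0, 1],
}
selected = relevant_features(columns, labels, threshold=0.1)  # 0.1 is an arbitrary choice
print(selected)
```

Raising the threshold removes more features, which is why the abstract treats its value as a quantity worth optimizing.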

Feature Selection Method by Information Theory and Particle Swarm Optimization (상호정보량과 Binary Particle Swarm Optimization을 이용한 속성선택 기법)

  • Cho, Jae-Hoon;Lee, Dae-Jong;Song, Chang-Kyu;Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.2 / pp.191-196 / 2009
  • In this paper, we propose a feature selection method using Binary Particle Swarm Optimization (BPSO) and mutual information. The method consists of a candidate selection part, which selects a candidate feature subset by mutual information, and an optimal selection part, which uses BPSO to choose the optimal feature subset from among the candidates. In the candidate selection part, we compute the mutual information of each feature and select a candidate feature subset according to the mutual information ranking. In the optimal selection part, BPSO searches the candidate feature subset for the optimal feature subset. In the BPSO process, we use a multi-objective function to optimize both the classifier accuracy and the size of the selected feature subset. DNA expression datasets are used to estimate the performance of the proposed method. Experimental results show that the method achieves better performance on pattern recognition problems than conventional ones.
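
The second stage described in this abstract can be sketched as a textbook binary PSO with a sigmoid transfer function. The sketch assumes the candidate features have already been chosen by mutual information ranking, and it replaces the classifier with a toy fitness function. All names, the inertia and acceleration constants, and the weight `alpha` that combines the two objectives are illustrative assumptions, not values taken from the paper.

```python
import math
import random

INFORMATIVE = {0, 2}  # hypothetical: indices of the truly useful candidate features


def toy_fitness(mask, alpha=0.9):
    """Weighted sum of a stand-in accuracy term and a subset-size penalty."""
    chosen = {i for i, bit in enumerate(mask) if bit}
    accuracy = len(chosen & INFORMATIVE) / len(INFORMATIVE)  # stand-in for classifier accuracy
    size_penalty = len(chosen) / len(mask)
    return alpha * accuracy - (1 - alpha) * size_penalty


def bpso_select(n_features, fitness, n_particles=10, n_iters=30, seed=0,
                w=0.7, c1=1.5, c2=1.5):
    """Binary PSO: each particle is a 0/1 mask over the candidate features."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Sigmoid transfer: velocity becomes the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


best_mask, best_fit = bpso_select(6, toy_fitness)
print(best_mask, best_fit)
```

In a real setting the fitness call would train and validate a classifier on the masked features, which dominates the running time; ranking by mutual information first keeps the mask short.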

Optimal Gabor Filters for Steganalysis of Content-Adaptive JPEG Steganography

  • Song, Xiaofeng;Liu, Fenlin;Chen, Liju;Yang, Chunfang;Luo, Xiangyang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.1 / pp.552-569 / 2017
  • The existing steganalysis method based on 2D Gabor filters achieves competitive detection performance for content-adaptive JPEG steganography. However, the feature dimensionality is still high and feature extraction is relatively time-consuming because no optimal selection is performed over the 2D Gabor filters. To solve this problem, a new steganalysis method for content-adaptive JPEG steganography is proposed that selects the optimal 2D Gabor filters. First, 2D Gabor filters with different parameter settings are generated. Then, a feature is extracted with each 2D Gabor filter, and the corresponding detection accuracy is used as the measure for filter selection. Next, a subset of the 2D Gabor filters is selected by a greedy strategy, and the steganalysis feature is extracted with the selected filters. Finally, an ensemble classifier is trained on the proposed steganalysis feature to form the final steganalyzer. The experimental results show that the steganalysis feature extracted with the selected 2D Gabor filters still achieves competitive detection performance while the feature dimensionality is greatly reduced.
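
The greedy step in this abstract can be sketched as plain forward selection: at each round, add the filter that raises the score most, and stop when nothing improves it. The abstract does not spell out the exact greedy rule, so this is one common variant. The filter names and the `evaluate` callback are hypothetical; here a toy coverage count stands in for detection accuracy.

```python
def greedy_select(candidates, evaluate, max_selected):
    """Forward selection: repeatedly add the candidate that most improves the score."""
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_selected:
        scored = [(evaluate(selected + [c]), c) for c in remaining]
        score, choice = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break  # no remaining candidate improves the score
        selected.append(choice)
        remaining.remove(choice)
        best_score = score
    return selected, best_score


# Toy stand-in (hypothetical): each filter "detects" a set of patterns, and the
# score of a filter subset is the number of distinct patterns it covers.
COVERAGE = {
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"a", "b"},
    "g4": {"d"},
}


def toy_score(filters):
    covered = set()
    for name in filters:
        covered |= COVERAGE[name]
    return len(covered)


chosen, score = greedy_select(COVERAGE, toy_score, max_selected=4)
print(chosen, score)
```

The redundant filter `g3` is never added because it covers nothing new, which mirrors how selection reduces feature dimensionality without giving up accuracy.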

Landslide susceptibility assessment using feature selection-based machine learning models

  • Liu, Lei-Lei;Yang, Can;Wang, Xiao-Mi
    • Geomechanics and Engineering / v.25 no.1 / pp.1-16 / 2021
  • Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs, or conditioning factors, for these models can, however, reduce computational efficiency and make data collection more difficult. Feature selection addresses this problem by selecting the most important features among all factors to reduce the number of input variables. However, two important questions need to be answered: (1) how do feature selection methods affect the performance of machine learning models? and (2) which feature selection method is the most suitable for a given machine learning model? This paper addresses these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (i.e., logistic regression, support vector machine, artificial neural network, Gaussian process and random forest) and six typical feature selection methods from the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 landslides are used as recorded data. Next, the feature selection methods are used to rank the importance of the conditioning factors and create feature subsets, from which 13 FS-ML models are constructed. For each machine learning model, the best-optimized FS-ML model is selected according to the area under the curve value. Finally, five optimal FS-ML models are obtained and applied to the LSA of the study area. The predictive abilities of the FS-ML models on LSA are verified and compared through the receiver operating characteristic curve and statistical indicators such as sensitivity, specificity and accuracy. The results show that different feature selection methods have different effects on the performance of LSA machine learning models. FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the random forest (RF) optimized by recursive feature elimination (RFE), and RFE is an optimal method for feature selection.
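
Recursive feature elimination, the method this abstract singles out, can be sketched as a loop that re-scores the surviving features after every removal. The sketch is generic and is not the paper's pipeline: `importance_fn` stands in for fitting a model such as a random forest and reading its feature importances, and the factor names and scores below are invented for illustration.

```python
def recursive_feature_elimination(features, importance_fn, n_keep):
    """Repeatedly drop the least important feature until n_keep remain."""
    current = list(features)
    eliminated = []
    while len(current) > n_keep:
        scores = importance_fn(current)  # re-score using only the surviving features
        weakest = min(current, key=lambda f: scores[f])
        current.remove(weakest)
        eliminated.append(weakest)
    return current, eliminated


def toy_importance(current):
    """Hypothetical importances; slope and elevation share credit while both remain."""
    base = {"slope": 0.5, "rainfall": 0.3, "elevation": 0.25, "noise": 0.05}
    scores = {f: base[f] for f in current}
    if "slope" in scores and "elevation" in scores:
        scores["slope"], scores["elevation"] = 0.28, 0.1
    return scores


factors = ["slope", "rainfall", "elevation", "noise"]
kept, dropped = recursive_feature_elimination(factors, toy_importance, n_keep=1)
print(kept, dropped)
```

With these toy scores, a one-shot ranking would place `rainfall` above `slope`, whereas re-scoring after `elevation` is removed restores the importance of `slope`. That difference is the reason for eliminating recursively instead of ranking once.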

An Optimal Feature Selection Method to Detect Malwares in Real Time Using Machine Learning (기계학습 기반의 실시간 악성코드 탐지를 위한 최적 특징 선택 방법)

  • Joo, Jin-Gul;Jeong, In-Seon;Kang, Seung-Ho
    • Journal of Korea Multimedia Society / v.22 no.2 / pp.203-209 / 2019
  • The performance of a machine learning-based intelligent classifier for detecting malware embedded in multimedia content depends heavily on the properties of the feature set. In particular, to identify malicious code in real time, the feature set should be as small as possible without reducing accuracy. In this paper, we introduce an optimal feature selection method that achieves both a high detection rate and a minimal feature set, starting from the feature set provided by PEFeatureExtractor, a well-known feature extraction tool. To evaluate the proposed method, we perform experiments using 32-bit Windows Portable Executable files.

Rough Entropy-based Knowledge Reduction using Rough Set Theory (러프집합 이론을 이용한 러프 엔트로피 기반 지식감축)

  • Park, In-Kyoo
    • Journal of Digital Convergence / v.12 no.6 / pp.223-229 / 2014
  • To retrieve useful information for efficient decision making in a large knowledge system, refined feature selection is generally necessary and important. Rough set theory has difficulty generating optimal reducts and classifying boundary objects. To overcome these limitations, we propose a quick reduction algorithm that generates optimal features by rough entropy analysis of the condition and decision attributes. We define a new conditional information entropy for efficient feature extraction and describe a feature selection procedure that classifies the significance of features. Through simulation on 5 datasets from the UCI repository, we compare our rough set-based feature selection approach with other selection methods. The results show that our method achieves higher classification accuracy for feature selection than the previous methods.

Identification of Chinese Event Types Based on Local Feature Selection and Explicit Positive & Negative Feature Combination

  • Tan, Hongye;Zhao, Tiejun;Wang, Haochang;Hong, Wan-Pyo
    • Journal of information and communication convergence engineering / v.5 no.3 / pp.233-238 / 2007
  • This paper proposes an approach to identifying Chinese event types that combines a good feature selection policy with a Maximum Entropy (ME) model. The approach not only effectively alleviates the problem that the classifier performs poorly on small and difficult types, but also improves overall performance. Experiments on the ACE2005 corpus show satisfactory performance, with a macro-average F-measure of 83.5%. The main characteristics and ideas of the approach are: (1) An optimal feature set is built for each type by local feature selection, which safeguards the performance of each type. (2) Positive and negative features are explicitly discriminated and combined using one-sided metrics, which exploits the advantages of both kinds of features. (3) Wrapper methods are used to search for new features and to evaluate the various feature subsets in order to obtain the optimal feature subset.

Optimal k-Nearest Neighborhood Classifier Using Genetic Algorithm (유전알고리즘을 이용한 최적 k-최근접이웃 분류기)

  • Park, Chong-Sun;Huh, Kyun
    • Communications for Statistical Applications and Methods / v.17 no.1 / pp.17-27 / 2010
  • Feature selection and feature weighting are useful techniques for improving the classification accuracy of the k-Nearest Neighbor (k-NN) classifier. The main purpose of feature selection and feature weighting is to reduce the number of features by eliminating irrelevant and redundant ones while maintaining or enhancing classification accuracy. In this paper, a novel hybrid approach based on a Genetic Algorithm is proposed for simultaneous feature selection, feature weighting, and choice of k in the k-NN classifier. The results indicate that the proposed algorithm is comparable with or superior to existing classifiers with or without feature selection and feature weighting capability.

System Trading using Case-based Reasoning based on Absolute Similarity Threshold and Genetic Algorithm (절대 유사 임계값 기반 사례기반추론과 유전자 알고리즘을 활용한 시스템 트레이딩)

  • Han, Hyun-Woong;Ahn, Hyun-Chul
    • The Journal of Information Systems / v.26 no.3 / pp.63-90 / 2017
  • Purpose: This study proposes a novel system trading model using case-based reasoning (CBR) based on an absolute similarity threshold. The proposed model uses a genetic algorithm (GA) to optimize the absolute similarity threshold, feature selection, and instance selection of CBR. These mechanisms are intended to yield higher returns from stock market trading. Design/Methodology/Approach: The proposed CBR model uses an absolute similarity threshold varying from 0 to 1, which serves as the criterion for selecting appropriate neighbors in the nearest neighbor (NN) algorithm. Because the model determines the nearest neighbors on an absolute basis, it sometimes finds no appropriate neighbors; in system trading, this is interpreted as a 'hold' signal. That is, the proposed system trading model makes a trading decision such as 'buy' or 'sell' only when it produces a clear signal for stock market prediction. In addition, to improve the prediction accuracy and the rate of return, the proposed model adopts optimal feature selection and instance selection, which are known to be very effective in enhancing the performance of CBR. To validate the usefulness of the proposed model, we applied it to index trading of the KOSPI200 from 2009 to 2016. Findings: Experimental results showed that the proposed model with optimal feature or instance selection yielded higher returns than the benchmark and the various comparison models (including logistic regression, multiple discriminant analysis, artificial neural network, support vector machine, and traditional CBR). In particular, the proposed model with optimal instance selection showed the best rate of return among all the models. This implies that applying CBR with the absolute similarity threshold together with optimal instance selection may be effective in system trading from the perspective of returns.
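
The 'hold' behaviour described in this abstract can be sketched as follows. Past cases count as neighbors only if their similarity to the query reaches an absolute threshold, and no trade is signalled when none qualify. The similarity measure (1 minus the root-mean-square difference of features scaled to [0, 1]), the majority vote, the handling of ties, and the toy case base are all assumptions made for illustration; the abstract does not specify them, and the GA optimization is omitted.

```python
from math import sqrt


def similarity(a, b):
    """1 minus the root-mean-square difference; in [0, 1] for features scaled to [0, 1]."""
    return 1.0 - sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def trading_signal(query, case_base, threshold):
    """Vote among past cases whose similarity to the query reaches the threshold."""
    votes = [label for features, label in case_base
             if similarity(query, features) >= threshold]
    if not votes:
        return "hold"  # no sufficiently similar case: no clear signal
    ups = votes.count("up")
    downs = len(votes) - ups
    if ups == downs:
        return "hold"  # assumption: a tied vote is also treated as no clear signal
    return "buy" if ups > downs else "sell"


# Hypothetical case base: (scaled feature vector, next-period index movement).
case_base = [
    ((0.90, 0.80), "up"),
    ((0.85, 0.90), "up"),
    ((0.10, 0.20), "down"),
]

print(trading_signal((0.90, 0.85), case_base, threshold=0.9))
print(trading_signal((0.50, 0.50), case_base, threshold=0.9))
print(trading_signal((0.10, 0.15), case_base, threshold=0.9))
```

A conventional k-NN would always return the k closest cases, however distant. The absolute threshold is what lets the model abstain when the query resembles nothing in the case base.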