• Title/Summary/Keyword: Feature selection optimization

Search Result 91, Processing Time 0.027 seconds

Simultaneous optimization method of feature transformation and weighting for artificial neural networks using genetic algorithm : Application to Korean stock market

  • Kim, Kyoung-jae;Ingoo Han
    • Proceedings of the Korea Inteligent Information System Society Conference
    • /
    • 1999.10a
    • /
    • pp.323-335
    • /
    • 1999
  • In this paper, we propose a new hybrid model of artificial neural networks(ANNs) and genetic algorithm (GA) to optimal feature transformation and feature weighting. Previous research proposed several variants of hybrid ANNs and GA models including feature weighting, feature subset selection and network structure optimization. Among the vast majority of these studies, however, ANNs did not learn the patterns of data well, because they employed GA for simple use. In this study, we incorporate GA in a simultaneous manner to improve the learning and generalization ability of ANNs. In this study, GA plays role to optimize feature weighting and feature transformation simultaneously. Globally optimized feature weighting overcome the well-known limitations of gradient descent algorithm and globally optimized feature transformation also reduce the dimensionality of the feature space and eliminate irrelevant factors in modeling ANNs. By this procedure, we can improve the performance and enhance the generalisability of ANNs.

  • PDF

General Set Covering for Feature Selection in Data Mining

  • Ma, Zhengyu;Ryoo, Hong Seo
    • Management Science and Financial Engineering
    • /
    • v.18 no.2
    • /
    • pp.13-17
    • /
    • 2012
  • Set covering has widely been accepted as a staple tool for feature selection in data mining. We present a generalized version of this classical combinatorial optimization model to make it better suited for the purpose and propose a surrogate relaxation-based procedure for its meta-heuristic solution. Mathematically and also numerically with experiments on 25 set covering instances, we demonstrate the utility of the proposed model and the proposed solution method.

Detection for JPEG steganography based on evolutionary feature selection and classifier ensemble selection

  • Ma, Xiaofeng;Zhang, Yi;Song, Xiangfeng;Fan, Chao
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.11
    • /
    • pp.5592-5609
    • /
    • 2017
  • JPEG steganography detection is an active research topic in the field of information hiding due to the wide use of JPEG image in social network, image-sharing websites, and Internet communication, etc. In this paper, a new steganalysis method for content-adaptive JPEG steganography is proposed by integrating the evolutionary feature selection and classifier ensemble selection. First, the whole framework of the proposed steganalysis method is presented and then the characteristic of the proposed method is analyzed. Second, the feature selection method based on genetic algorithm is given and the implement process is described in detail. Third, the method of classifier ensemble selection is proposed based on Pareto evolutionary optimization. The experimental results indicate the proposed steganalysis method can achieve a competitive detection performance by compared with the state-of-the-art steganalysis methods when used for the detection of the latest content-adaptive JPEG steganography algorithms.

Feature Selection to Mine Joint Features from High-dimension Space for Android Malware Detection

  • Xu, Yanping;Wu, Chunhua;Zheng, Kangfeng;Niu, Xinxin;Lu, Tianling
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.9
    • /
    • pp.4658-4679
    • /
    • 2017
  • Android is now the most popular smartphone platform and remains rapid growth. There are huge number of sensitive privacy information stored in Android devices. Kinds of methods have been proposed to detect Android malicious applications and protect the privacy information. In this work, we focus on extracting the fine-grained features to maximize the information of Android malware detection, and selecting the least joint features to minimize the number of features. Firstly, permissions and APIs, not only from Android permissions and SDK APIs but also from the developer-defined permissions and third-party library APIs, are extracted as features from the decompiled source codes. Secondly, feature selection methods, including information gain (IG), regularization and particle swarm optimization (PSO) algorithms, are used to analyze and utilize the correlation between the features to eliminate the redundant data, reduce the feature dimension and mine the useful joint features. Furthermore, regularization and PSO are integrated to create a new joint feature mining method. Experiment results show that the joint feature mining method can utilize the advantages of regularization and PSO, and ensure good performance and efficiency for Android malware detection.

Feature Selection via Embedded Learning Based on Tangent Space Alignment for Microarray Data

  • Ye, Xiucai;Sakurai, Tetsuya
    • Journal of Computing Science and Engineering
    • /
    • v.11 no.4
    • /
    • pp.121-129
    • /
    • 2017
  • Feature selection has been widely established as an efficient technique for microarray data analysis. Feature selection aims to search for the most important feature/gene subset of a given dataset according to its relevance to the current target. Unsupervised feature selection is considered to be challenging due to the lack of label information. In this paper, we propose a novel method for unsupervised feature selection, which incorporates embedded learning and $l_{2,1}-norm$ sparse regression into a framework to select genes in microarray data analysis. Local tangent space alignment is applied during embedded learning to preserve the local data structure. The $l_{2,1}-norm$ sparse regression acts as a constraint to aid in learning the gene weights correlatively, by which the proposed method optimizes for selecting the informative genes which better capture the interesting natural classes of samples. We provide an effective algorithm to solve the optimization problem in our method. Finally, to validate the efficacy of the proposed method, we evaluate the proposed method on real microarray gene expression datasets. The experimental results demonstrate that the proposed method obtains quite promising performance.

Centroid and Nearest Neighbor based Class Imbalance Reduction with Relevant Feature Selection using Ant Colony Optimization for Software Defect Prediction

  • B., Kiran Kumar;Gyani, Jayadev;Y., Bhavani;P., Ganesh Reddy;T, Nagasai Anjani Kumar
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.10
    • /
    • pp.1-10
    • /
    • 2022
  • Nowadays software defect prediction (SDP) is most active research going on in software engineering. Early detection of defects lowers the cost of the software and also improves reliability. Machine learning techniques are widely used to create SDP models based on programming measures. The majority of defect prediction models in the literature have problems with class imbalance and high dimensionality. In this paper, we proposed Centroid and Nearest Neighbor based Class Imbalance Reduction (CNNCIR) technique that considers dataset distribution characteristics to generate symmetry between defective and non-defective records in imbalanced datasets. The proposed approach is compared with SMOTE (Synthetic Minority Oversampling Technique). The high-dimensionality problem is addressed using Ant Colony Optimization (ACO) technique by choosing relevant features. We used nine different classifiers to analyze six open-source software defect datasets from the PROMISE repository and seven performance measures are used to evaluate them. The results of the proposed CNNCIR method with ACO based feature selection reveals that it outperforms SMOTE in the majority of cases.

Hybrid Case-based Reasoning and Genetic Algorithms Approach for Customer Classification

  • Kim Kyoung-jae;Ahn Hyunchul
    • Journal of information and communication convergence engineering
    • /
    • v.3 no.4
    • /
    • pp.209-212
    • /
    • 2005
  • This study proposes hybrid case-based reasoning and genetic algorithms model for customer classification. In this study, vertical and horizontal dimensions of the research data are reduced through integrated feature and instance selection process using genetic algorithms. We applied the proposed model to customer classification model which utilizes customers' demographic characteristics as inputs to predict their buying behavior for the specific product. Experimental results show that the proposed model may improve the classification accuracy and outperform various optimization models of typical CBR system.

Performance Improvement of Feature Selection Methods based on Bio-Inspired Algorithms (생태계 모방 알고리즘 기반 특징 선택 방법의 성능 개선 방안)

  • Yun, Chul-Min;Yang, Ji-Hoon
    • The KIPS Transactions:PartB
    • /
    • v.15B no.4
    • /
    • pp.331-340
    • /
    • 2008
  • Feature Selection is one of methods to improve the classification accuracy of data in the field of machine learning. Many feature selection algorithms have been proposed and discussed for years. However, the problem of finding the optimal feature subset from full data still remains to be a difficult problem. Bio-inspired algorithms are well-known evolutionary algorithms based on the principles of behavior of organisms, and very useful methods to find the optimal solution in optimization problems. Bio-inspired algorithms are also used in the field of feature selection problems. So in this paper we proposed new improved bio-inspired algorithms for feature selection. We used well-known bio-inspired algorithms, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), to find the optimal subset of features that shows the best performance in classification accuracy. In addition, we modified the bio-inspired algorithms considering the prior importance (prior relevance) of each feature. We chose the mRMR method, which can measure the goodness of single feature, to set the prior importance of each feature. We modified the evolution operators of GA and PSO by using the prior importance of each feature. We verified the performance of the proposed methods by experiment with datasets. Feature selection methods using GA and PSO produced better performances in terms of the classification accuracy. The modified method with the prior importance demonstrated improved performances in terms of the evolution speed and the classification accuracy.

Exploring the Feature Selection Method for Effective Opinion Mining: Emphasis on Particle Swarm Optimization Algorithms

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.11
    • /
    • pp.41-50
    • /
    • 2020
  • Sentimental analysis begins with the search for words that determine the sentimentality inherent in data. Managers can understand market sentimentality by analyzing a number of relevant sentiment words which consumers usually tend to use. In this study, we propose exploring performance of feature selection methods embedded with Particle Swarm Optimization Multi Objectives Evolutionary Algorithms. The performance of the feature selection methods was benchmarked with machine learning classifiers such as Decision Tree, Naive Bayesian Network, Support Vector Machine, Random Forest, Bagging, Random Subspace, and Rotation Forest. Our empirical results of opinion mining revealed that the number of features was significantly reduced and the performance was not hurt. In specific, the Support Vector Machine showed the highest accuracy. Random subspace produced the best AUC results.

Financial Application of Integrated Optimization and Machine Learning Technique (최적화와 기계학습 결합기법의 재무응용)

  • Kim, Kyoung-jae;Park, Hoyeon;Cha, Injoon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2019.01a
    • /
    • pp.429-430
    • /
    • 2019
  • 본 논문에서는 최적화 기법에 기반한 지능형 시스템의 재무응용사례를 다룬다. 본 연구에서 제안하는 모형은 대표적인 최적화 기법 중 하나인 시뮬레이티드 어니일링인데 이는 유전자 알고리듬과 유사한 최적화 성능을 가지고 있는 것으로 알려져 있으나 재무분야에서 응용된 사례가 거의 없다. 본 연구에서 제안하는 지능형 시스템은 시뮬레이티드 어니일링과 기계학습 기법을 결합한 것이다. 일반적으로 최적화와 기계학습 기법을 결합하는 방법은 특징선택(feature selection), 특징 가중치 최적화(feature weighting), 사례선택(instance selection), 모수 최적화(parameter optimization) 등의 방법이 있는데 선행연구에서 가장 많이 사용된 것은 특징선택에 두 기법을 결합하는 방식이다. 본 연구에서도 기계학습 기법을 재무 문제에 활용함에 있어서 최적의 특징선택을 위해 시뮬레이티드 어니일링을 결합하는 방식을 사용한다. 본 연구에서 제안된 기법의 유용성을 확인하기 위하여 실제 재무분야의 데이터를 활용하여 예측 정확도를 확인하였으며 그 결과를 통하여 제안하는 모형의 유용성을 확인할 수 있었다.

  • PDF