• Title/Summary/Keyword: Feature Subset


Improving Classification Performance for Data with Numeric and Categorical Attributes Using Feature Wrapping (특징 래핑을 통한 숫자형 특징과 범주형 특징이 혼합된 데이터의 클래스 분류 성능 향상 기법)

  • Lee, Jae-Sung; Kim, Dae-Won
    • Journal of KIISE: Software and Applications, v.36 no.12, pp.1024-1027, 2009
  • In this letter, we evaluate the classification performance on mixed numeric and categorical data to compare the efficiency of feature filtering and feature wrapping. Because the mixed data are composed of numeric and categorical features, feature selection was applied to the data set after discretizing its numeric features. After this preprocessing, we chose the feature subset that improves the classification performance of the data set. The experimental comparison of classification performance shows that the feature wrapping method is more reliable than the feature filtering method in terms of classification accuracy.
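
A minimal sketch of the filter-versus-wrapper comparison described above (not the paper's exact data or algorithms), assuming scikit-learn and a stand-in data set: numeric features are discretized first, then a filter method (chi-squared ranking) is compared against a wrapper (forward selection scored by the target classifier itself).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.feature_selection import SelectKBest, chi2, SequentialFeatureSelector
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data set; its numeric features are discretized so that categorical-style
# feature selection can be applied, as in the abstract above.
X, y = load_breast_cancer(return_X_y=True)
X_disc = KBinsDiscretizer(n_bins=5, encode="ordinal",
                          strategy="quantile").fit_transform(X)

clf = DecisionTreeClassifier(random_state=0)

# Filter: rank discretized features by a chi-squared test against the class.
X_filter = SelectKBest(chi2, k=10).fit_transform(X_disc, y)
print("filter  accuracy:", cross_val_score(clf, X_filter, y, cv=5).mean())

# Wrapper: forward selection that evaluates each candidate subset with the classifier itself.
sfs = SequentialFeatureSelector(clf, n_features_to_select=10, direction="forward", cv=5)
X_wrap = sfs.fit_transform(X_disc, y)
print("wrapper accuracy:", cross_val_score(clf, X_wrap, y, cv=5).mean())
```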

An ADHD Diagnostic Approach Based on Binary-Coded Genetic Algorithm and Extreme Learning Machine

  • Sachnev, Vasily; Suresh, Sundaram
    • Journal of Computing Science and Engineering, v.10 no.4, pp.111-117, 2016
  • An accurate approach for diagnosis of attention deficit hyperactivity disorder (ADHD) is presented in this paper. The presented technique efficiently classifies three subtypes of ADHD (ADHD-C, ADHD-H, ADHD-I) and typically developing controls (TDC) using only structural magnetic resonance imaging (MRI). The research examines structural MRI of the hippocampus from the ADHD-200 database. Each available MRI was processed with a region-of-interest (ROI) analysis to build a set of features for further analysis. The presented ADHD diagnostic approach unifies feature selection and classification techniques. The feature selection technique, based on the proposed binary-coded genetic algorithm, searches for an optimal subset of features extracted from the hippocampus. The classification technique uses the chosen optimal subset of features for accurate classification of the three subtypes of ADHD and TDC. In this study, the well-known Extreme Learning Machine is used as the classification technique. Experimental results clearly indicate that the presented BCGA-ELM (binary-coded genetic algorithm coupled with Extreme Learning Machine) efficiently classifies TDC and the three subtypes of ADHD and outperforms existing techniques.
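
The classifier component can be sketched with a minimal Extreme Learning Machine in NumPy: hidden-layer weights are random and fixed, and only the output weights are solved in closed form. This is a hedged illustration, not the authors' implementation; the hippocampus ROI features are not reproduced here, so random data and a random candidate feature mask stand in for what the binary-coded GA would evaluate.

```python
import numpy as np

class ELMClassifier:
    """Minimal Extreme Learning Machine: random fixed hidden layer, least-squares output weights."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)   # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                            # random hidden activations
        self.beta = np.linalg.pinv(H) @ T                           # closed-form output weights
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return self.classes_[np.argmax(H @ self.beta, axis=1)]

# A GA fitness function would score each candidate feature mask roughly like this
# (the mask and data below are hypothetical stand-ins for the hippocampus ROI features).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 40))
y = rng.integers(0, 4, size=200)          # four classes: TDC and the three ADHD subtypes
mask = rng.random(40) > 0.5               # one candidate feature subset from the GA
clf = ELMClassifier().fit(X[:150][:, mask], y[:150])
acc = (clf.predict(X[150:][:, mask]) == y[150:]).mean()
print("candidate-subset accuracy:", acc)
```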

Support vector machines with optimal instance selection: An application to bankruptcy prediction

  • Ahn Hyun-Chul; Kim Kyoung-Jae; Han In-Goo
    • Proceedings of the Korea Intelligent Information Systems Society Conference, 2006.06a, pp.167-175, 2006
  • Building accurate corporate bankruptcy prediction models has been one of the most important research issues in finance. Recently, support vector machines (SVMs) have been widely applied to bankruptcy prediction because of their many strong points. However, in order to use an SVM, a modeler must determine several factors heuristically, which hinders obtaining accurate prediction results with the SVM. As a result, some researchers have tried to optimize these factors, especially the feature subset and kernel parameters of the SVM. However, no studies have attempted to determine an appropriate instance subset for the SVM, although doing so may improve performance by eliminating distorted cases. Thus, in this study, we propose the simultaneous optimization of instance selection and the kernel parameters of the SVM using genetic algorithms (GAs). Experimental results show that our model outperforms not only the conventional SVM but also prior approaches for optimizing SVMs.
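
A hedged sketch of the chromosome evaluation only (GA selection, crossover, and mutation are omitted), assuming scikit-learn and an illustrative data set rather than bankruptcy records: one binary gene per training instance plus a few bits that encode the RBF kernel parameters C and gamma, with validation accuracy as the fitness.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)               # illustrative data set, not bankruptcy data
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

N_BITS = 8                                                # bits per kernel parameter

def decode(bits, low, high):
    """Map a bit string to a value in [low, high]."""
    return low + int("".join(map(str, bits)), 2) / (2 ** len(bits) - 1) * (high - low)

def fitness(chrom):
    inst_mask = chrom[:len(X_tr)].astype(bool)            # instance-selection genes
    C     = 10 ** decode(chrom[len(X_tr):len(X_tr) + N_BITS], -2, 3)   # C in [1e-2, 1e3]
    gamma = 10 ** decode(chrom[len(X_tr) + N_BITS:], -5, 1)            # gamma in [1e-5, 1e1]
    if inst_mask.sum() < 10 or len(np.unique(y_tr[inst_mask])) < 2:
        return 0.0                                        # guard against degenerate instance subsets
    svm = make_pipeline(StandardScaler(), SVC(C=C, gamma=gamma))
    svm.fit(X_tr[inst_mask], y_tr[inst_mask])
    return svm.score(X_val, y_val)                        # validation accuracy as the GA fitness

rng = np.random.default_rng(0)
chrom = rng.integers(0, 2, size=len(X_tr) + 2 * N_BITS)   # one random chromosome
print("fitness of a random chromosome:", fitness(chrom))
```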


A Study on The Feature Selection and Design of a Binary Decision Tree for Recognition of The Defect Patterns of Cold Mill Strip (냉연 표면 흠 분류를 위한 특징선정 및 이진 트리 분류기의 설계에 관한 연구)

  • Lee, Byung-Jin; Lyou, Kyoung; Park, Gwi-Tae; Kim, Kyoung-Min
    • Proceedings of the KIEE Conference, 1998.07g, pp.2330-2332, 1998
  • This paper suggests a method to recognize the various defect patterns of cold mill strip using a binary decision tree constructed automatically by a genetic algorithm. The genetic algorithm and the K-means algorithm were used to select a subset of suitable features at each node of the binary decision tree. The feature subset with maximum fitness is chosen, and the patterns are classified into two classes by a linear decision boundary. This process is repeated at each node until all the patterns are classified into individual classes. The final recognizer is obtained by neural network learning of a set of standard patterns at each node. The binary decision tree classifier was applied to the recognition of the defect patterns of cold mill strip, and experimental results are given to demonstrate the usefulness of the proposed scheme.
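
A hedged sketch of evaluating one candidate feature subset at a single tree node: the node's patterns are split into two groups with 2-means, and the fitness rewards subsets whose split keeps each class on one side. The GA loop, the linear decision boundary, and the full tree construction are omitted, and a standard data set stands in for the cold mill strip defect features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)          # stand-in for the defect-pattern feature vectors

def node_fitness(feature_subset, X_node, y_node):
    """Fitness of a feature subset at one node: how cleanly 2-means separates the classes."""
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_node[:, feature_subset])
    # For each class, the fraction of its patterns that fall on its majority side;
    # a clean binary split of the classes gives a fitness close to 1.
    per_class = [max(np.mean(labels[y_node == c]), 1 - np.mean(labels[y_node == c]))
                 for c in np.unique(y_node)]
    return float(np.mean(per_class))

rng = np.random.default_rng(0)
subset = rng.choice(X.shape[1], size=4, replace=False)    # one candidate subset the GA might propose
print("candidate subset:", subset, "fitness:", round(node_fitness(subset, X, y), 3))
```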


Prediction model of hypercholesterolemia using body fat mass based on machine learning (머신러닝 기반 체지방 측정정보를 이용한 고콜레스테롤혈증 예측모델)

  • Lee, Bum Ju
    • The Journal of the Convergence on Culture Technology, v.5 no.4, pp.413-420, 2019
  • The purpose of the present study is to develop a model for predicting hypercholesterolemia using an integrated set of body fat mass variables based on machine learning techniques, going beyond the study of the association between body fat mass and hypercholesterolemia. For this study, a total of six models were created using two variable subset selection methods and machine learning algorithms based on the Korea National Health and Nutrition Examination Survey (KNHANES) data. Among the various body fat mass variables, we found that trunk fat mass was the best variable for predicting hypercholesterolemia. Furthermore, we obtained an area under the receiver operating characteristic curve of 0.739 and a Matthews correlation coefficient of 0.36 in the model using correlation-based feature subset selection and the naive Bayes algorithm. Our findings are expected to serve as important information for disease prediction in large-scale screening and public health research.
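
A minimal sketch of the evaluation step, assuming scikit-learn: fit Gaussian naive Bayes on a chosen variable subset and report the two metrics quoted above (AUC and MCC). The KNHANES records are not available here, so synthetic data and hypothetical column names such as trunk_fat_mass stand in.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, matthews_corrcoef

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "trunk_fat_mass": rng.normal(10, 3, n),   # hypothetical body fat mass variables
    "leg_fat_mass":   rng.normal(7, 2, n),
    "arm_fat_mass":   rng.normal(2, 0.5, n),
})
# Synthetic label loosely driven by trunk fat mass (illustration only, not KNHANES).
y = (df["trunk_fat_mass"] + rng.normal(0, 3, n) > 12).astype(int)

selected = ["trunk_fat_mass", "leg_fat_mass"]        # e.g. the output of CFS subset selection
X_tr, X_te, y_tr, y_te = train_test_split(df[selected], y, test_size=0.3, random_state=0)

model = GaussianNB().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
mcc = matthews_corrcoef(y_te, model.predict(X_te))
print(f"AUC = {auc:.3f}, MCC = {mcc:.3f}")
```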

Feature Selection via Embedded Learning Based on Tangent Space Alignment for Microarray Data

  • Ye, Xiucai; Sakurai, Tetsuya
    • Journal of Computing Science and Engineering, v.11 no.4, pp.121-129, 2017
  • Feature selection has been widely established as an efficient technique for microarray data analysis. Feature selection aims to search for the most important feature/gene subset of a given dataset according to its relevance to the current target. Unsupervised feature selection is considered challenging due to the lack of label information. In this paper, we propose a novel method for unsupervised feature selection, which incorporates embedded learning and $l_{2,1}$-norm sparse regression into one framework to select genes in microarray data analysis. Local tangent space alignment is applied during embedded learning to preserve the local data structure. The $l_{2,1}$-norm sparse regression acts as a constraint to aid in learning the gene weights correlatively, so that the proposed method selects the informative genes that better capture the natural classes of samples. We provide an effective algorithm to solve the optimization problem in our method. Finally, to validate the efficacy of the proposed method, we evaluate it on real microarray gene expression datasets. The experimental results demonstrate that the proposed method obtains quite promising performance.
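
A hedged sketch of the same idea using off-the-shelf components rather than the paper's own optimization algorithm: compute an LTSA embedding of the samples, regress the embedding on the features with MultiTaskLasso (whose penalty is the $l_{2,1}$ norm of the coefficient matrix), and rank genes by the norms of their coefficient rows; random data stands in for a microarray matrix.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))                     # stand-in: 100 samples x 500 genes

# Embedded learning: local tangent space alignment preserves the local data structure.
Y = LocallyLinearEmbedding(n_neighbors=10, n_components=5,
                           method="ltsa").fit_transform(X)

# Sparse regression with an l2,1-type penalty: whole gene rows of the coefficient
# matrix are driven to zero together, so the surviving rows mark informative genes.
reg = MultiTaskLasso(alpha=0.01, max_iter=5000).fit(X, Y)
gene_scores = np.linalg.norm(reg.coef_, axis=0)     # coef_ has shape (n_components, n_genes)
top_genes = np.argsort(gene_scores)[::-1][:20]      # indices of the 20 highest-weight genes
print("top-ranked genes:", top_genes)
```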

Feature Selection by Genetic Algorithm and Information Theory (유전자 알고리즘과 정보이론을 이용한 속성선택)

  • Cho, Jae-Hoon; Lee, Dae-Jong; Song, Chang-Kyu; Kim, Yong-Sam; Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems, v.18 no.1, pp.94-99, 2008
  • In pattern classification problems, feature selection is an important technique for improving the performance of classifiers. In particular, when classifying with a large number of features or variables, the accuracy of the classifier can be improved by using the relevant feature subset and removing the irrelevant, redundant, or noisy data. In this paper we propose a feature selection method using a genetic algorithm and information theory. Experimental results show that this method can achieve better performance for pattern recognition problems than conventional ones.
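
One common way to combine the two ingredients, shown as a hedged sketch rather than the paper's exact criterion: an information-theoretic fitness that rewards mutual information with the class and penalizes redundancy between selected features, which a GA would maximize over binary feature masks.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score
from sklearn.preprocessing import KBinsDiscretizer

X, y = load_wine(return_X_y=True)                    # illustrative data set
X_d = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile").fit_transform(X)

relevance = mutual_info_classif(X_d, y, discrete_features=True)   # MI of each feature with the class

def fitness(mask):
    """Reward class relevance, penalize redundancy among the selected features."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    redundancy = (np.mean([mutual_info_score(X_d[:, i], X_d[:, j])
                           for i in idx for j in idx if i < j])
                  if idx.size > 1 else 0.0)
    return relevance[idx].mean() - redundancy

rng = np.random.default_rng(0)
population = rng.integers(0, 2, size=(20, X.shape[1]))    # random initial GA population of feature masks
print("best initial fitness:", max(fitness(m) for m in population))
```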

A Study on the Design of Binary Decision Tree using FCM algorithm (FCM 알고리즘을 이용한 이진 결정 트리의 구성에 관한 연구)

  • 정순원; 박중조; 김경민; 박귀태
    • Journal of the Korean Institute of Telematics and Electronics B, v.32B no.11, pp.1536-1544, 1995
  • We propose a design scheme for a binary decision tree and apply it to the tire tread pattern recognition problem. In this scheme, a binary decision tree is constructed using the fuzzy C-means (FCM) algorithm. All the available features are used during clustering. At each node, the best feature or feature subset among the available features is selected based on a proposed similarity measure. The decision tree can be used for the classification of unknown patterns. The proposed design scheme is applied to the tire tread pattern recognition problem. The design procedure, including feature extraction, is described, and experimental results are given to show the usefulness of this scheme.
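
The clustering ingredient can be sketched with a compact fuzzy C-means in NumPy, used here to split a node's patterns into two fuzzy clusters; the proposed similarity measure, the per-node feature selection, and the full tree design are omitted, and random two-dimensional data stands in for the tire tread features.

```python
import numpy as np

def fcm(X, c=2, m=2.0, n_iter=100, seed=0):
    """Fuzzy C-means: returns cluster centers and the fuzzy membership matrix U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                 # memberships sum to 1 per sample
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))                # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])   # stand-in features
centers, U = fcm(X, c=2)
hard_split = U.argmax(axis=1)                         # binary split used at the node
print("cluster sizes:", np.bincount(hard_split))
```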


A Dynamically Reconfiguring Backpropagation Neural Network and Its Application to the Inverse Kinematic Solution of Robot Manipulators (동적 변화구조의 역전달 신경회로와 로보트의 역 기구학 해구현에의 응용)

  • 오세영; 송재명
    • The Transactions of the Korean Institute of Electrical Engineers, v.39 no.9, pp.985-996, 1990
  • An inverse kinematic solution of a robot manipulator using multilayer perceptrons is proposed. Neural networks allow the solution of some complex nonlinear equations, such as the inverse kinematics of a robot manipulator, without the need for a model of the manipulator. However, the back-propagation (BP) learning rule for multilayer perceptrons has the major limitation of being too slow in learning to be practical. In this paper, a new algorithm named Dynamically Reconfiguring BP is proposed to improve its learning speed. It uses a modified version of Kohonen's Self-Organizing Feature Map (SOFM) to partition the input space and, for each input point, select a subset of the hidden processing elements or neurons. These selected neurons form a subset of the original network that learns the desired mapping for this small input region. It is this selective property that accelerates convergence as well as enhances resolution. This network was used to learn the parity function and, further, to solve the inverse kinematic problem of a robot manipulator. The results demonstrate faster learning than the BP network.
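
A hedged sketch of the core idea on a toy planar two-link arm: the input space is partitioned (plain k-means stands in here for the modified SOFM), each region is assigned a fixed subset of hidden units, and back-propagation updates only that active subnetwork for inputs falling in the region. This is an illustrative simplification, not the paper's network.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy data: end-effector position -> joint angles for a planar 2-link arm (link lengths 1.0 and 0.7).
theta = rng.uniform(0, np.pi / 2, size=(2000, 2))
X = np.c_[np.cos(theta[:, 0]) + 0.7 * np.cos(theta.sum(axis=1)),
          np.sin(theta[:, 0]) + 0.7 * np.sin(theta.sum(axis=1))]
Y = theta

H, R, lr = 60, 6, 0.05                                    # hidden units, regions, learning rate
W1 = rng.normal(0, 1.0, (2, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, 2)); b2 = np.zeros(2)

region = KMeans(n_clusters=R, n_init=10, random_state=0).fit_predict(X)
units_of = np.array_split(rng.permutation(H), R)          # fixed hidden-unit subset per region

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):
    for r in range(R):
        idx, u = region == r, units_of[r]
        h = sigmoid(X[idx] @ W1[:, u] + b1[u])            # forward pass through the active units only
        err = h @ W2[u] + b2 - Y[idx]
        dh = (err @ W2[u].T) * h * (1 - h)                # backprop restricted to the subnetwork
        W2[u] -= lr * h.T @ err / idx.sum()
        b2 -= lr * err.mean(axis=0)
        W1[:, u] -= lr * X[idx].T @ dh / idx.sum()
        b1[u] -= lr * dh.mean(axis=0)

# Evaluate with the same per-region reconfiguration used during training.
pred = np.zeros_like(Y)
for r in range(R):
    idx, u = region == r, units_of[r]
    pred[idx] = sigmoid(X[idx] @ W1[:, u] + b1[u]) @ W2[u] + b2
print("mean squared joint-angle error:", np.mean((pred - Y) ** 2))
```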

Stress Detection of Railway Point Machine Using Sound Analysis (소리 정보를 이용한 철도 선로전환기의 스트레스 탐지)

  • Choi, Yongju; Lee, Jonguk; Park, Daihee; Lee, Jonghyun; Chung, Yongwha; Kim, Hee-Young; Yoon, Sukhan
    • KIPS Transactions on Software and Data Engineering, v.5 no.9, pp.433-440, 2016
  • Railway point machines act as actuators that provide different routes to trains by driving switchblades from the current position to the opposite one. Since point failure can significantly affect railway operations, with potentially disastrous consequences, early stress detection of point machines is critical for monitoring and managing the condition of rail infrastructure. In this paper, we propose a stress detection method for point machines in railway condition monitoring systems using sound data. The system extracts a subset of sound features from audio data with reduced feature dimensionality using feature subset selection, and employs support vector machines (SVMs) for early detection of stress anomalies. Experimental results show that the system enables cost-effective detection of stress using a low-cost microphone, with accuracy exceeding 98%.
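
A hedged sketch of the pipeline, assuming librosa and scikit-learn: each recording is summarized by mean-pooled MFCCs, a reduced feature subset is selected, and an SVM is trained. The file names and labels below are hypothetical placeholders, and mean-pooled MFCCs are a simplification of the sound features used in the paper.

```python
import numpy as np
import librosa
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def feature_vector(path, n_mfcc=20):
    """Mean-pooled MFCCs as a fixed-length sound feature vector for one recording."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Hypothetical recordings and stress labels (0 = normal, 1 = stressed); replace
# with the full labelled set of point-machine sound files.
files = ["point_machine_001.wav", "point_machine_002.wav"]
labels = [0, 1]

X = np.array([feature_vector(f) for f in files])
y = np.array(labels)

# Feature subset selection (keep the 10 most discriminative dimensions) followed by an SVM.
model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), SVC(kernel="rbf"))
print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())
```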