• Title/Summary/Keyword: Feature Subset

Search Result 130, Processing Time 0.022 seconds

Variable Selection Based on Mutual Information

  • Huh, Moon-Y.;Choi, Byong-Su
    • Communications for Statistical Applications and Methods
    • /
    • v.16 no.1
    • /
    • pp.143-155
    • /
    • 2009
  • Best subset selection procedure based on mutual information (MI) between a set of explanatory variables and a dependent class variable is suggested. Derivation of multivariate MI is based on normal mixtures. Several types of normal mixtures are proposed. Also a best subset selection algorithm is proposed. Four real data sets are employed to demonstrate the efficiency of the proposals.

A Hybrid Feature Selection Method using Univariate Analysis and LVF Algorithm (단변량 분석과 LVF 알고리즘을 결합한 하이브리드 속성선정 방법)

  • Lee, Jae-Sik;Jeong, Mi-Kyoung
    • Journal of Intelligence and Information Systems
    • /
    • v.14 no.4
    • /
    • pp.179-200
    • /
    • 2008
  • We develop a feature selection method that can improve both the efficiency and the effectiveness of classification technique. In this research, we employ case-based reasoning as a classification technique. Basically, this research integrates the two existing feature selection methods, i.e., the univariate analysis and the LVF algorithm. First, we sift some predictive features from the whole set of features using the univariate analysis. Then, we generate all possible subsets of features from these predictive features and measure the inconsistency rate of each subset using the LVF algorithm. Finally, the subset having the lowest inconsistency rate is selected as the best subset of features. We measure the performances of our feature selection method using the data obtained from UCI Machine Learning Repository, and compare them with those of existing methods. The number of selected features and the accuracy of our feature selection method are so satisfactory that the improvements both in efficiency and effectiveness are achieved.

  • PDF

Feature Selection for Classification of Mass Spectrometric Proteomic Data Using Random Forest (단백체 스펙트럼 데이터의 분류를 위한 랜덤 포리스트 기반 특성 선택 알고리즘)

  • Ohn, Syng-Yup;Chi, Seung-Do;Han, Mi-Young
    • Journal of the Korea Society for Simulation
    • /
    • v.22 no.4
    • /
    • pp.139-147
    • /
    • 2013
  • This paper proposes a novel method for feature selection for mass spectrometric proteomic data based on Random Forest. The method includes an effective preprocessing step to filter a large amount of redundant features with high correlation and applies a tournament strategy to get an optimal feature subset. Experiments on three public datasets, Ovarian 4-3-02, Ovarian 7-8-02 and Prostate shows that the new method achieves high performance comparing with widely used methods and balanced rate of specificity and sensitivity.

A design of binary decision tree using genetic algorithms and its applications (유전 알고리즘을 이용한 이진 결정 트리의 설계와 응용)

  • 정순원;박귀태
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.6
    • /
    • pp.102-110
    • /
    • 1996
  • A new design scheme of a binary decision tree is proposed. In this scheme a binary decision tree is constructed by using genetic algorithm and FCM algorithm. At each node optimal or near-optimal feature subset is selected which optimizes fitness function in genetic algorithm. The fitness function is inversely proportional to classification error, balance between cluster, number of feature used. The binary strings in genetic algorithm determine the feature subset and classification results - error, balance - form fuzzy partition matrix affect reproduction of next genratin. The proposed design scheme is applied to the tire tread patterns and handwriteen alphabetic characters. Experimental results show the usefulness of the proposed scheme.

  • PDF

Feature Subset for Improving Accuracy of Keystroke Dynamics on Mobile Environment

  • Lee, Sung-Hoon;Roh, Jong-hyuk;Kim, SooHyung;Jin, Seung-Hun
    • Journal of Information Processing Systems
    • /
    • v.14 no.2
    • /
    • pp.523-538
    • /
    • 2018
  • Keystroke dynamics user authentication is a behavior-based authentication method which analyzes patterns in how a user enters passwords and PINs to authenticate the user. Even if a password or PIN is revealed to another user, it analyzes the input pattern to authenticate the user; hence, it can compensate for the drawbacks of knowledge-based (what you know) authentication. However, users' input patterns are not always fixed, and each user's touch method is different. Therefore, there are limitations to extracting the same features for all users to create a user's pattern and perform authentication. In this study, we perform experiments to examine the changes in user authentication performance when using feature vectors customized for each user versus using all features. User customized features show a mean improvement of over 6% in error equal rate, as compared to when all features are used.

Scalable Collaborative Filtering Technique based on Adaptive Clustering (적응형 군집화 기반 확장 용이한 협업 필터링 기법)

  • Lee, O-Joun;Hong, Min-Sung;Lee, Won-Jin;Lee, Jae-Dong
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.73-92
    • /
    • 2014
  • An Adaptive Clustering-based Collaborative Filtering Technique was proposed to solve the fundamental problems of collaborative filtering, such as cold-start problems, scalability problems and data sparsity problems. Previous collaborative filtering techniques were carried out according to the recommendations based on the predicted preference of the user to a particular item using a similar item subset and a similar user subset composed based on the preference of users to items. For this reason, if the density of the user preference matrix is low, the reliability of the recommendation system will decrease rapidly. Therefore, the difficulty of creating a similar item subset and similar user subset will be increased. In addition, as the scale of service increases, the time needed to create a similar item subset and similar user subset increases geometrically, and the response time of the recommendation system is then increased. To solve these problems, this paper suggests a collaborative filtering technique that adapts a condition actively to the model and adopts the concepts of a context-based filtering technique. This technique consists of four major methodologies. First, items are made, the users are clustered according their feature vectors, and an inter-cluster preference between each item cluster and user cluster is then assumed. According to this method, the run-time for creating a similar item subset or user subset can be economized, the reliability of a recommendation system can be made higher than that using only the user preference information for creating a similar item subset or similar user subset, and the cold start problem can be partially solved. Second, recommendations are made using the prior composed item and user clusters and inter-cluster preference between each item cluster and user cluster. In this phase, a list of items is made for users by examining the item clusters in the order of the size of the inter-cluster preference of the user cluster, in which the user belongs, and selecting and ranking the items according to the predicted or recorded user preference information. Using this method, the creation of a recommendation model phase bears the highest load of the recommendation system, and it minimizes the load of the recommendation system in run-time. Therefore, the scalability problem and large scale recommendation system can be performed with collaborative filtering, which is highly reliable. Third, the missing user preference information is predicted using the item and user clusters. Using this method, the problem caused by the low density of the user preference matrix can be mitigated. Existing studies on this used an item-based prediction or user-based prediction. In this paper, Hao Ji's idea, which uses both an item-based prediction and user-based prediction, was improved. The reliability of the recommendation service can be improved by combining the predictive values of both techniques by applying the condition of the recommendation model. By predicting the user preference based on the item or user clusters, the time required to predict the user preference can be reduced, and missing user preference in run-time can be predicted. Fourth, the item and user feature vector can be made to learn the following input of the user feedback. This phase applied normalized user feedback to the item and user feature vector. This method can mitigate the problems caused by the use of the concepts of context-based filtering, such as the item and user feature vector based on the user profile and item properties. The problems with using the item and user feature vector are due to the limitation of quantifying the qualitative features of the items and users. Therefore, the elements of the user and item feature vectors are made to match one to one, and if user feedback to a particular item is obtained, it will be applied to the feature vector using the opposite one. Verification of this method was accomplished by comparing the performance with existing hybrid filtering techniques. Two methods were used for verification: MAE(Mean Absolute Error) and response time. Using MAE, this technique was confirmed to improve the reliability of the recommendation system. Using the response time, this technique was found to be suitable for a large scaled recommendation system. This paper suggested an Adaptive Clustering-based Collaborative Filtering Technique with high reliability and low time complexity, but it had some limitations. This technique focused on reducing the time complexity. Hence, an improvement in reliability was not expected. The next topic will be to improve this technique by rule-based filtering.

Prediction model of osteoporosis using nutritional components based on association (연관성 규칙 기반 영양소를 이용한 골다공증 예측 모델)

  • Yoo, JungHun;Lee, Bum Ju
    • The Journal of the Convergence on Culture Technology
    • /
    • v.6 no.3
    • /
    • pp.457-462
    • /
    • 2020
  • Osteoporosis is a disease that occurs mainly in the elderly and increases the risk of fractures due to structural deterioration of bone mass and tissues. The purpose of this study are to assess the relationship between nutritional components and osteoporosis and to evaluate models for predicting osteoporosis based on nutrient components. In experimental method, association was performed using binary logistic regression, and predictive models were generated using the naive Bayes algorithm and variable subset selection methods. The analysis results for single variables indicated that food intake and vitamin B2 showed the highest value of the area under the receiver operating characteristic curve (AUC) for predicting osteoporosis in men. In women, monounsaturated fatty acids showed the highest AUC value. In prediction model of female osteoporosis, the models generated by the correlation based feature subset and wrapper based variable subset methods showed an AUC value of 0.662. In men, the model by the full variable obtained an AUC of 0.626, and in other male models, the predictive performance was very low in sensitivity and 1-specificity. The results of these studies are expected to be used as the basic information for the treatment and prevention of osteoporosis.

Simultaneous optimization method of feature transformation and weighting for artificial neural networks using genetic algorithm : Application to Korean stock market

  • Kim, Kyoung-jae;Ingoo Han
    • Proceedings of the Korea Inteligent Information System Society Conference
    • /
    • 1999.10a
    • /
    • pp.323-335
    • /
    • 1999
  • In this paper, we propose a new hybrid model of artificial neural networks(ANNs) and genetic algorithm (GA) to optimal feature transformation and feature weighting. Previous research proposed several variants of hybrid ANNs and GA models including feature weighting, feature subset selection and network structure optimization. Among the vast majority of these studies, however, ANNs did not learn the patterns of data well, because they employed GA for simple use. In this study, we incorporate GA in a simultaneous manner to improve the learning and generalization ability of ANNs. In this study, GA plays role to optimize feature weighting and feature transformation simultaneously. Globally optimized feature weighting overcome the well-known limitations of gradient descent algorithm and globally optimized feature transformation also reduce the dimensionality of the feature space and eliminate irrelevant factors in modeling ANNs. By this procedure, we can improve the performance and enhance the generalisability of ANNs.

  • PDF

Performance Evaluation of a Feature-Importance-based Feature Selection Method for Time Series Prediction

  • Hyun, Ahn
    • Journal of information and communication convergence engineering
    • /
    • v.21 no.1
    • /
    • pp.82-89
    • /
    • 2023
  • Various machine-learning models may yield high predictive power for massive time series for time series prediction. However, these models are prone to instability in terms of computational cost because of the high dimensionality of the feature space and nonoptimized hyperparameter settings. Considering the potential risk that model training with a high-dimensional feature set can be time-consuming, we evaluate a feature-importance-based feature selection method to derive a tradeoff between predictive power and computational cost for time series prediction. We used two machine learning techniques for performance evaluation to generate prediction models from a retail sales dataset. First, we ranked the features using impurity- and Local Interpretable Model-agnostic Explanations (LIME) -based feature importance measures in the prediction models. Then, the recursive feature elimination method was applied to eliminate unimportant features sequentially. Consequently, we obtained a subset of features that could lead to reduced model training time while preserving acceptable model performance.

A design of binary decision tree using genetic algorithms and its application to the alphabetic charcter (유전 알고리즘을 이용한 이진 결정 트리의 설계와 영문자 인식에의 응용)

  • 정순원;김경민;박귀태
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1995.10b
    • /
    • pp.218-223
    • /
    • 1995
  • A new design scheme of a binary decision tree is proposed. In this scheme a binary decision tree is constructed by using genetic algorithm and FCM algorithm. At each node optimal or near-optimal feature or feature subset among all the available features is selected based on fitness function in genetic algorithm which is inversely proportional to classification error, balance between cluster, number of feature used. The proposed design scheme is applied to the handwtitten alphabetic characters. Experimental results show the usefulness of the proposed scheme.

  • PDF