• Title/Summary/Keyword: Accuracy of Selection


Effective Multi-label Feature Selection based on Large Offspring Set created by Enhanced Evolutionary Search Process

  • Lim, Hyunki;Seo, Wangduk;Lee, Jaesung
    • Journal of the Korea Society of Computer and Information / v.23 no.9 / pp.7-13 / 2018
  • Recent advances in data-gathering techniques have improved the capability to collect information, enabling learning of the relationships between gathered data patterns and application sub-tasks. Because a pattern can be associated with multiple labels, multi-label learning is required, and multi-label feature selection has attracted significant attention since it can improve multi-label learning accuracy. However, existing evolutionary multi-label feature selection methods suffer from an ineffective search process. In this study, we propose an evolutionary search process for the multi-label feature selection problem. The proposed method creates a large set of offspring, or new feature subsets, and then retains the most promising feature subset. Experimental results demonstrate that the proposed method can identify feature subsets that yield good multi-label classification accuracy much faster than conventional methods.
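
The "large offspring set, keep the most promising subset" idea can be sketched as a (1+λ)-style evolutionary loop over binary feature masks. This is a minimal illustration under stated assumptions, not the authors' method. The bit-flip mutation operator, the parameter names (`n_offspring`, `n_generations`) and the `fitness` callable are all hypothetical. In practice `fitness` would wrap a multi-label classifier's validation score.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[bool, ...]  # True = feature selected


def mutate(mask: Mask, rng: random.Random, flip_prob: float) -> Mask:
    """Flip each bit with probability flip_prob, forcing at least one flip."""
    bits: List[bool] = [(not b) if rng.random() < flip_prob else b for b in mask]
    if bits == list(mask):
        i = rng.randrange(len(bits))
        bits[i] = not bits[i]
    return tuple(bits)


def large_offspring_search(
    n_features: int,
    fitness: Callable[[Mask], float],  # hypothetical: e.g. multi-label validation score
    n_offspring: int = 50,
    n_generations: int = 30,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """(1+lambda)-style search: many offspring per generation, keep the best."""
    rng = random.Random(seed)
    parent: Mask = tuple(rng.random() < 0.5 for _ in range(n_features))
    parent_score = fitness(parent)
    flip_prob = 1.0 / n_features
    for _ in range(n_generations):
        # Create a large set of offspring (candidate feature subsets).
        offspring = [mutate(parent, rng, flip_prob) for _ in range(n_offspring)]
        scores = [fitness(child) for child in offspring]
        best = max(range(n_offspring), key=scores.__getitem__)
        # Retain only the most promising subset; keep the parent if none is better.
        if scores[best] >= parent_score:
            parent, parent_score = offspring[best], scores[best]
    return parent, parent_score
```

For a quick check, a toy fitness can reward selecting features 0 to 2 and lightly penalize every other selected feature. Generating many offspring per generation trades more fitness evaluations per step for fewer, better-informed steps.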

Optimal k-Nearest Neighborhood Classifier Using Genetic Algorithm (유전알고리즘을 이용한 최적 k-최근접이웃 분류기)

  • Park, Chong-Sun;Huh, Kyun
    • Communications for Statistical Applications and Methods / v.17 no.1 / pp.17-27 / 2010
  • Feature selection and feature weighting are useful techniques for improving the classification accuracy of the k-Nearest Neighbor (k-NN) classifier. The main purpose of feature selection and feature weighting is to reduce the number of features by eliminating irrelevant and redundant features while maintaining or enhancing classification accuracy. In this paper, a novel hybrid approach based on a Genetic Algorithm is proposed for simultaneous feature selection, feature weighting, and choice of k in the k-NN classifier. The results indicate that the proposed algorithm is comparable or superior to existing classifiers with or without feature selection and feature weighting capability.
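
One way to encode simultaneous feature selection, feature weighting and choice of k is a chromosome holding one weight per feature plus an odd k. A weight of 0.0 drops a feature, and fitness is the leave-one-out accuracy of a weighted k-NN. This is a hedged sketch, not the paper's encoding. The operators (tournament selection, uniform crossover, clipped Gaussian mutation, elitism) and all rates are assumptions chosen for illustration.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Chromosome = Tuple[List[float], int]  # (per-feature weights in [0, 1], k)


def loo_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                 weights: Sequence[float], k: int) -> float:
    """Leave-one-out accuracy of k-NN with a weighted squared Euclidean distance."""
    correct = 0
    for i in range(len(X)):
        dists = []
        for j in range(len(X)):
            if i != j:
                d = sum(w * (a - b) ** 2 for w, a, b in zip(weights, X[i], X[j]))
                dists.append((d, y[j]))
        dists.sort(key=lambda t: t[0])
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


def ga_knn(X, y, pop_size: int = 20, generations: int = 30, k_max: int = 7,
           seed: int = 0) -> Tuple[float, Chromosome]:
    rng = random.Random(seed)
    d = len(X[0])

    def random_chromosome() -> Chromosome:
        # A weight of 0.0 means the feature is dropped (feature selection).
        w = [rng.random() if rng.random() < 0.5 else 0.0 for _ in range(d)]
        return w, rng.randrange(1, k_max + 1, 2)  # odd k in [1, k_max]

    def evaluate(pop: List[Chromosome]) -> List[Tuple[float, Chromosome]]:
        return [(loo_accuracy(X, y, w, k), (w, k)) for w, k in pop]

    scored = evaluate([random_chromosome() for _ in range(pop_size)])
    for _ in range(generations):
        scored.sort(key=lambda t: t[0], reverse=True)
        next_pop: List[Chromosome] = [scored[0][1]]  # elitism
        while len(next_pop) < pop_size:
            # Tournament selection (size 3) for two parents.
            p1 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            # Uniform crossover on weights; inherit k from one parent.
            w = [a if rng.random() < 0.5 else b for a, b in zip(p1[0], p2[0])]
            k = p1[1] if rng.random() < 0.5 else p2[1]
            # Mutation: occasionally drop a feature or perturb its weight.
            for idx in range(d):
                r = rng.random()
                if r < 0.05:
                    w[idx] = 0.0
                elif r < 0.2:
                    w[idx] = min(1.0, max(0.0, w[idx] + rng.gauss(0.0, 0.1)))
            if rng.random() < 0.1:
                k = rng.randrange(1, k_max + 1, 2)
            next_pop.append((w, k))
        scored = evaluate(next_pop)
    return max(scored, key=lambda t: t[0])
```

`ga_knn` returns `(accuracy, (weights, k))`. Restricting k to odd values avoids some voting ties in binary problems.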

On the Performance of Cuckoo Search and Bat Algorithms Based Instance Selection Techniques for SVM Speed Optimization with Application to e-Fraud Detection

  • AKINYELU, Andronicus Ayobami;ADEWUMI, Aderemi Oluyinka
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.3 / pp.1348-1375 / 2018
  • Support Vector Machine (SVM) is a well-known machine learning classification algorithm that has been widely applied, with good accuracy, to many data mining problems. However, SVM classification speed decreases as dataset size increases. Some applications, like video surveillance and intrusion detection, require a classifier to be trained very quickly and on large datasets. Hence, this paper introduces two filter-based instance selection techniques for optimizing SVM training speed. Fast classification is often achieved at the expense of classification accuracy, and some applications, such as phishing and spam email classifiers, are very sensitive to even a slight drop in classification accuracy. Hence, this paper also introduces two wrapper-based instance selection techniques for improving both SVM predictive accuracy and training speed. The wrapper-based and filter-based techniques are inspired by the Cuckoo Search Algorithm and the Bat Algorithm. The proposed techniques are validated on three popular e-fraud types: credit card fraud, spam email, and phishing email. In addition, they are validated on 20 other datasets from the UCI data repository. Statistical analysis of the experimental results reveals that the filter-based and wrapper-based techniques significantly improved SVM classification speed, and that the wrapper-based techniques improved SVM predictive accuracy in most cases.
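
Cuckoo Search can be applied to instance selection by treating each nest as a binary mask over training instances. The sketch keeps three steps: Lévy-flight moves toward the best nest, greedy replacement, and abandonment of a fraction `pa` of the worst nests. It is a simplified binary variant and not the paper's technique. The |tanh| flip probability and the worst-nest abandonment rule are assumptions, and `fitness` is hypothetical. A wrapper variant might return the validation accuracy of an SVM trained on the selected instances.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[bool]  # True = keep this training instance


def levy_step(rng: random.Random, beta: float = 1.5) -> float:
    """One Levy-distributed step length via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def binary_cuckoo_search(n_instances: int, fitness: Callable[[Mask], float],
                         n_nests: int = 15, n_iter: int = 50, pa: float = 0.25,
                         alpha: float = 1.0, seed: int = 0) -> Tuple[Mask, float]:
    rng = random.Random(seed)
    nests = [[rng.random() < 0.5 for _ in range(n_instances)] for _ in range(n_nests)]
    scores = [fitness(m) for m in nests]
    b = max(range(n_nests), key=scores.__getitem__)
    best, best_score = nests[b][:], scores[b]
    for _ in range(n_iter):
        # 1) Levy flights: bits that differ from the best nest flip with prob |tanh(step)|.
        for i in range(n_nests):
            new = []
            for bit, best_bit in zip(nests[i], best):
                step = alpha * levy_step(rng) * (int(bit) - int(best_bit))
                new.append((not bit) if rng.random() < abs(math.tanh(step)) else bit)
            f_new = fitness(new)
            if f_new > scores[i]:  # greedy replacement
                nests[i], scores[i] = new, f_new
        # 2) Abandon a fraction pa of the worst nests and rebuild them at random.
        worst = sorted(range(n_nests), key=scores.__getitem__)[:int(pa * n_nests)]
        for i in worst:
            nests[i] = [rng.random() < 0.5 for _ in range(n_instances)]
            scores[i] = fitness(nests[i])
        # 3) Track the best mask found so far.
        b = max(range(n_nests), key=scores.__getitem__)
        if scores[b] > best_score:
            best, best_score = nests[b][:], scores[b]
    return best, best_score
```

Because the Lévy step is heavy-tailed, most moves flip few bits while occasional long jumps flip many, which is the exploration behavior Cuckoo Search relies on.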

Comparison of Feature Selection Methods in Support Vector Machines (지지벡터기계의 변수 선택방법 비교)

  • Kim, Kwangsu;Park, Changyi
    • The Korean Journal of Applied Statistics / v.26 no.1 / pp.131-139 / 2013
  • Support vector machines (SVMs) may perform poorly in the presence of noise variables; in addition, it is difficult to identify the importance of each variable in the resulting classifier. Feature selection can improve both the interpretability and the accuracy of SVMs. Most existing studies address feature selection in the linear SVM through penalty functions that yield sparse solutions. In practice, however, nonlinear kernels are usually adopted for classification accuracy, so feature selection is still desirable for nonlinear SVMs. In this paper, we compare the performance of nonlinear feature selection methods such as the component selection and smoothing operator (COSSO) and kernel iterative feature extraction (KNIFE) on simulated and real data sets.
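
Kernel-based feature selection for nonlinear SVMs can be pictured by putting a nonnegative weight on each input variable inside a Gaussian kernel. A variable whose weight is zero no longer affects the kernel, so it is effectively removed. This is a simplified illustration of the feature-weighted-kernel idea associated with methods such as KNIFE. It does not include the penalized estimation of the weights, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def weighted_rbf(x: Sequence[float], z: Sequence[float],
                 w: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian kernel exp(-gamma * sum_j w_j (x_j - z_j)^2) with w_j >= 0."""
    if any(wj < 0 for wj in w):
        raise ValueError("feature weights must be nonnegative")
    return math.exp(-gamma * sum(wj * (xj - zj) ** 2 for wj, xj, zj in zip(w, x, z)))


def gram_matrix(X: Sequence[Sequence[float]], w: Sequence[float],
                gamma: float = 1.0) -> List[List[float]]:
    """Kernel matrix that an SVM solver accepting precomputed kernels could consume."""
    return [[weighted_rbf(a, b, w, gamma) for b in X] for a in X]


def selected_features(w: Sequence[float]) -> List[int]:
    """Features with zero weight drop out of the kernel entirely."""
    return [j for j, wj in enumerate(w) if wj > 0]
```

For example, with weights `[1.0, 0.0]`, the points `[1.0, 5.0]` and `[1.0, -3.0]` get kernel value 1.0 because they differ only in the dropped second variable.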

Cutting Condition Selection for Geometrical Accuracy Improvement in End Milling (엔드밀 가공에서 형상 정밀도 향상을 위한 절삭 조건 선정)

  • 류시형;최덕기;주종남
    • Proceedings of the Korean Society of Precision Engineering Conference / 2003.06a / pp.1784-1788 / 2003
  • To improve geometrical accuracy in end milling, this paper investigates the selection of cutting methods and cutting conditions. Because machining processes consist of several steps, such as roughing, semi-finishing, and finishing, cutting forces and tool deflection are calculated taking into account the surface shape generated by the previous cut. The effects of the number of tool teeth, tool geometry, and cutting conditions on form error are analyzed. Using a form error prediction method based on tool deflection, cutting conditions for improving geometrical accuracy are discussed. The characteristics of, and differences between, the surface shapes generated in up and down milling are examined, and an over-cut-free condition for up milling is presented. A form error reduction method that alternates up and down milling is also suggested. The effectiveness of the presented method is examined through a set of cutting tests under various cutting conditions. This research contributes to cutting process optimization for improved geometrical accuracy in die and mold manufacturing.
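
A first-order sense of how tool deflection drives form error comes from modeling the end mill as a cantilever beam with a point load at its tip: δ = F·L³ / (3·E·I), with I = π·d⁴/64 for a solid round section. This is a textbook simplification, not the paper's model, which also accounts for the surface left by the previous cut. The "effective diameter" that stands in for the fluted section and the roughly 600 GPa modulus for cemented carbide are assumptions.

```python
import math


def cantilever_tool_deflection(force_n: float, overhang_m: float,
                               effective_diameter_m: float,
                               youngs_modulus_pa: float) -> float:
    """Tip deflection (m) of a tool modeled as a cantilever with a tip point load.

    delta = F * L**3 / (3 * E * I), with I = pi * d**4 / 64 (solid round section).
    """
    inertia = math.pi * effective_diameter_m ** 4 / 64.0
    return force_n * overhang_m ** 3 / (3.0 * youngs_modulus_pa * inertia)


# Hypothetical numbers: 100 N radial force, 30 mm overhang,
# 8 mm effective diameter, E ~ 600 GPa (cemented carbide, assumed).
delta = cantilever_tool_deflection(100.0, 0.03, 0.008, 600e9)
```

With these inputs the deflection is about 7.5 µm. Because δ scales with L³ and 1/d⁴, shortening the overhang or using a stiffer, larger-diameter tool reduces form error far more than small changes in cutting force do.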


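Selection of a Feature Subset Population Considering Accuracy and Diversity in a GA-SVM Ensemble Model (GA-SVM Ensemble 모델에서의 accuracy와 diversity를 고려한 feature subset population 선택)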

  • Seong, Gi-Seok;Jo, Seong-Jun
    • Proceedings of the Korean Operations and Management Science Society Conference / 2005.05a / pp.614-620 / 2005
  • In an ensemble, feature selection increases diversity by varying the input variables each classifier learns from, which generally improves performance. One method used for feature selection is the Genetic Algorithm (GA). GA-SVM, a GA-based wrapper feature selection mechanism, has shown good performance in building response models and keystroke dynamics identity verification models. However, because it cannot guarantee diversity among the candidates in the population, it relies on heuristic parameter settings that must be tuned to balance the accuracy and diversity of the classifiers. Building on the GA-SVM algorithm, we studied a fitness function that considers both accuracy and diversity when measuring the fitness of candidates in the population, eliminating the additional classifier selection step while maintaining performance. As a result, the complexity of the algorithm was reduced while the model's performance was maintained.
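
The abstract does not give the exact fitness function, so the sketch below assumes a simple weighted sum. Each candidate's fitness is its accuracy plus λ times its mean normalized Hamming distance to the other candidates' feature masks. The names `accuracy_diversity_fitness` and `lam` are hypothetical, and the accuracies would come from each candidate's SVM.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of features on which two masks disagree."""
    return sum(x != y for x, y in zip(a, b))


def accuracy_diversity_fitness(masks: Sequence[Sequence[int]],
                               accuracies: Sequence[float],
                               lam: float = 0.5) -> List[float]:
    """fitness_i = accuracy_i + lam * (mean normalized Hamming distance to the others)."""
    n, d = len(masks), len(masks[0])
    if n < 2:
        raise ValueError("diversity needs at least two candidates")
    fitness = []
    for i in range(n):
        total = sum(hamming(masks[i], masks[j]) for j in range(n) if j != i)
        diversity = total / ((n - 1) * d)  # lies in [0, 1]
        fitness.append(accuracies[i] + lam * diversity)
    return fitness
```

With masks `[1,1,0,0]`, `[1,1,0,0]`, `[0,0,1,1]` and accuracies 0.8, 0.8, 0.7, the third candidate scores 1.2 versus 1.05 for the duplicates. A slightly less accurate but distinct subset can therefore survive selection, which is the balance the heuristic parameters previously had to enforce.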


Brand Selection of shirts and Jeans Relating to Consumers' Characteristics: A Comparative Study between Domestic and Foreign Brand (셔츠 및 청바지의 상표선택과 소비자 특성에 관한 연구)

  • 이명희
    • Journal of the Korean Home Economics Association / v.35 no.1 / pp.263-276 / 1997
  • The objectives of this study were to examine differences in brand selection motives between domestic and foreign brand selection for shirts and jeans, and to reveal the relationships between brand selection and consumers' characteristics, such as demographic variables, sociability, and superiority. The sample consisted of 262 college women in Seoul, Korea. The data were analyzed using the t-test, paired t-test, χ2-test, and discriminant analysis. The results were as follows. 1. Purchasers of foreign brands were influenced more than purchasers of domestic brands by 'quality', 'wearing of others', 'reputation of brand', and 'possibility of credit card use', while purchasers of domestic brands were influenced by price. 2. Purchasers of foreign brands were more likely than those of domestic brands to decide in advance which brand to buy. 3. Six brand selection motives, consumers' income, and sociability contributed to discriminating between the domestic and foreign brand purchase groups for shirts. The accuracy of predicting the groups with these 8 variables was 75.95%. Consumers high in sociability and income belonged to the foreign brand purchase group. 4. Six brand selection motives, consumers' age, and superiority contributed to discriminating between the domestic and foreign brand purchase groups for jeans. The accuracy of predicting the groups with these 8 variables was 72.52%. Consumers high in superiority and younger consumers belonged to the foreign brand purchase group.


A Novel Feature Selection Method for Output Coding based Multiclass SVM (출력 코딩 기반 다중 클래스 서포트 벡터 머신을 위한 특징 선택 기법)

  • Lee, Youngjoo;Lee, Jeongjin
    • Journal of Korea Multimedia Society / v.16 no.7 / pp.795-801 / 2013
  • Recently, support vector machines have been widely used in various application fields because of their superior classification performance compared with decision trees and neural networks. Since the support vector machine is basically designed for binary classification, output coding methods, which combine the results of multiple binary classifiers, are used to apply support vector machines to multiclass problems. However, previous feature selection methods for output coding based support vector machines selected features that improve overall classification accuracy rather than the accuracy of each individual binary classifier. In this paper, we propose a novel feature selection method that finds features maximizing the classification accuracy of each binary classifier in an output coding based support vector machine. Experimental results showed that the proposed method significantly improved classification accuracy compared with the previous feature selection method.
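
The structure the abstract relies on, where each binary classifier in an output code sees its own feature subset, can be sketched with Hamming decoding over a code matrix. The function name, the ±1 code convention and the toy threshold classifiers are hypothetical stand-ins for trained binary SVMs. The paper's feature selection procedure itself is not shown.

```python
from typing import Callable, List, Sequence

BinaryClassifier = Callable[[List[float]], int]  # returns -1 or +1


def ecoc_predict(x: Sequence[float],
                 code_matrix: Sequence[Sequence[int]],
                 classifiers: Sequence[BinaryClassifier],
                 feature_subsets: Sequence[Sequence[int]]) -> int:
    """Decode an output-coding multiclass prediction.

    code_matrix[c][m] in {-1, +1}: target of binary classifier m for class c.
    feature_subsets[m]: indices of the features binary classifier m uses.
    """
    # Each binary classifier sees only its own selected features.
    outputs = [clf([x[j] for j in subset])
               for clf, subset in zip(classifiers, feature_subsets)]

    # Hamming decoding: pick the class whose codeword disagrees least with the outputs.
    def distance(codeword: Sequence[int]) -> int:
        return sum(o != t for o, t in zip(outputs, codeword))

    return min(range(len(code_matrix)), key=lambda c: distance(code_matrix[c]))
```

With a one-vs-rest code for three classes and classifier m thresholding only feature m, the input `[0.9, 0.1, 0.2]` decodes to class 0. Choosing `feature_subsets[m]` to maximize each classifier's own accuracy is the per-classifier objective the abstract argues for.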

Impact of Instance Selection on kNN-Based Text Categorization

  • Barigou, Fatiha
    • Journal of Information Processing Systems / v.14 no.2 / pp.418-434 / 2018
  • With the increasing use of the Internet and electronic documents, automatic text categorization has become imperative. Several machine learning algorithms have been proposed for text categorization. The k-nearest neighbor algorithm (kNN) is known to be one of the best state-of-the-art classifiers for text categorization. However, kNN suffers from limitations such as high computational cost when classifying new instances. Instance selection techniques have emerged as highly competitive methods for improving kNN through data reduction. However, previous works have evaluated these approaches only on structured datasets, and their performance has not been examined in the text categorization domain, where both the dimensionality and the size of the dataset are very high. Motivated by these observations, this paper investigates and analyzes the impact of instance selection on kNN-based text categorization in terms of classification accuracy, classification efficiency, and data reduction.
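
As a concrete instance of data reduction for kNN, the sketch below implements Hart's Condensed Nearest Neighbor (CNN). CNN is one classic instance selection technique, chosen here for illustration and not necessarily among those the paper evaluates. It also computes the data reduction rate. For text, the vectors would typically be TF-IDF and cosine distance is common. Squared Euclidean distance is assumed here for simplicity.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nn_label(x: Sequence[float], X: Sequence[Sequence[float]], y: Sequence,
             candidates: Sequence[int]):
    """Label of the nearest neighbor of x among the indices in candidates."""
    j = min(candidates, key=lambda i: sq_dist(x, X[i]))
    return y[j]


def condensed_nn(X: Sequence[Sequence[float]], y: Sequence) -> List[int]:
    """Hart's CNN: grow a subset until 1-NN on it classifies every training point correctly."""
    keep: List[int] = [0]
    kept = {0}
    changed = True
    while changed:  # repeat passes until no instance is added
        changed = False
        for i in range(len(X)):
            if i not in kept and nn_label(X[i], X, y, keep) != y[i]:
                keep.append(i)
                kept.add(i)
                changed = True
    return keep


def reduction_rate(n_total: int, n_kept: int) -> float:
    """Fraction of training instances removed."""
    return 1.0 - n_kept / n_total
```

On two well-separated one-dimensional clusters of three points each, CNN keeps one prototype per class, a reduction rate of about 0.67. Classifying a new document then compares it against 2 stored instances instead of 6.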

A Document Classification System Using Modified ECCD and Category Weight for each Document (Modified ECCD 및 문서별 범주 가중치를 이용한 문서 분류 시스템)

  • Han, Chung-Seok;Park, Sang-Yong;Lee, Soo-Won
    • The KIPS Transactions:PartB / v.19B no.4 / pp.237-242 / 2012
  • Web information services need a document classification system for efficient management and convenient searching. Existing document classification systems suffer from low classification accuracy if only a small number of feature words is selected in documents or if the number of documents belonging to a specific category is excessively large. To solve this problem, we propose a document classification system using the 'Modified ECCD' feature selection method and 'Category Weight for each Document'. Experimental results show that the 'Modified ECCD' feature selection method achieves higher classification accuracy than the $\chi^2$ and ECCD methods. Moreover, combining the 'Category Weight for each Document' feature value with the 'Modified ECCD' feature selection method results in better classification accuracy.
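
The abstract does not define Modified ECCD, so this sketch shows only the χ² term-scoring baseline it is compared against. It uses the standard 2×2 contingency form χ²(t, c) = N(AD − CB)² / ((A+C)(B+D)(A+B)(C+D)). The function names and the set-of-terms document representation are assumptions for illustration.

```python
from typing import Dict, List, Set


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """chi^2(t, c) from a 2x2 table of document counts.

    a: in category, contains term     b: not in category, contains term
    c: in category, lacks term        d: not in category, lacks term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def term_scores(docs: List[Set[str]], labels: List[str], category: str) -> Dict[str, float]:
    """Score every term in the corpus against one category (higher = more dependent)."""
    vocab = set().union(*docs)
    scores = {}
    for term in vocab:
        a = sum(1 for doc, lab in zip(docs, labels) if term in doc and lab == category)
        b = sum(1 for doc, lab in zip(docs, labels) if term in doc and lab != category)
        c = sum(1 for doc, lab in zip(docs, labels) if term not in doc and lab == category)
        d = len(docs) - a - b - c
        scores[term] = chi_square(a, b, c, d)
    return scores
```

A term that appears in every document of the category and in none outside it gets the maximum score N, while a term spread evenly across categories scores 0. Feature words are then chosen by keeping the top-scoring terms per category.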