Feature Selection for Multi-Class Support Vector Machines Using an Impurity Measure of Classification Trees: An Application to the Credit Rating of S&P 500 Companies

  • Hong, Tae-Ho;Park, Ji-Young
    • Asia pacific journal of information systems / v.21 no.2 / pp.43-58 / 2011
  • Support vector machines (SVMs), a machine learning technique, have been applied not only to binary classification problems such as bankruptcy prediction but also to multi-class problems such as corporate credit rating. However, depending on which predictors are selected, SVMs can easily perform worse than the best alternative model, even though they classify and predict successfully in many binary and multi-class problems. To overcome this weakness, this study proposes a feature selection approach for multi-class SVMs that uses the impurity measures of classification trees. To select the input features, we employed the C4.5 and CART algorithms as well as the stepwise method of discriminant analysis, a well-known feature selection method. We built a multi-class SVM model for credit rating using these methods and present experimental results on data from S&P 500 companies.
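
As a rough illustration of ranking features by a classification-tree impurity measure, the sketch below scores each feature by the largest Gini impurity decrease that a single threshold split on it can achieve. This is an assumption-laden simplification, not the study's procedure: it uses only the Gini index associated with CART (C4.5 relies on entropy-based criteria), it considers one split rather than a full tree, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_gain(values, labels):
    """Largest impurity decrease over all binary threshold splits of one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # every value except the largest is a candidate threshold
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(X, y):
    """Return (feature indices sorted by decreasing gain, list of gains)."""
    gains = [best_gini_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    order = sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)
    return order, gains
```

The top-ranked features could then be passed to any multi-class classifier; how many to keep is a tuning choice that the abstract does not specify.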

Unsupervised Feature Selection Method Based on Principal Component Loading Vectors (주성분 분석 로딩 벡터 기반 비지도 변수 선택 기법)

  • Park, Young Joon;Kim, Seoung Bum
    • Journal of Korean Institute of Industrial Engineers / v.40 no.3 / pp.275-282 / 2014
  • One of the most widely used methods for dimensionality reduction is principal component analysis (PCA). However, the reduced dimensions from PCA do not provide a clear interpretation with respect to the original features because they are linear combinations of a large number of original features. This interpretation problem can be overcome by feature selection approaches that identify the best subset of the given features. In this study, we propose an unsupervised feature selection method based on the geometrical information of PCA loading vectors. Experimental results from a simulation study demonstrated the efficiency and usefulness of the proposed method.

Semantic-based Genetic Algorithm for Feature Selection (의미 기반 유전 알고리즘을 사용한 특징 선택)

  • Kim, Jung-Ho;In, Joo-Ho;Chae, Soo-Hoan
    • Journal of Internet Computing and Services / v.13 no.4 / pp.1-10 / 2012
  • This paper proposes an optimal feature selection method that considers the semantics of features, as a preprocessing step for document classification. Feature selection is a very important part of classification and consists of removing redundant features and selecting essential ones. LSA (Latent Semantic Analysis) is adopted to account for the meaning of the features. Because basic LSA is not specialized for feature selection, a supervised LSA, which is better suited to classification problems, is used. We also apply a GA (Genetic Algorithm) to the features obtained from supervised LSA to select a better feature subset. Finally, we project documents onto the newly selected feature subset and classify them using an SVM (Support Vector Machine) classifier. The proposed hybrid of supervised LSA and GA is expected to yield high classification performance and efficiency by selecting an optimal feature subset. Its efficiency is demonstrated through experiments on internet news classification with a small number of features.
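
The GA stage can be pictured with a minimal sketch in which each individual is a 0/1 mask over candidate features. Everything here is an assumption for illustration: the operators (elitism, one-point crossover, bit-flip mutation), the parameter values, and the name `genetic_select` are not taken from the paper, and `fitness` stands in for whatever classification score the authors actually optimized.

```python
import random


def genetic_select(n_features, fitness, pop_size=20, generations=30,
                   p_mut=0.05, seed=0):
    """Evolve a 0/1 feature mask that maximizes fitness(mask).

    Requires n_features >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = [pop[0][:], pop[1][:]]  # elitism: carry over the two best masks
        while len(nxt) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)  # parents from the better half
            cut = rng.randrange(1, n_features)         # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with probability p_mut per gene
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

In practice the fitness function would train and validate a classifier on the masked features, which makes each generation expensive; the toy sizes above are chosen only to keep the sketch short.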

A Fast and Efficient Haar-Like Feature Selection Algorithm for Object Detection (객체검출을 위한 빠르고 효율적인 Haar-Like 피쳐 선택 알고리즘)

  • Chung, Byung Woo;Park, Ki-Yeong;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences / v.38A no.6 / pp.486-491 / 2013
  • This paper proposes a fast and efficient Haar-like feature selection algorithm for training classifiers used in object detection. Many of the features selected by the Haar-like feature selection algorithm and the existing AdaBoost algorithm are similar in shape or overlap, because only each feature's error rate is considered. The proposed algorithm calculates the similarity of features from their shape and the distance between them. Fast and efficient feature selection is made possible by removing the selected features, and features highly similar to them, from the feature set. The FERET face database is used to compare the performance of classifiers trained by the previous and proposed algorithms. Experimental results show that a classifier trained by the proposed method outperforms one trained by the previous method. When classifiers are trained to reach the same performance, the proposed method uses 20% fewer features in classification.
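
The idea of dropping candidates that resemble an already selected feature can be sketched as a greedy loop. The abstract does not give the similarity formula, so the one below (ratio of widths and heights multiplied by a distance-based proximity term) is purely hypothetical, as are the dictionary layout, `max_dist`, and `sim_threshold`. Real AdaBoost training also reweights samples between rounds, which this sketch omits.

```python
from math import hypot


def similarity(a, b, max_dist=24.0):
    """Hypothetical similarity in [0, 1]: shape agreement times proximity."""
    shape = (min(a["w"], b["w"]) / max(a["w"], b["w"])) * \
            (min(a["h"], b["h"]) / max(a["h"], b["h"]))
    dist = hypot(a["x"] - b["x"], a["y"] - b["y"])
    proximity = max(0.0, 1.0 - dist / max_dist)
    return shape * proximity


def select_features(candidates, k, sim_threshold=0.7):
    """Pick up to k low-error features, discarding near-duplicates of each pick."""
    pool = sorted(candidates, key=lambda f: f["error"])
    chosen = []
    while pool and len(chosen) < k:
        best = pool.pop(0)          # lowest remaining error rate
        chosen.append(best)
        # remove candidates too similar to the feature just chosen
        pool = [f for f in pool if similarity(best, f) < sim_threshold]
    return chosen
```

Shrinking the pool after every pick is what would make later rounds cheaper, since fewer candidates remain to be evaluated.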

Feature selection in the semivarying coefficient LS-SVR

  • Hwang, Changha;Shim, Jooyong
    • Journal of the Korean Data and Information Science Society / v.28 no.2 / pp.461-471 / 2017
  • In this paper, we propose a feature selection method that identifies important features in the semivarying coefficient model. One important issue in the semivarying coefficient model is how to estimate the parametric and nonparametric components. Another issue is how to identify important features in the varying and the constant effects. We propose a feature selection method that addresses this issue using the generalized cross validation functions of the varying coefficient least squares support vector regression (LS-SVR) and the linear LS-SVR. Numerical studies indicate that the proposed method is quite effective in identifying important features in the varying and the constant effects of the semivarying coefficient model.

A Feature Selection Technique based on Distributional Differences

  • Kim, Sung-Dong
    • Journal of Information Processing Systems / v.2 no.1 / pp.23-27 / 2006
  • This paper presents a feature selection technique based on distributional differences for efficient machine learning. The initial training data consist of examples with many features and a target value. We classified them into positive and negative data based on the target value. We then divided the range of each feature's values into 10 intervals and calculated the distribution over the intervals separately for the positive and the negative data. Then, we selected the features, and the intervals of those features, for which the distributional differences exceed a certain threshold. Using the selected intervals and features, we obtained reduced training data. In the experiments, we show that the reduced training data can cut the training time of the neural network by about 40%, and that more profit can be obtained in simulated stock trading using the trained functions.
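
The interval-based procedure described above can be sketched as follows. The abstract fixes the number of intervals at 10 but does not state the threshold or how the difference is measured, so the absolute difference of interval frequencies, the default `threshold=0.3`, the rule that a positive target means a positive example, and all function names are assumptions.

```python
def interval_index(v, lo, hi, bins=10):
    """Map a value to one of `bins` equal-width intervals over [lo, hi]."""
    if hi == lo:
        return 0
    return min(int((v - lo) * bins / (hi - lo)), bins - 1)


def interval_distribution(values, lo, hi, bins=10):
    """Relative frequency of the values in each interval."""
    counts = [0] * bins
    for v in values:
        counts[interval_index(v, lo, hi, bins)] += 1
    n = len(values)
    return [c / n for c in counts] if n else [0.0] * bins


def select_intervals(X, y, threshold=0.3, bins=10):
    """Return {feature index: [interval indices]} whose positive and negative
    frequencies differ by more than `threshold`."""
    selected = {}
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        pos = interval_distribution([v for v, t in zip(col, y) if t > 0], lo, hi, bins)
        neg = interval_distribution([v for v, t in zip(col, y) if t <= 0], lo, hi, bins)
        keep = [k for k in range(bins) if abs(pos[k] - neg[k]) > threshold]
        if keep:
            selected[j] = keep
    return selected
```

Features that never appear in the returned dictionary would be dropped, and the retained intervals identify the value ranges that best separate the two groups.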

Performance Improvement of a Collaborative Recommendation System using Feature Selection (속성추출을 이용한 협동적 추천시스템의 성능 향상)

  • Yoo, Sang-Jong;Kwon, Young- S.
    • IE interfaces / v.19 no.1 / pp.70-77 / 2006
  • One of the problems in developing a collaborative recommendation system is scalability. To alleviate the scalability problem efficiently while enhancing the performance of the recommendation system, we propose a new recommendation system using feature selection. In our experiments, the proposed system, using about a third of all features, shows performance comparable to that obtained with all features in terms of precision, recall, and number of computations as the number of users and products increases.

Diagnosis of Alzheimer's Disease using Combined Feature Selection Method

  • Faisal, Fazal Ur Rehman;Khatri, Uttam;Kwon, Goo-Rak
    • Journal of Korea Multimedia Society / v.24 no.5 / pp.667-675 / 2021
  • Treatments for the symptoms of Alzheimer's disease (AD) are being provided, and much research on early diagnosis is under way. In this regard, several classification techniques using T1-weighted images have been proposed to distinguish among AD, MCI, and Healthy Control (HC) subjects. In this paper, we also use some traditional Machine Learning (ML) approaches to diagnose AD. The paper presents an improvised feature selection method that reduces model complexity, which is an issue when using ML approaches. In the presented work, a combination of subcortical and cortical features from 308 subjects in the ADNI dataset is used to diagnose AD from structural magnetic resonance imaging (sMRI). Three binary classification experiments were performed: AD vs eMCI, AD vs lMCI, and AD vs HC. The proposed feature selection method combines Principal Component Analysis and Recursive Feature Elimination to reduce the dimensionality and select the best features simultaneously. Experiments on the dataset demonstrated that SVM is best suited for the AD vs lMCI, AD vs HC, and AD vs eMCI classifications, with accuracies of 95.83%, 97.83%, and 97.87%, respectively.

A Diagnostic Feature Subset Selection of Breast Tumor Based on Neighborhood Rough Set Model (Neighborhood 러프집합 모델을 활용한 유방 종양의 진단적 특징 선택)

  • Son, Chang-Sik;Choi, Rock-Hyun;Kang, Won-Seok;Lee, Jong-Ha
    • Journal of Korea Society of Industrial Information Systems / v.21 no.6 / pp.13-21 / 2016
  • Feature selection is one of the important issues in data mining and machine learning. It is a technique for finding, from the source data, a subset of features that provides the best classification performance. We propose a feature subset selection method using the neighborhood rough set model based on information granularity. To demonstrate the effectiveness of the proposed method, it was applied to select useful features associated with breast tumor diagnosis from 298 shape features extracted from 5,252 breast ultrasound images, which include 2,745 benign and 2,507 malignant cases. Experimental results showed that 19 diagnostic features were strong predictors of breast cancer diagnosis, with an average classification accuracy of 97.6%.
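
For readers unfamiliar with neighborhood rough sets, the sketch below shows one common formulation: a sample belongs to the positive region if every sample within distance `delta` of it shares its class, the dependency of the decision on a feature subset is the fraction of such samples, and features are added greedily while dependency increases. The Chebyshev distance, the value of `delta`, the assumption that features are scaled to [0, 1], and the function names are illustrative choices and may differ from the paper's method.

```python
def dependency(X, y, feats, delta=0.15):
    """Fraction of samples whose delta-neighborhood (on feats) is class-pure."""
    if not feats:
        return 0.0

    def dist(a, b):
        # Chebyshev distance restricted to the chosen features
        return max(abs(a[j] - b[j]) for j in feats)

    consistent = 0
    for i, xi in enumerate(X):
        if all(y[k] == y[i] for k, xk in enumerate(X) if dist(xi, xk) <= delta):
            consistent += 1
    return consistent / len(X)


def forward_select(X, y, delta=0.15):
    """Greedily add the feature that most increases dependency; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        top, j = max((dependency(X, y, selected + [j], delta), j) for j in remaining)
        if top <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = top
    return selected, best
```

The pairwise distance check makes each dependency evaluation quadratic in the number of samples, so a data set of several thousand images would likely need a faster neighbor search than this sketch uses.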

Microblog User Geolocation by Extracting Local Words Based on Word Clustering and Wrapper Feature Selection

  • Tian, Hechan;Liu, Fenlin;Luo, Xiangyang;Zhang, Fan;Qiao, Yaqiong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.10 / pp.3972-3988 / 2020
  • Existing methods always rely on statistical features to extract local words for microblog user geolocation. The extracted words contain many non-local words, which lowers geolocation accuracy. Considering both the statistical and the semantic features of local words, this paper proposes a microblog user geolocation method that extracts local words based on word clustering and wrapper feature selection. First, ordinary words without positional indications are filtered out based on statistical features. Second, a word clustering algorithm based on word vectors is proposed, in which the remaining semantically similar words are clustered together according to the distance between their word vectors. Next, a wrapper feature selection algorithm based on sequential backward subset search is proposed, and the cluster subset with the best geolocation effect is selected. Words in the selected cluster subset are extracted as local words. Finally, a Naive Bayes classifier is trained on the local words to geolocate the microblog user. The proposed method is validated on two different types of microblog data, Twitter and Weibo. The results show that the proposed method outperforms two typical existing methods based on statistical features in terms of accuracy, precision, recall, and F1-score.
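
A sequential backward subset search of the kind mentioned above can be sketched as follows. The function `sequential_backward_search` is a hypothetical illustration rather than the authors' algorithm: `score` is a placeholder for the wrapper evaluation (for example, validation accuracy of a classifier trained on the words in the candidate clusters), and items are assumed to be unique.

```python
def sequential_backward_search(items, score):
    """Start from all items, repeatedly drop the one whose removal scores best,
    and return the best-scoring subset seen along the way."""
    current = list(items)
    best_subset, best_score = list(current), score(current)
    while len(current) > 1:
        # every subset obtained by removing exactly one item
        candidates = [[x for x in current if x != r] for r in current]
        current = max(candidates, key=score)
        s = score(current)
        if s > best_score:
            best_subset, best_score = list(current), s
    return best_subset, best_score
```

Because each step re-evaluates the wrapper once per remaining item, searching over word clusters instead of individual words keeps the number of evaluations manageable.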