• Title/Summary/Keyword: feature vector selection

An Improved method of Two Stage Linear Discriminant Analysis

  • Chen, Yarui;Tao, Xin;Xiong, Congcong;Yang, Jucheng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.3 / pp.1243-1263 / 2018
  • Two-stage linear discriminant analysis (TSLDA) is a feature extraction technique for solving the small sample size problem in image recognition. TSLDA retains all subspace information of the between-class scatter and within-class scatter. However, the feature information in the four subspaces may not be entirely beneficial for classification, and the regularization procedure for eliminating singular matrices in TSLDA has high time complexity. To address these drawbacks, this paper proposes an improved two-stage linear discriminant analysis (Improved TSLDA). The Improved TSLDA uses a selection and compression method to extract superior feature information from the four subspaces to constitute the optimal projection space, defining a single Fisher criterion to measure the importance of a single feature vector. The Improved TSLDA also applies an approximation matrix method to eliminate the singular matrices and reduce the time complexity. Comparative experiments on five face databases and one handwritten digit database validate the effectiveness of the Improved TSLDA.
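
As a rough illustration of scoring one projection vector at a time, the sketch below computes a Fisher ratio for a single vector `w`: the between-class scatter divided by the within-class scatter of the samples projected onto `w`. The function name `single_fisher_score` and the toy data are assumptions made for illustration; the abstract does not give the paper's exact criterion, so this may differ from it.

```python
def single_fisher_score(samples, labels, w):
    """Fisher ratio of the 1-D projections of `samples` onto vector `w`.

    Equals (w^T S_b w) / (w^T S_w w) for the usual between-class (S_b)
    and within-class (S_w) scatter matrices.
    """
    proj = [sum(x_i * w_i for x_i, w_i in zip(x, w)) for x in samples]
    overall_mean = sum(proj) / len(proj)

    # Group the projected values by class label.
    by_class = {}
    for p, y in zip(proj, labels):
        by_class.setdefault(y, []).append(p)

    between = 0.0
    within = 0.0
    for values in by_class.values():
        class_mean = sum(values) / len(values)
        between += len(values) * (class_mean - overall_mean) ** 2
        within += sum((v - class_mean) ** 2 for v in values)
    return between / within if within > 0 else float("inf")


# Toy data (hypothetical): the classes are separated along the first axis only.
X = [[0.0, 1.0], [1.0, 3.0], [4.0, 1.0], [5.0, 3.0]]
y = [0, 0, 1, 1]
score_axis0 = single_fisher_score(X, y, [1.0, 0.0])
score_axis1 = single_fisher_score(X, y, [0.0, 1.0])
```

Vectors with a higher score separate the classes better, so a selection step could rank candidate vectors by this value and keep the top ones.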

A MA-plot-based Feature Selection by MRMR in SVM-RFE in RNA-Sequencing Data

  • Kim, Chayoung
    • The Journal of Korean Institute of Information Technology / v.16 no.12 / pp.25-30 / 2018
  • A method for constructing the Gene Regulatory Network (GRN) from RNA-Sequencing (RNA-Seq) data is lacking and urgently needed, because GRN analysis in big data has received substantial attention as a way to describe the interactions among relevant featured genes and their regulation. We propose a new computational method for comparative feature pattern selection that applies a minimum-redundancy maximum-relevancy (MRMR) filter in support vector machine-recursive feature elimination (SVM-RFE), with intensity-dependent normalization (DEGSEQ) as a preprocessor to emphasize equal preciseness in RNA-Seq big data. We found that the proposed algorithm may be more scalable and convenient, because all of its libraries are available as R packages, and that it may improve both the time consumed on big data and the minimum-redundancy maximum-relevancy of the selected set of feature patterns.
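
The MRMR idea (prefer features that are relevant to the label but not redundant with features already chosen) can be sketched as a greedy filter. This is a minimal sketch under stated assumptions, not the paper's R pipeline: it uses absolute Pearson correlation for both relevance and redundancy (MRMR is often defined with mutual information instead), it omits the SVM-RFE and DEGSEQ stages, and the names `pearson` and `mrmr_select` and the toy data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def mrmr_select(columns, target, k):
    """Greedily pick k column indices, maximizing relevance minus mean redundancy."""
    relevance = [abs(pearson(col, target)) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(abs(pearson(columns[j], columns[s])) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): column 1 is nearly a copy of column 0.
columns = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 4, 3, 5, 6],
    [0, 1, 0, 1, 0, 1],
]
target = [0, 0, 0, 1, 1, 1]
selected = mrmr_select(columns, target, 2)
```

On this toy data the filter picks column 0 first and then column 2, skipping the near-duplicate column 1 even though it is individually more relevant than column 2.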

One Channel Five-Way Classification Algorithm For Automatically Classifying Speech

  • Lee, Kyo-Sik
    • The Journal of the Acoustical Society of Korea / v.17 no.3E / pp.12-21 / 1998
  • In this paper, we describe a one-channel five-way V/U/M/N/S (Voice/Unvoice/Nasal/Silent) classification algorithm for automatically classifying speech. The decision-making process is viewed as a pattern recognition problem. Two aspects of the algorithm are developed: feature selection and classifier type. The feature selection procedure is studied to identify a set of features for V/U/M/N/S classification. The classifiers used are vector quantization (VQ), a neural network (NN), and a decision tree method. Five actual sentences spoken by six speakers, three male and three female, are tested with the proposed classifiers. In a set of measurement tests, the proposed classifiers show fairly good accuracy for the V/U/M/N/S decision.

Feature selection and prediction modeling of drug responsiveness in Pharmacogenomics (약물유전체학에서 약물반응 예측모형과 변수선택 방법)

  • Kim, Kyuhwan;Kim, Wonkuk
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.153-166 / 2021
  • A main goal of pharmacogenomics studies is to predict an individual's drug responsiveness based on high-dimensional genetic variables. Because of the large number of variables, feature selection is required to reduce their number. The selected features are then used to construct a predictive model with machine learning algorithms. In the present study, we applied several hybrid feature selection methods, such as combinations of logistic regression, ReliefF, TuRF, random forest, and LASSO, to a next-generation sequencing data set of 400 epilepsy patients. We then applied the selected features to machine learning methods including random forest, gradient boosting, and support vector machine, as well as a stacking ensemble method. Our results showed that the stacking model with a hybrid feature selection of random forest and ReliefF performs better than the other combinations of approaches. Based on a 5-fold cross-validation partition, the mean test accuracy of the best model was 0.727 and its mean test AUC was 0.761. The stacking models also appeared to outperform single machine learning predictive models when using the same selected features.
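
The evaluation protocol mentioned above, a 5-fold partition with the test metric averaged over folds, can be sketched as follows. This is only an illustration: the helper names `k_fold_indices` and `cross_validate` are hypothetical, the folds are contiguous rather than shuffled or stratified, and the `evaluate` callback stands in for whatever model fitting and scoring a study actually uses.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous test folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(n_samples, evaluate, k=5):
    """Average evaluate(train_idx, test_idx) over the k folds."""
    scores = []
    for test_idx in k_fold_indices(n_samples, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)


# 400 matches the cohort size stated in the abstract; each fold holds 80 samples.
folds = k_fold_indices(400, k=5)
```

As a general practice, running feature selection inside `evaluate` on the training indices only keeps the held-out fold from influencing which features are chosen.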

Power Quality Disturbance Classification using Decision Fusion (결정결합 방법을 이용한 전력외란 신호의 식별)

  • 김기표;김병철;남상원
    • Proceedings of the IEEK Conference / 2000.09a / pp.915-918 / 2000
  • In this paper, we propose efficient feature vector extraction and decision fusion methods for the automatic classification of power system disturbances. Here, the FFT and the WPT (wavelet packet transform) are used to extract appropriate features for classifying power quality disturbances with variable properties. In particular, the WPT can be utilized to develop an adaptable feature extraction algorithm using best basis selection. Furthermore, the extracted feature vectors are applied as input to the decision fusion system, which combines the decisions of several classifiers with complementary performance and thereby improves classification performance. Finally, the applicability of the proposed approach is demonstrated with simulation results obtained by analyzing power quality disturbance data generated in Matlab.
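
Combining the decisions of several classifiers can be sketched with majority voting, one of the simplest fusion rules. The abstract does not say which fusion rule the paper uses, so majority voting is an assumption here, and the classifier outputs and class names below are made up for illustration.

```python
from collections import Counter


def majority_vote(decisions):
    """Fuse the labels for one sample; ties go to the label that appears first."""
    counts = Counter(decisions)
    top = max(counts.values())
    for label in decisions:
        if counts[label] == top:
            return label


def fuse(per_classifier_predictions):
    """Fuse a list of prediction lists (one list per classifier), sample by sample."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_predictions)]


# Hypothetical outputs of three classifiers on four disturbance signals.
fft_clf = ["sag", "swell", "harmonic", "sag"]
wpt_clf = ["sag", "sag", "harmonic", "transient"]
third_clf = ["swell", "swell", "harmonic", "transient"]
fused = fuse([fft_clf, wpt_clf, third_clf])
```

Voting helps most when the classifiers make different mistakes, which is the "complementary performance" condition the abstract describes.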

Spam Filter by Using χ² Statistics and Support Vector Machines (카이제곱 통계량과 지지벡터기계를 이용한 스팸메일 필터)

  • Lee, Song-Wook
    • The KIPS Transactions:PartB / v.17B no.3 / pp.249-254 / 2010
  • We propose an automatic spam filter for e-mail data using Support Vector Machines (SVM). We use the lexical form of a word and its part-of-speech (POS) tags as features and select features by chi-square statistics. We represent each feature by TF (text frequency), TF-IDF, and binary weight in the experiments. After the SVM is trained with the selected features, it classifies each e-mail as spam or not. In the experiments, the selected features improve the performance of our system, and we achieved an overall accuracy of 98.9% on the TREC05-p1 spam corpus.
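
Chi-square feature selection for a two-class problem can be sketched from a 2x2 term/class contingency table. This is a minimal sketch rather than the paper's system: the documents are modeled as plain sets of tokens (no POS tags), the SVM stage is omitted, and the names `chi_square` and `select_terms` and the toy mails are hypothetical.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: spam mails containing the term    n10: non-spam mails containing it
    n01: spam mails without the term       n00: non-spam mails without it
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_terms(docs, labels, k):
    """Rank terms by chi-square against the spam label and keep the top k."""
    vocab = sorted({t for d in docs for t in d})
    scores = {}
    for term in vocab:
        n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)
        n10 = sum(1 for d, y in zip(docs, labels) if term in d and y == 0)
        n01 = sum(1 for d, y in zip(docs, labels) if term not in d and y == 1)
        n00 = sum(1 for d, y in zip(docs, labels) if term not in d and y == 0)
        scores[term] = chi_square(n11, n10, n01, n00)
    # Highest score first; alphabetical order breaks ties deterministically.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


docs = [{"free", "offer", "now"}, {"free", "win"}, {"meeting", "now"}, {"report", "meeting"}]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam
top_terms = select_terms(docs, labels, 2)
```

The statistic rewards any strong association with the label, so it keeps terms that indicate legitimate mail ("meeting") as well as terms that indicate spam ("free"), while a term spread evenly across both classes ("now") scores zero.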

Vibration based bridge scour evaluation: A data-driven method using support vector machines

  • Zhang, Zhiming;Sun, Chao;Li, Changbin;Sun, Mingxuan
    • Structural Monitoring and Maintenance / v.6 no.2 / pp.125-145 / 2019
  • Bridge scour is one of the predominant causes of bridge failure. Current climate deterioration leads to an increase in flooding frequency and severity and thus poses a higher risk of bridge scour failure than before. Recent studies have extensively explored vibration-based scour monitoring by analyzing the structural modal properties before and after damage. However, the state of the art in this area lacks a systematic approach with sufficient robustness and credibility for practical decision making. This paper attempts to develop a data-driven methodology for bridge scour monitoring using support vector machines. The study extracts features from the bridge dynamic responses based on a generic sensitivity study of the bridge's modal properties and selects the features that contribute significantly to bridge scour detection. Results indicate that the proposed data-driven method can quantify bridge scour damage with satisfactory accuracy in most cases. This paper provides an alternative, machine-learning-based methodology for bridge scour evaluation, which has the potential to be applied in practice for bridge safety assessment when scour occurs.

Development of Classification Model for hERG Ion Channel Inhibitors Using SVM Method (SVM 방법을 이용한 hERG 이온 채널 저해제 예측모델 개발)

  • Gang, Sin-Moon;Kim, Han-Jo;Oh, Won-Seok;Kim, Sun-Young;No, Kyoung-Tai;Nam, Ky-Youb
    • Journal of the Korean Chemical Society / v.53 no.6 / pp.653-662 / 2009
  • Developing effective tools for predicting the absorption, distribution, metabolism, excretion properties and toxicity (ADME/T) of new chemical entities in the early stage of drug design is one of the most important tasks in drug discovery and development today. As one such attempt, support vector machines (SVM) have recently been exploited for the prediction of ADME/T-related properties. However, two problems in SVM modeling, feature selection and parameter setting, are still far from solved. Both have been shown to be crucial to the efficiency and accuracy of SVM classification. In particular, feature selection and optimal SVM parameter setting influence each other, which indicates that they should be dealt with simultaneously. In this account, we present an integrated practical solution in which a genetic-based algorithm (GA) is used for feature selection and a grid search (GS) method for parameter optimization. hERG ion-channel inhibitor classification models of ADME/T-related properties have been built for assessing and testing the proposed GA-GS-SVM. We generated six models, three single models and three ensemble models, using a training set of 1891 compounds, and validated them with an external test set of 175 compounds. To address the data imbalance problem, we compared the single models with the ensemble models; using the ensemble models improved prediction accuracy.
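
Handling feature selection and parameter setting together can be sketched by letting a genetic algorithm evolve feature masks while a grid search picks the best parameters for each mask. This is a toy sketch, not the paper's GA-GS-SVM: `toy_score` is a made-up stand-in for the cross-validated accuracy of an SVM, and the population size, mutation rate, one-point crossover, and the single parameter `C` are all assumptions.

```python
import itertools
import random


def grid_search(mask, score, grid):
    """Return (best_score, best_params) over every parameter combination in `grid`."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(mask, params)
        if best is None or s > best[0]:
            best = (s, params)
    return best


def ga_gs(n_features, score, grid, pop_size=12, generations=20, p_mut=0.1, seed=0):
    """Evolve feature masks; the fitness of a mask is its best grid-search score."""
    rng = random.Random(seed)

    def fitness(mask):
        return grid_search(mask, score, grid)[0]

    pop = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism), refill with mutated offspring.
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append(tuple(bit ^ int(rng.random() < p_mut) for bit in child))
        pop = elite + children
    best_mask = max(pop, key=fitness)
    best_score, best_params = grid_search(best_mask, score, grid)
    return best_mask, best_params, best_score


def toy_score(mask, params):
    """Hypothetical score: features 0 and 2 help, extra features and bad C hurt."""
    informative = mask[0] + mask[2]
    return informative - 0.1 * sum(mask) - abs(params["C"] - 10) / 100


grid = {"C": [1, 10, 100]}
mask, params, best = ga_gs(5, toy_score, grid)
```

In a real setting the score would be expensive to compute, so caching the fitness of masks that have already been evaluated would be a natural refinement.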

A Supervised Feature Selection Method for Malicious Intrusions Detection in IoT Based on Genetic Algorithm

  • Saman Iftikhar;Daniah Al-Madani;Saima Abdullah;Ammar Saeed;Kiran Fatima
    • International Journal of Computer Science & Network Security / v.23 no.3 / pp.49-56 / 2023
  • Machine learning methods have been applied successfully and in diverse ways to the Internet of Things (IoT) field, owing to improvements in computer processing power. They offer an effective way of detecting malicious intrusions in IoT because of their high-level feature extraction capabilities. In this paper, we propose a novel feature selection method for malicious intrusion detection in IoT that uses an evolutionary technique, the Genetic Algorithm (GA), together with Machine Learning (ML) algorithms. The proposed model classifies the BoT-IoT dataset, and its quality is evaluated by training and testing with classifiers. The data is reduced, and several preprocessing steps are applied: unnecessary information removal, null value checking, label encoding, standard scaling, and data balancing. GA is applied to the preprocessed data to select the most relevant features and maintain model optimization. The features selected by GA are given to ML classifiers such as Logistic Regression (LR) and Support Vector Machine (SVM), and the results are evaluated using performance measures including recall, precision, and F1-score. Two sets of experiments are conducted, and it is concluded that hyperparameter tuning has a significant effect on the performance of both ML classifiers. SVM remained the best model in both cases, and overall results improved.
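
The evaluation measures named above follow directly from counts of true positives, false positives, and false negatives. The sketch below is a generic illustration for a binary intrusion/normal label; the function name and the toy labels are hypothetical and do not come from the BoT-IoT experiments.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the `positive` class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = intrusion, 0 = normal traffic.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here three intrusions are caught, one is missed, and one normal flow is flagged, so precision, recall, and F1 all come out to 0.75.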

Optimal EEG Feature Extraction using DWT for Classification of Imagination of Hands Movement

  • Chum, Pharino;Park, Seung-Min;Ko, Kwang-Eun;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.6 / pp.786-791 / 2011
  • An optimal feature selection and extraction procedure is an important task that significantly affects the success of brain activity analysis in the brain-computer interface (BCI) research area. In this paper, a novel method for extracting the optimal feature from the electroencephalogram (EEG) signal is proposed. First, a Student's t-statistic method is used to normalize the EEG measurements and to minimize the statistical error between them. Then, a 2D time-frequency data set is extracted from the raw EEG signal using the discrete wavelet transform (DWT) as a raw feature, and the standard deviations and means of the 2D time-frequency matrix are extracted as an optimal EEG feature vector, along with other basis features of the sub-band signals. In the experiment, data set 1 of BCI Competition IV is used, and classification with an SVM demonstrates the strength of the new method.
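
Summarizing DWT sub-bands by their mean and standard deviation can be sketched with a hand-written Haar transform. The abstract does not name the wavelet, the number of decomposition levels, or the exact layout of the feature vector, so the Haar wavelet, two levels, and the names `haar_dwt` and `dwt_features` are assumptions made for illustration.

```python
import math
import statistics


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_features(signal, levels=2):
    """Mean and population standard deviation of each sub-band, in one vector.

    Order: detail level 1, detail level 2, ..., then the final approximation.
    """
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        features += [statistics.fmean(detail), statistics.pstdev(detail)]
    features += [statistics.fmean(current), statistics.pstdev(current)]
    return features


# Toy signal (hypothetical); its length must be divisible by 2 ** levels.
features = dwt_features([4.0, 4.0, 2.0, 2.0, 6.0, 6.0, 8.0, 8.0], levels=2)
```

The resulting vector has two entries per sub-band, so a two-level decomposition yields six features per channel that could be passed to a classifier such as an SVM.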