• Title/Summary/Keyword: Subset selection

Search Result 203, Processing Time 0.021 seconds

Identification of Chinese Event Types Based on Local Feature Selection and Explicit Positive & Negative Feature Combination

  • Tan, Hongye;Zhao, Tiejun;Wang, Haochang;Hong, Wan-Pyo
    • Journal of information and communication convergence engineering
    • /
    • v.5 no.3
    • /
    • pp.233-238
    • /
    • 2007
  • An approach to identify Chinese event types is proposed in this paper which combines a good feature selection policy and a Maximum Entropy (ME) model. The approach not only effectively alleviates the problem that classifier performs poorly on the small and difficult types, but improve overall performance. Experiments on the ACE2005 corpus show that performance is satisfying with the 83.5% macro - average F measure. The main characters and ideas of the approach are: (1) Optimal feature set is built for each type according to local feature selection, which fully ensures the performance of each type. (2) Positive and negative features are explicitly discriminated and combined by using one - sided metrics, which makes use of both features' advantages. (3) Wrapper methods are used to search new features and evaluate the various feature subsets to obtain the optimal feature subset.

Sampling Set Selection Algorithm for Weighted Graph Signals (가중치를 갖는 그래프신호를 위한 샘플링 집합 선택 알고리즘)

  • Kim, Yoon Hak
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.17 no.1
    • /
    • pp.153-160
    • /
    • 2022
  • A greedy algorithm is proposed to select a subset of nodes of a graph for bandlimited graph signals in which each signal value is generated with its weight. Since graph signals are weighted, we seek to minimize the weighted reconstruction error which is formulated by using the QR factorization and derive an analytic result to find iteratively the node minimizing the weighted reconstruction error, leading to a simplified iterative selection process. Experiments show that the proposed method achieves a significant performance gain for graph signals with weights on various graphs as compared with the previous novel selection techniques.

CRF Based Intrusion Detection System using Genetic Search Feature Selection for NSSA

  • Azhagiri M;Rajesh A;Rajesh P;Gowtham Sethupathi M
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.7
    • /
    • pp.131-140
    • /
    • 2023
  • Network security situational awareness systems helps in better managing the security concerns of a network, by monitoring for any anomalies in the network connections and recommending remedial actions upon detecting an attack. An Intrusion Detection System helps in identifying the security concerns of a network, by monitoring for any anomalies in the network connections. We have proposed a CRF based IDS system using genetic search feature selection algorithm for network security situational awareness to detect any anomalies in the network. The conditional random fields being discriminative models are capable of directly modeling the conditional probabilities rather than joint probabilities there by achieving better classification accuracy. The genetic search feature selection algorithm is capable of identifying the optimal subset among the features based on the best population of features associated with the target class. The proposed system, when trained and tested on the bench mark NSL-KDD dataset exhibited higher accuracy in identifying an attack and also classifying the attack category.

A Feature Selection Method Based on Fuzzy Cluster Analysis (퍼지 클러스터 분석 기반 특징 선택 방법)

  • Rhee, Hyun-Sook
    • The KIPS Transactions:PartB
    • /
    • v.14B no.2
    • /
    • pp.135-140
    • /
    • 2007
  • Feature selection is a preprocessing technique commonly used on high dimensional data. Feature selection studies how to select a subset or list of attributes that are used to construct models describing data. Feature selection methods attempt to explore data's intrinsic properties by employing statistics or information theory. The recent developments have involved approaches like correlation method, dimensionality reduction and mutual information technique. This feature selection have become the focus of much research in areas of applications with massive and complex data sets. In this paper, we provide a feature selection method considering data characteristics and generalization capability. It provides a computational approach for feature selection based on fuzzy cluster analysis of its attribute values and its performance measures. And we apply it to the system for classifying computer virus and compared with heuristic method using the contrast concept. Experimental result shows the proposed approach can give a feature ranking, select the features, and improve the system performance.

Probability Estimation of Snow Damage on Sugi (Cryptomeria japonica) Forest Stands by Logistic Regression Model in Toyama Prefecture, Japan

  • Kamo, Ken-Ichi;Yanagihara, Hirokazu;Kato, Akio;Yoshimoto, Atsushi
    • Journal of Forest and Environmental Science
    • /
    • v.24 no.3
    • /
    • pp.137-142
    • /
    • 2008
  • In this paper, we apply a logistic regression model to the data of snow damage on sugi (Cryptomeria japonica) occurred in Toyama prefecture (in Japan) in 2004 for estimating the risk probability. In order to specify the factors effecting snow damage, we apply a model selection procedure determining optimal subset of explanatory variables. In this process we consider the following 3 information criteria, 1) Akaike's information criterion, 2) Baysian information criterion, 3) Bias-corrected Akaike's information criterion. For the selected variables, we give a proper interpretation from the viewpoint of natural disaster.

  • PDF

Non-convex penalized estimation for the AR process

  • Na, Okyoung;Kwon, Sunghoon
    • Communications for Statistical Applications and Methods
    • /
    • v.25 no.5
    • /
    • pp.453-470
    • /
    • 2018
  • We study how to distinguish the parameters of the sparse autoregressive (AR) process from zero using a non-convex penalized estimation. A class of non-convex penalties are considered that include the smoothly clipped absolute deviation and minimax concave penalties as special examples. We prove that the penalized estimators achieve some standard theoretical properties such as weak and strong oracle properties which have been proved in sparse linear regression framework. The results hold when the maximal order of the AR process increases to infinity and the minimal size of true non-zero parameters decreases toward zero as the sample size increases. Further, we construct a practical method to select tuning parameters using generalized information criterion, of which the minimizer asymptotically recovers the best theoretical non-penalized estimator of the sparse AR process. Simulation studies are given to confirm the theoretical results.

Optimal Antenna Selection Scheme with Transmit Adaptive Array for Wideband CDMA Systems

  • Kim, Hak-Seong;Kim, Sanhae;Lee, Woncheol;Yoan Shin
    • Proceedings of the IEEK Conference
    • /
    • 2002.07c
    • /
    • pp.1960-1963
    • /
    • 2002
  • Transmit diversity schemes we an effective capacity improvement method for down link of wideband code division multiple access (W-CDMA) systems. In this paper, we propose to use transmit antenna subset selection scheme in conjunction with closed loop transmit adaptive array (TxAA). The proposed scheme selects N$\_$s/ optimum antennas among N$\_$${\gamma}$/(>N$\_$s/) transmit antennas in order to maximize diversity gain from selected antennas, and also reduces the cost of RE chains by employing two different types of RF modules fur the selected and the unselected antenna group, respectively. Computer simulation results show performance improvement by the proposed scheme over the conventional TxAA when considering up link control information feedback.

  • PDF

Selection of features and hidden Markov model parameters for English word recognition from Leap Motion air-writing trajectories

  • Deval Verma;Himanshu Agarwal;Amrish Kumar Aggarwal
    • ETRI Journal
    • /
    • v.46 no.2
    • /
    • pp.250-262
    • /
    • 2024
  • Air-writing recognition is relevant in areas such as natural human-computer interaction, augmented reality, and virtual reality. A trajectory is the most natural way to represent air writing. We analyze the recognition accuracy of words written in air considering five features, namely, writing direction, curvature, trajectory, orthocenter, and ellipsoid, as well as different parameters of a hidden Markov model classifier. Experiments were performed on two representative datasets, whose sample trajectories were collected using a Leap Motion Controller from a fingertip performing air writing. Dataset D1 contains 840 English words from 21 classes, and dataset D2 contains 1600 English words from 40 classes. A genetic algorithm was combined with a hidden Markov model classifier to obtain the best subset of features. Combination ftrajectory, orthocenter, writing direction, curvatureg provided the best feature set, achieving recognition accuracies on datasets D1 and D2 of 98.81% and 83.58%, respectively.

A Feature Selection-based Ensemble Method for Arrhythmia Classification

  • Namsrai, Erdenetuya;Munkhdalai, Tsendsuren;Li, Meijing;Shin, Jung-Hoon;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems
    • /
    • v.9 no.1
    • /
    • pp.31-40
    • /
    • 2013
  • In this paper, a novel method is proposed to build an ensemble of classifiers by using a feature selection schema. The feature selection schema identifies the best feature sets that affect the arrhythmia classification. Firstly, a number of feature subsets are extracted by applying the feature selection schema to the original dataset. Then classification models are built by using the each feature subset. Finally, we combine the classification models by adopting a voting approach to form a classification ensemble. The voting approach in our method involves both classification error rate and feature selection rate to calculate the score of the each classifier in the ensemble. In our method, the feature selection rate depends on the extracting order of the feature subsets. In the experiment, we applied our method to arrhythmia dataset and generated three top disjointed feature sets. We then built three classifiers based on the top-three feature subsets and formed the classifier ensemble by using the voting approach. Our method can improve the classification accuracy in high dimensional dataset. The performance of each classifier and the performance of their ensemble were higher than the performance of the classifier that was based on whole feature space of the dataset. The classification performance was improved and a more stable classification model could be constructed with the proposed approach.

Feature Selection for Multi-Class Genre Classification using Gaussian Mixture Model (Gaussian Mixture Model을 이용한 다중 범주 분류를 위한 특징벡터 선택 알고리즘)

  • Moon, Sun-Kuk;Choi, Tack-Sung;Park, Young-Cheol;Youn, Dae-Hee
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.10C
    • /
    • pp.965-974
    • /
    • 2007
  • In this paper, we proposed the feature selection algorithm for multi-class genre classification. In our proposed algorithm, we developed GMM separation score based on Gaussian mixture model for measuring separability between two genres. Additionally, we improved feature subset selection algorithm based on sequential forward selection for multi-class genre classification. Instead of setting criterion as entire genre separability measures, we set criterion as worst genre separability measure for each sequential selection step. In order to assess the performance proposed algorithm, we extracted various features which represent characteristics such as timbre, rhythm, pitch and so on. Then, we investigate classification performance by GMM classifier and k-NN classifier for selected features using conventional algorithm and proposed algorithm. Proposed algorithm showed improved performance in classification accuracy up to 10 percent for classification experiments of low dimension feature vector especially.