• Title/Abstract/Keywords: Subset selection

203 search results (processing time: 0.028 s)

Analysis of mixture experimental data with process variables

  • 임용빈
    • 품질경영학회지 / Vol. 40, No. 3 / pp. 347-358 / 2012
  • Purpose: Given experimental data on mixture components and process variables, we propose a strategy for finding a proper combined model. Methods: Process variables are factors in an experiment that are not mixture components but could affect the blending properties of the mixture ingredients. For example, the effectiveness of an etching solution, measured as an etch rate, is not only a function of the proportions of the three acids combined to form the mixture but also depends on the temperature of the solution and the agitation rate. Efficient designs for experiments with mixture components and process variables depend on the model relating the two, which is called a combined model. The product of the canonical polynomial model for the mixture and the process variables model is often used as a combined model. Results: First, we choose reasonable starting models, based on model selection criteria, from the class of admissible product models and the practical combined models suggested by Lim (2011). We then search for candidate models that are subset models of the starting model, using a sequential variable selection method or the all-possible-regressions procedure. Conclusion: Good candidate models are screened by evaluating model selection criteria and checking residual plots for the validity of the model assumptions. The strategy for finding a proper combined model is illustrated with examples.
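The all-possible-regressions step described in this abstract can be sketched as follows. This is a minimal illustration and not the paper's procedure: the toy data, the term names (`x1`, `x1*x2`, `x3*z`, ...), the helper names, and the choice of AIC as the only selection criterion are all assumptions made for the example. The starting model is a no-intercept product-type model with three mixture components and one process variable `z`.

```python
from itertools import combinations

import numpy as np


def aic_least_squares(X, y):
    """AIC (up to an additive constant) of a Gaussian linear model fit by least squares."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k


def rank_subset_models(X, y, names):
    """Fit every non-empty subset of the starting model's terms and sort by AIC."""
    scored = []
    for size in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), size):
            score = aic_least_squares(X[:, list(cols)], y)
            scored.append((score, [names[c] for c in cols]))
    return sorted(scored, key=lambda item: item[0])


# Hypothetical toy data: proportions sum to 1, z is a two-level process variable.
rng = np.random.default_rng(0)
x1, x2, x3 = rng.dirichlet([1.0, 1.0, 1.0], size=60).T
z = rng.choice([-1.0, 1.0], size=60)
terms = {
    "x1": x1, "x2": x2, "x3": x3,
    "x1*x2": x1 * x2, "x1*x3": x1 * x3, "x2*x3": x2 * x3,
    "x1*z": x1 * z, "x2*z": x2 * z, "x3*z": x3 * z,
}
names = list(terms)
X = np.column_stack([terms[t] for t in names])
# Assumed "true" model used only to generate the toy response.
y = 2 * x1 + x2 + 3 * x3 + 4 * x1 * x2 + 1.5 * x3 * z + rng.normal(0, 0.05, 60)

ranked = rank_subset_models(X, y, names)
best_score, best_terms = ranked[0]
print(best_terms)
```

With nine candidate terms the search fits 511 subset models, which is cheap. For larger starting models the number of subsets doubles with each added term, which is where a sequential selection method becomes the practical alternative. The top few entries of `ranked` would then be the candidates whose residual plots are checked.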

Identification of Chinese Event Types Based on Local Feature Selection and Explicit Positive & Negative Feature Combination

  • Tan, Hongye;Zhao, Tiejun;Wang, Haochang;Hong, Wan-Pyo
    • Journal of information and communication convergence engineering / Vol. 5, No. 3 / pp. 233-238 / 2007
  • This paper proposes an approach to identifying Chinese event types that combines a good feature selection policy with a Maximum Entropy (ME) model. The approach not only alleviates the problem that classifiers perform poorly on small and difficult types but also improves overall performance. Experiments on the ACE2005 corpus show satisfactory performance, with a macro-average F-measure of 83.5%. The main characteristics and ideas of the approach are: (1) An optimal feature set is built for each type by local feature selection, which ensures the performance of each type. (2) Positive and negative features are explicitly discriminated and combined using one-sided metrics, which exploits the advantages of both kinds of features. (3) Wrapper methods are used to search for new features and to evaluate the various feature subsets in order to obtain the optimal feature subset.

Sampling Set Selection Algorithm for Weighted Graph Signals

  • 김윤학
    • 한국전자통신학회논문지 / Vol. 17, No. 1 / pp. 153-160 / 2022
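  • We study a greedy algorithm that selects the optimal set of sampling nodes on a graph when each graph signal is generated with its own weight. To this end, we use the weighted reconstruction error as the cost function and apply QR decomposition to expand it into a simple form. To minimize the resulting weighted reconstruction error, we derive, through a series of mathematical proofs, an expression that allows nodes to be selected iteratively. Based on this expression, we propose a sampling set selection algorithm. For performance evaluation, we apply the algorithm to weighted graph signals generated on various graphs and show that, compared with existing sampling set selection techniques, it reconstructs the weighted signals better while maintaining the same complexity.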

CRF Based Intrusion Detection System using Genetic Search Feature Selection for NSSA

  • Azhagiri M;Rajesh A;Rajesh P;Gowtham Sethupathi M
    • International Journal of Computer Science & Network Security / Vol. 23, No. 7 / pp. 131-140 / 2023
  • Network security situational awareness (NSSA) systems help manage the security concerns of a network by monitoring network connections for anomalies and recommending remedial actions when an attack is detected. An Intrusion Detection System (IDS) helps identify those security concerns by monitoring network connections for anomalies. We propose a CRF-based IDS that uses a genetic search feature selection algorithm for network security situational awareness to detect anomalies in the network. Conditional random fields (CRFs), being discriminative models, can model conditional probabilities directly rather than joint probabilities, thereby achieving better classification accuracy. The genetic search feature selection algorithm identifies the optimal subset of features based on the best population of features associated with the target class. When trained and tested on the benchmark NSL-KDD dataset, the proposed system exhibited higher accuracy both in identifying an attack and in classifying the attack category.
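As a rough illustration of genetic search over feature subsets, the sketch below evolves bit masks in which bit `i` marks feature `i` as selected. It is not the authors' implementation: the function name, the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), the parameter values, and the toy fitness function are all assumptions. In a real IDS the fitness would typically be the validation accuracy of the classifier trained on the selected features.

```python
import random


def genetic_feature_search(n_features, fitness, pop_size=20, generations=40,
                           mutation_rate=0.1, seed=0):
    """Return the best feature mask found by a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    # Each individual is a bit mask: mask[i] == 1 means feature i is selected.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def pick():
        # Tournament selection: the fitter of two randomly drawn individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation applied independently to each position.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


# Hypothetical per-feature relevance scores standing in for classifier accuracy.
relevance = [0.9, 0.1, 0.8, 0.05, 0.7, 0.0, 0.2, 0.6]


def toy_fitness(mask):
    # Reward relevant features, charge a fixed cost per selected feature.
    return sum(r for r, bit in zip(relevance, mask) if bit) - 0.3 * sum(mask)


best_mask = genetic_feature_search(len(relevance), toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

Because the best mask is copied into every new generation, the best fitness never decreases from one generation to the next. The search is still heuristic, so the returned subset is not guaranteed to be the global optimum.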

A Feature Selection Method Based on Fuzzy Cluster Analysis

  • 이현숙
    • 정보처리학회논문지B / Vol. 14B, No. 2 / pp. 135-140 / 2007
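  • Feature selection is a data preparation process that builds effective experimental data by selecting, from multidimensional data observed in a problem domain, the attributes that best reflect the structure described by the data. It is an important component in improving the performance of classification systems in areas such as document classification, image recognition, and gene selection, and it has been studied mainly through statistical and information-theoretic approaches such as correlation techniques, dimensionality reduction, and mutual information. Research on feature selection is becoming more important as the data being handled grow larger and more complex. In this paper, we propose a feature selection method that reflects the characteristics of the data while generalizing to new data. For each attribute of the prepared data, optimal cluster information is obtained by fuzzy cluster analysis. On that basis, measures of proximity and separation are computed, and features are selected according to their values. We apply the proposed method to real-world computer virus classification and compare the results with classification on data selected by an existing contrast-based heuristic method. The comparison confirms that the method can assign an ordering to the given features and select features effectively, improving system performance.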

Probability Estimation of Snow Damage on Sugi (Cryptomeria japonica) Forest Stands by Logistic Regression Model in Toyama Prefecture, Japan

  • Kamo, Ken-Ichi;Yanagihara, Hirokazu;Kato, Akio;Yoshimoto, Atsushi
    • Journal of Forest and Environmental Science / Vol. 24, No. 3 / pp. 137-142 / 2008
  • In this paper, we apply a logistic regression model to data on snow damage to sugi (Cryptomeria japonica) that occurred in Toyama Prefecture, Japan, in 2004 in order to estimate the risk probability. To identify the factors affecting snow damage, we apply a model selection procedure that determines the optimal subset of explanatory variables. In this process we consider three information criteria: (1) Akaike's information criterion, (2) the Bayesian information criterion, and (3) a bias-corrected Akaike's information criterion. We interpret the selected variables from the viewpoint of natural disasters.
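For reference, the first two criteria named in this abstract have standard closed forms in terms of the maximized log-likelihood, sketched below. The function names and the example numbers are hypothetical and are not taken from the paper. The bias-corrected criterion is deliberately left out, because its exact form depends on the model and the abstract does not state which correction is used.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's information criterion: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {"model A": (-120.0, 2), "model B": (-115.0, 3)}
n_obs = 200

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n_obs))
print(best_by_aic, best_by_bic)
```

In both cases the subset with the smallest value is preferred. BIC charges more per parameter than AIC once the sample has more than about seven observations, since log n then exceeds 2, so it tends to select smaller subsets.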

Non-convex penalized estimation for the AR process

  • Na, Okyoung;Kwon, Sunghoon
    • Communications for Statistical Applications and Methods / Vol. 25, No. 5 / pp. 453-470 / 2018
  • We study how to distinguish the parameters of a sparse autoregressive (AR) process from zero using non-convex penalized estimation. We consider a class of non-convex penalties that includes the smoothly clipped absolute deviation and minimax concave penalties as special cases. We prove that the penalized estimators achieve standard theoretical properties, such as the weak and strong oracle properties, that have been established in the sparse linear regression framework. The results hold when the maximal order of the AR process increases to infinity and the minimal size of the true non-zero parameters decreases toward zero as the sample size increases. Further, we construct a practical method for selecting tuning parameters using a generalized information criterion, whose minimizer asymptotically recovers the best theoretical non-penalized estimator of the sparse AR process. Simulation studies confirm the theoretical results.
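The two penalties named in this abstract have well-known closed forms, sketched below for illustration. The function names are hypothetical. The default `a=3.7` is the value conventionally used for SCAD, while `gamma=3.0` for the minimax concave penalty (MCP) is only an assumed example value. The abstract does not say which tuning values the authors use.

```python
import numpy as np


def scad_penalty(t, lam, a=3.7):
    """Smoothly clipped absolute deviation penalty evaluated at t (requires a > 2)."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                      # |t| <= lam
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |t| <= a*lam
    constant = lam**2 * (a + 1) / 2                       # |t| > a*lam
    return np.where(t <= lam, linear,
                    np.where(t <= a * lam, quadratic, constant))


def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty evaluated at t (requires gamma > 1)."""
    t = np.abs(np.asarray(t, dtype=float))
    inner = lam * t - t**2 / (2 * gamma)                  # |t| <= gamma*lam
    outer = gamma * lam**2 / 2                            # |t| > gamma*lam
    return np.where(t <= gamma * lam, inner, outer)


coefficients = np.array([0.0, 0.5, 2.0, 10.0])
print(scad_penalty(coefficients, lam=1.0))
print(mcp_penalty(coefficients, lam=1.0))
```

Both penalties behave like the lasso penalty `lam * |t|` near zero and become flat for large `|t|`. Flattening means large coefficients are not shrunk further, which is the property behind oracle-type results.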

Optimal Antenna Selection Scheme with Transmit Adaptive Array for Wideband CDMA Systems

  • Kim, Hak-Seong;Kim, Sanhae;Lee, Woncheol;Yoan Shin
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2002 ITC-CSCC -3 / pp. 1960-1963 / 2002
  • Transmit diversity schemes are an effective capacity improvement method for the downlink of wideband code division multiple access (W-CDMA) systems. In this paper, we propose using a transmit antenna subset selection scheme in conjunction with a closed-loop transmit adaptive array (TxAA). The proposed scheme selects $N_s$ optimum antennas among $N_{\gamma}$ ($> N_s$) transmit antennas in order to maximize the diversity gain from the selected antennas, and also reduces the cost of RF chains by employing two different types of RF modules for the selected and the unselected antenna groups, respectively. Computer simulation results show that the proposed scheme improves performance over conventional TxAA when uplink control information feedback is taken into account.

Selection of features and hidden Markov model parameters for English word recognition from Leap Motion air-writing trajectories

  • Deval Verma;Himanshu Agarwal;Amrish Kumar Aggarwal
    • ETRI Journal / Vol. 46, No. 2 / pp. 250-262 / 2024
  • Air-writing recognition is relevant in areas such as natural human-computer interaction, augmented reality, and virtual reality. A trajectory is the most natural way to represent air writing. We analyze the recognition accuracy of words written in the air considering five features, namely, writing direction, curvature, trajectory, orthocenter, and ellipsoid, as well as different parameters of a hidden Markov model classifier. Experiments were performed on two representative datasets, whose sample trajectories were collected using a Leap Motion Controller from a fingertip performing air writing. Dataset D1 contains 840 English words from 21 classes, and dataset D2 contains 1600 English words from 40 classes. A genetic algorithm was combined with a hidden Markov model classifier to obtain the best subset of features. The combination {trajectory, orthocenter, writing direction, curvature} provided the best feature set, achieving recognition accuracies of 98.81% and 83.58% on datasets D1 and D2, respectively.

A Feature Selection-based Ensemble Method for Arrhythmia Classification

  • Namsrai, Erdenetuya;Munkhdalai, Tsendsuren;Li, Meijing;Shin, Jung-Hoon;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems / Vol. 9, No. 1 / pp. 31-40 / 2013
  • In this paper, we propose a novel method for building an ensemble of classifiers using a feature selection schema. The feature selection schema identifies the best feature sets that affect arrhythmia classification. First, a number of feature subsets are extracted by applying the feature selection schema to the original dataset. Then, a classification model is built on each feature subset. Finally, we combine the classification models with a voting approach to form a classification ensemble. The voting approach uses both the classification error rate and the feature selection rate to calculate the score of each classifier in the ensemble. The feature selection rate depends on the order in which the feature subsets are extracted. In the experiment, we applied our method to an arrhythmia dataset and generated the top three disjoint feature sets. We then built three classifiers on these feature subsets and formed the classifier ensemble using the voting approach. Our method can improve classification accuracy on high-dimensional datasets. The performance of each classifier and of their ensemble was higher than that of a classifier based on the whole feature space of the dataset. The proposed approach thus improved classification performance and yielded a more stable classification model.
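The voting step can be pictured with a small weighted-vote sketch. The abstract says only that each classifier's score combines its error rate with a feature selection rate that depends on extraction order, so the scoring formula below (accuracy multiplied by the reciprocal of the extraction order) is a purely hypothetical stand-in. The function names, error rates, and labels are also invented for illustration.

```python
from collections import defaultdict


def classifier_score(error_rate, extraction_order):
    # Hypothetical score: accuracy times a selection rate that decays with
    # the order in which the classifier's feature subset was extracted.
    return (1.0 - error_rate) * (1.0 / extraction_order)


def weighted_vote(predictions, scores):
    """Return the label with the largest total score across classifiers."""
    totals = defaultdict(float)
    for label, score in zip(predictions, scores):
        totals[label] += score
    return max(totals, key=totals.get)


# Three classifiers built on the first, second, and third extracted subsets.
error_rates = [0.20, 0.25, 0.30]
scores = [classifier_score(err, order)
          for order, err in enumerate(error_rates, start=1)]

predictions = ["normal", "arrhythmia", "arrhythmia"]
print(weighted_vote(predictions, scores))
```

Under this assumed scoring, the classifier built on the first extracted subset can outvote the other two together. With more evenly spread scores, the same function reduces to ordinary majority voting.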