• Title/Summary/Keyword: subset selection


On a Subset Selection Procedure Based on Hodges-Lehmann Estimators

  • Song, Moon-Sup;Kim, Soon-Ock
    • Journal of the Korean Statistical Society, v.16 no.1, pp.26-36, 1987
  • In this paper, we study a subset selection procedure based on Hodges-Lehmann estimators derived from the Wilcoxon test. To estimate the standard error of the Hodges-Lehmann estimators, the biweight A-estimator of scale is used. The Pitman efficiency of the proposed rule is compared with that of Gupta's rule and the trimmed-means rule through a small-sample Monte Carlo study. The results show that the proposed rule satisfies the $P^*$-condition and is very efficient for various heavy-tailed distributions.

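For readers unfamiliar with the estimator named in this abstract: the one-sample Hodges-Lehmann estimator associated with the Wilcoxon signed-rank test is the median of all pairwise (Walsh) averages. The sketch below only illustrates that definition and is not the paper's selection procedure; the function name and the sample values are made up for this example.

```python
from itertools import combinations_with_replacement
from statistics import mean, median


def hodges_lehmann(sample):
    """One-sample Hodges-Lehmann location estimate.

    Median of the Walsh averages (x_i + x_j) / 2 over all pairs i <= j.
    """
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(sample, 2)]
    return median(walsh)


# Illustrative data (not from the paper): one gross outlier.
data = [1, 2, 3, 100]
print("mean:", mean(data))
print("Hodges-Lehmann:", hodges_lehmann(data))
```

On this toy sample the outlier pulls the mean far from the bulk of the data, while the Hodges-Lehmann estimate stays close to it, which is the kind of robustness the abstract relies on for heavy-tailed distributions.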

Regularization Method by Subset Selection for Structural Damage Detection (구조손상 탐색을 위한 부 집합 선택에 의한 정규화 방법)

  • Yun, Gun-Jin;Han, Bong-Koo
    • Journal of the Computational Structural Engineering Institute of Korea, v.21 no.1, pp.73-82, 2008
  • In this paper, a new regularization method based on parameter subset selection is proposed, using the residual force vector for damage localization. Although subset selection using the fundamental modal characteristics as a residual function has been successful in detecting a single damage location, that approach appears to have limited capability for detecting multiple damage locations and typically requires cumbersome weighting values. The method presented herein considers cases in which damage detection must be achieved using incomplete measurements of the structural responses. Model expansion is incorporated to deal with this challenge. The unique advantage of the new regularization method is that it can reliably identify multiple damage locations. Through an illustrative example, the proposed damage detection method is demonstrated to be a reliable tool for identifying multiple damage locations in a planar truss structure.

Variable Selection in Linear Random Effects Models for Normal Data

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society, v.27 no.4, pp.407-420, 1998
  • This paper is concerned with selecting covariates to be included in building linear random effects models designed to analyze clustered response normal data. It is based on a Bayesian approach, intended to propose and develop a procedure that uses probabilistic considerations for selecting promising subsets of covariates. The approach reformulates the linear random effects model in a hierarchical normal and point mass mixture model by introducing a set of latent variables that will be used to identify subset choices. The hierarchical model is flexible enough to easily accommodate sign constraints in the number of regression coefficients. Using the Gibbs sampler, the appropriate posterior probability of each subset of covariates is obtained. Thus, in this procedure, the most promising subset of covariates can be identified as the one with the highest posterior probability. The procedure is illustrated through a simulation study.

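The final step described in this abstract, picking the subset with the highest posterior probability, can be approximated from sampler output by counting how often each inclusion pattern is visited. The following is a minimal sketch under the assumption that the Gibbs draws are already available as 0/1 indicator vectors (one entry per covariate); the draws and function names are invented for illustration and do not come from the paper.

```python
from collections import Counter


def subset_posterior(draws):
    """Estimate the posterior probability of each inclusion pattern
    by its relative frequency among the sampled indicator vectors."""
    counts = Counter(tuple(d) for d in draws)
    total = len(draws)
    return {pattern: n / total for pattern, n in counts.items()}


def most_probable_subset(draws):
    """Return (indices of included covariates, estimated probability)
    for the most frequently visited inclusion pattern."""
    probs = subset_posterior(draws)
    pattern = max(probs, key=probs.get)
    included = [j for j, flag in enumerate(pattern) if flag]
    return included, probs[pattern]


# Hypothetical sampler output: 5 draws over 3 candidate covariates.
draws = [(1, 0, 1), (1, 0, 1), (1, 1, 1), (1, 0, 0), (1, 0, 1)]
print(most_probable_subset(draws))
```

In practice many more draws would be needed, after a burn-in period, for the visit frequencies to be a usable estimate.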

Comparisons of some subset selection procedures for K normal populations with unequal sample size (표본크기가 다른 정규모집단의 평균에 대한 부분집합선택절차론의 성질과 비교연구)

  • 손중권;김소연;김영훈
    • The Korean Journal of Applied Statistics, v.3 no.1, pp.79-87, 1990
  • The problem of selecting a nonempty subset of K (>2) normal means with unknown variances has been studied by many authors. However, comparisons of the properties and efficiencies of the proposed subset selection procedures have not been carried out. We therefore investigate the properties of the proposed procedures and compare their performance in various cases.


Feature Selection Algorithm for Intrusions Detection System using Sequential Forward Search and Random Forest Classifier

  • Lee, Jinlee;Park, Dooho;Lee, Changhoon
    • KSII Transactions on Internet and Information Systems (TIIS), v.11 no.10, pp.5132-5148, 2017
  • Cyber attacks are evolving commensurate with recent developments in information security technology. Intrusion detection systems collect various types of data from computers and networks to detect security threats and analyze the attack information. The large amount of data examined makes the large number of computations and low detection rates problematic. Feature selection is expected to improve the classification performance and provide faster and more cost-effective results. Despite the various feature selection studies conducted for intrusion detection systems, it is difficult to automate feature selection because it is based on the knowledge of security experts. This paper proposes a feature selection technique to overcome the performance problems of intrusion detection systems. Focusing on feature selection, the first phase of the proposed system constructs a feature subset using a sequential forward floating search (SFFS) to reduce the dimension of the variables. The second phase constructs a classification model with the selected feature subset using a random forest classifier (RFC) and evaluates the classification accuracy. Experiments were conducted with the NSL-KDD dataset using SFFS-RF, and the results indicated that feature selection techniques are a necessary preprocessing step to improve the overall system performance in systems that handle large datasets. They also verified that SFFS-RF could be used for data classification. In conclusion, SFFS-RF could be the key to improving the classification model performance in machine learning.
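The wrapper idea in this abstract can be sketched in a few lines. Note that the code below is plain sequential forward selection, not the floating variant (SFFS) the paper uses, since it omits the conditional backward-removal step. The `score` callback stands in for a classifier's validation accuracy (a random forest in the paper); the toy scoring function and its weights are invented for illustration.

```python
def sequential_forward_selection(n_features, score, k):
    """Greedily grow a feature subset up to size k.

    At each step, add the feature that maximises score(subset).
    `score` takes a list of feature indices and returns a number.
    """
    selected = []
    remaining = set(range(n_features))
    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical score: additive usefulness, with features 1 and 2 redundant.
WEIGHTS = [1, 5, 3]


def toy_score(subset):
    total = sum(WEIGHTS[f] for f in subset)
    if 1 in subset and 2 in subset:
        total -= 4  # penalty for picking both redundant features
    return total


print(sequential_forward_selection(3, toy_score, 2))
```

In this toy case feature 1 is chosen first, and feature 0 is then preferred over the individually stronger but redundant feature 2, which is the behaviour a wrapper method has and a univariate ranking lacks.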

A Hybrid Feature Selection Method using Univariate Analysis and LVF Algorithm (단변량 분석과 LVF 알고리즘을 결합한 하이브리드 속성선정 방법)

  • Lee, Jae-Sik;Jeong, Mi-Kyoung
    • Journal of Intelligence and Information Systems, v.14 no.4, pp.179-200, 2008
  • We develop a feature selection method that can improve both the efficiency and the effectiveness of a classification technique. In this research, we employ case-based reasoning as the classification technique. The method integrates two existing feature selection methods: univariate analysis and the LVF algorithm. First, we sift predictive features from the whole set of features using univariate analysis. Then, we generate all possible subsets of these predictive features and measure the inconsistency rate of each subset using the LVF algorithm. Finally, the subset having the lowest inconsistency rate is selected as the best subset of features. We measure the performance of our feature selection method on data obtained from the UCI Machine Learning Repository and compare it with that of existing methods. The number of selected features and the accuracy of our method are satisfactory, so improvements in both efficiency and effectiveness are achieved.

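The second and third steps of this abstract can be sketched as follows, using the inconsistency rate as it is usually defined for LVF: instances that agree on the selected features are grouped, each group contributes its size minus the count of its majority class, and the sum is divided by the number of instances. Breaking ties in favour of smaller subsets is an assumption made here, not something the abstract states; the data and function names are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def inconsistency_rate(rows, labels, subset):
    """Fraction of instances that disagree with the majority class
    among instances sharing the same values on `subset`."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row[i] for i in subset)][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def best_subset(rows, labels, candidates):
    """Exhaustively score every non-empty subset of `candidates` and
    return the one with the lowest inconsistency rate (ties: fewer features)."""
    subsets = [s for r in range(1, len(candidates) + 1)
               for s in combinations(candidates, r)]
    return min(subsets, key=lambda s: (inconsistency_rate(rows, labels, s), len(s)))


# Toy data: feature 0 determines the label, feature 1 is noise.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["a", "a", "b", "b"]
print(best_subset(rows, labels, [0, 1]))
```

Exhaustive enumeration grows as 2^n in the number of candidate features, which is why pre-filtering with univariate analysis, as the abstract describes, matters before this step.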

Design and Performance Analysis of a Communication System with AMC and MIMO Mode Selection Scheme (AMC와 MIMO 선택 기법이 결합된 통신 시스템의 설계 및 성능 분석)

  • Lee, Jeong-Hwan;Yoon, Gil-Sang;Cho, In-Sik;Seo, Chang-Woo;Portugal, Sherlie;Hwang, In-Tae
    • Journal of the Institute of Electronics Engineers of Korea TC, v.47 no.3, pp.22-30, 2010
  • This paper proposes a system combining Adaptive Modulation and Coding (AMC) with Multiple Input Multiple Output (MIMO), which improves throughput and reliability. In addition, the system includes Precoding, Antenna Subset Selection, and a MIMO Mode Selection scheme. Finally, we analyze the performance of the proposed system. The principal environmental parameters for the simulation experiment are a frequency non-selective Rayleigh fading channel and a Spreading Factor (SF) of 16. Other parameters may be included in order to fulfill the requirements of the HSDPA standard. The proposed system has higher throughput and greater reliability than the conventional system, which does not include the MIMO Mode Selection scheme, Precoding, or Antenna Subset Selection. According to the simulation results, the proposed system reaches its maximum throughput at 8dB, an improvement of 6dB and twice the throughput with respect to the conventional system. Specifically, at -6dB, the conventional system reaches 2.5Mbps, while the proposed system reaches 6.4Mbps at the same SNR. Also, at 2dB, the systems reach 7.5Mbps (conventional system) and 15.3Mbps (proposed system), a nearly twofold difference. From the results presented above, we conclude that the greatest contribution of the proposed system is the improvement in throughput, especially the average throughput.

On a Robust Subset Selection Procedure for the Slopes of Regression Equations

  • Song, Moon-Sup;Oh, Chang-Hyuck
    • Journal of the Korean Statistical Society, v.10, pp.105-121, 1981
  • The problem of selecting a subset containing the largest of several slope parameters of regression equations is considered. The proposed selection procedure is based on the weighted median estimators for regression parameters and the median of rescaled absolute residuals for scale parameters. These estimators are compared with the classical least squares estimators by a simulation study. A Monte Carlo comparison is also made between the new procedure based on the weighted median estimators and the procedure based on the least squares estimators. The results show that the proposed procedure is quite robust with respect to the heaviness of distribution tails.

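The abstract does not spell out the form of the weighted median estimator. One common choice for a robust slope, assumed here purely for illustration, is the weighted median of all pairwise slopes (y_j - y_i) / (x_j - x_i) with weights |x_j - x_i|. The sketch below implements that assumed form; the function names and data are hypothetical and may differ from what the paper actually uses.

```python
from itertools import combinations


def weighted_median(pairs):
    """Smallest value whose cumulative weight reaches half the total.

    `pairs` is an iterable of (value, weight) with positive weights.
    """
    ordered = sorted(pairs)
    half = sum(w for _, w in ordered) / 2
    cumulative = 0.0
    for value, weight in ordered:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("no data")


def weighted_median_slope(x, y):
    """Weighted median of pairwise slopes, weighted by |x_j - x_i|.

    Pairs with identical x values are skipped (slope undefined).
    """
    pairs = [((y[j] - y[i]) / (x[j] - x[i]), abs(x[j] - x[i]))
             for i, j in combinations(range(len(x)), 2)
             if x[j] != x[i]]
    return weighted_median(pairs)


# Illustrative data: y = 2x except for one outlier at x = 2.
x = [0, 1, 2, 3]
y = [0, 2, 50, 6]
print(weighted_median_slope(x, y))
```

On this toy data the estimate recovers the underlying slope despite the outlier, in line with the robustness to heavy tails that the abstract reports.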

A Bayesian Method for Narrowing the Scope of Variable Selection in Binary Response t-Link Regression

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society, v.29 no.4, pp.407-422, 2000
  • This article is concerned with selecting predictor variables to be included in building a class of binary response t-link regression models, of which both probit and logistic regression models can be taken as approximate members. It is based on a modification of the stochastic search variable selection (SSVS) method, intended to propose and develop a Bayesian procedure that uses probabilistic considerations for selecting promising subsets of predictor variables. The procedure reformulates the binary response t-link regression setup in a hierarchical truncated normal mixture model by introducing a set of hyperparameters that will be used to identify subset choices. In this setup, the most promising subset of predictors can be identified as the one with the highest posterior probability in the marginal posterior distribution of the hyperparameters. To highlight the merit of the procedure, an illustrative numerical example is given.


Semantic-based Genetic Algorithm for Feature Selection (의미 기반 유전 알고리즘을 사용한 특징 선택)

  • Kim, Jung-Ho;In, Joo-Ho;Chae, Soo-Hoan
    • Journal of Internet Computing and Services, v.13 no.4, pp.1-10, 2012
  • In this paper, we propose an optimal feature selection method that considers the semantics of features, as a preprocessing step for document classification. Feature selection is a very important part of classification; it consists of removing redundant features and selecting essential ones. LSA (Latent Semantic Analysis) is adopted to take the meaning of the features into account. However, because basic LSA is not specialized for feature selection, a supervised LSA, which is better suited to classification problems, is used. We also apply a GA (Genetic Algorithm) to the features obtained from supervised LSA to select a better feature subset. Finally, we project documents onto the newly selected feature subset and classify them using a specific classifier, an SVM (Support Vector Machine). The proposed hybrid method of supervised LSA and GA is expected to achieve high classification performance and efficiency by selecting an optimal feature subset. Its efficiency is demonstrated through experiments on internet news classification with a small number of features.