• Title/Summary/Keyword: Subset selection


Discretization Method Based on Quantiles for Variable Selection Using Mutual Information

  • Cha, Woon-Ock;Huh, Moon-Yul
    • Communications for Statistical Applications and Methods / v.12 no.3 / pp.659-672 / 2005
  • This paper evaluates discretization of continuous variables for selecting relevant variables in supervised learning using mutual information. Three discretization methods are considered: MDL, histogram, and 4-intervals. Discretization and variable subset selection are evaluated by classification accuracy on six real data sets from the UCI databases. The results show that the 4-interval discretization method, which is based on quantiles, is robust and efficient for variable selection. We also visually evaluate the appropriateness of the selected subset of variables.
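
The abstract above combines quantile-based discretization with mutual information. As a minimal sketch of that idea (not the authors' implementation), the Python below cuts each continuous variable into four intervals at its sample quartiles and ranks variables by their empirical mutual information with the class label. The function names and the toy data are hypothetical, and the classification-accuracy evaluation described in the abstract is not reproduced.

```python
import math
from collections import Counter
from statistics import quantiles


def quartile_bins(values):
    """Map each value to one of 4 intervals delimited by the sample quartiles."""
    cuts = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    return [sum(v > c for c in cuts) for v in values]


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_variables(columns, labels):
    """Return variable names by decreasing MI with the label, plus the scores."""
    scores = {
        name: mutual_information(quartile_bins(col), labels)
        for name, col in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy data (assumed for illustration): one variable separates the two
# classes, the other takes the same values in both classes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "noise": [1, 11, 3, 13, 2, 12, 4, 14],
    "informative": [1, 2, 3, 4, 11, 12, 13, 14],
}
ranking, scores = rank_variables(columns, labels)
print(ranking, scores)
```

Keeping the top-ranked variables is only one way to turn these scores into a subset; the abstract does not say which selection rule was used.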

The Performance Analysis and Comparison of The MIMO-OFDM Scheme Applied to Pre-coding, Antenna Subset Selection and AMC for 4G Communication System (4G 통신시스템 기반의 Pre-coding과 Antenna Subset Selection, AMC 기법을 적용한 각 MIMO-OFDM 기법의 성능 분석 및 비교)

  • Cho, In-Sik;Seo, Chang-Woo;Yoon, Gil-Sang;Lee, Jeong-Hwan;Hwang, In-Tae
    • Journal of the Institute of Electronics Engineers of Korea TC / v.47 no.3 / pp.31-38 / 2010
  • This paper analyzes and compares, through computer simulation, the BER and throughput performance of several MIMO schemes applied to a MIMO-OFDM system. The throughput performance of the proposed Adaptive-MCM system is then analyzed. The results show that the proposed MIMO-OFDM Adaptive-MCM system achieves a higher average data rate than the non-adaptive MCM system by improving the trade-off between throughput and SNR.

A two-stage damage detection approach based on subset selection and genetic algorithms

  • Yun, Gun Jin;Ogorzalek, Kenneth A.;Dyke, Shirley J.;Song, Wei
    • Smart Structures and Systems / v.5 no.1 / pp.1-21 / 2009
  • A two-stage damage detection method is proposed and demonstrated for structural health monitoring. In the first stage, the subset selection method is applied to identify multiple damage locations. In the second stage, the damage severities of the identified damaged elements are determined by applying SSGA to solve the optimization problem. In this method, the sensitivities of residual force vectors with respect to damage parameters are employed for the subset selection process. This approach is particularly efficient in detecting multiple damage locations. The SEREP is applied as needed to expand the identified mode shapes while using a limited number of sensors. Uncertainties in the stiffness of the elements are also considered as a source of modeling errors to investigate their effects on the performance of the proposed method in detecting damage in real-life structures. Through a series of illustrative examples, the proposed two-stage damage detection method is demonstrated to be a reliable tool for identifying and quantifying multiple damage locations within diverse structural systems.

Performance analysis of precoding-aided differential spatial modulation systems with transmit antenna selection

  • Kim, Sangchoon
    • ETRI Journal / v.44 no.1 / pp.117-124 / 2022
  • In this paper, the performance of precoding-aided differential spatial modulation (PDSM) systems with optimal transmit antenna subset (TAS) selection is examined analytically. The average bit error rate (ABER) performance of the optimal TAS selection-based PDSM systems using a zero-forcing (ZF) precoder is evaluated using a theoretical upper bound and Monte Carlo simulations. Simulation results validate the analysis and demonstrate a performance penalty of less than 2.6 dB compared with precoding-aided spatial modulation (PSM) with optimal TAS selection. The performance analysis reveals a transmit diversity gain of $(N_T - N_R + 1)$ for the ZF-based PDSM (ZF-PDSM) systems that employ TAS selection with $N_T$ transmit antennas, $N_S$ selected transmit antennas, and $N_R$ receive antennas. It is also shown that reducing the number of activated transmit antennas via optimal TAS selection in the ZF-PDSM systems degrades ABER performance. In addition, the impacts of channel estimation errors on the performance of the ZF-PDSM system with TAS selection are evaluated, and the performance of this system is compared with that of ZF-based PSM with TAS selection.

A Bayesian Method for Narrowing the Scope of Variable Selection in Binary Response Logistic Regression

  • Kim, Hea-Jung;Lee, Ae-Kyung
    • Journal of Korean Society for Quality Management / v.26 no.1 / pp.143-160 / 1998
  • This article is concerned with the selection of subsets of predictor variables to be included in building the binary response logistic regression model. It is based on a Bayesian approach and is intended to propose and develop a procedure that uses probabilistic considerations for selecting promising subsets. The procedure reformulates the logistic regression setup as a hierarchical normal mixture model by introducing a set of hyperparameters that are used to identify subset choices. This is done using the fact that the cdf of the logistic distribution is approximately equal to that of the $t_{(8)}/.634$ distribution. The appropriate posterior probability of each subset of predictor variables is obtained by the Gibbs sampler, which samples indirectly from the multinomial posterior distribution on the set of possible subset choices. Thus, in this procedure, the most promising subset of predictors can be identified as the one with the highest posterior probability. To highlight the merit of this procedure, a couple of illustrative numerical examples are given.

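
The abstract above relies on the logistic cdf being close to the cdf of a $t_{(8)}$ variable divided by .634. The sketch below is a numerical sanity check of that statement only, not part of the authors' procedure. It uses the standard library alone: the Student-t cdf is obtained by Simpson's rule, and the grid, the step count, and all function names are assumptions made for illustration.

```python
import math

NU = 8         # degrees of freedom quoted in the abstract
SCALE = 0.634  # scaling constant quoted in the abstract


def logistic_cdf(x):
    """cdf of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))


def t_pdf(t, nu=NU):
    """Density of the Student-t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1.0 + t * t / nu) ** (-(nu + 1) / 2)


def t_cdf(t, nu=NU, steps=400):
    """Student-t cdf via Simpson's rule on [0, t], using symmetry about zero."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return 0.5 + total * h / 3.0


def scaled_t_cdf(x):
    """cdf of T / SCALE at x for T ~ t_8: P(T / SCALE <= x) = P(T <= SCALE * x)."""
    return t_cdf(SCALE * x)


# Largest absolute gap between the two cdfs on an evenly spaced grid.
grid = [i / 10 for i in range(-80, 81)]
max_gap = max(abs(logistic_cdf(x) - scaled_t_cdf(x)) for x in grid)
print(f"largest cdf gap on [-8, 8]: {max_gap:.4f}")
```

A small gap here supports replacing the logistic link with a scaled t link, and a t variable is a scale mixture of normals, which fits the hierarchical normal mixture formulation the abstract mentions.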

Feature Subset Selection in the Induction Algorithm using Sensitivity Analysis of Neural Networks (신경망의 민감도 분석을 이용한 귀납적 학습기법의 변수 부분집합 선정)

  • 강부식;박상찬
    • Journal of Intelligence and Information Systems / v.7 no.2 / pp.51-63 / 2001
  • In supervised machine learning, an induction algorithm, which can extract rules from data through learning, provides a useful tool for data mining. Practical induction algorithms are known to lose prediction accuracy and to generate unnecessarily complex rules when trained on data containing superfluous features. Feature subset selection is therefore needed to improve their performance. In feature subset selection, the wrapper method repeatedly runs the induction algorithm on the dataset using various feature subsets. However, it is impractical to search the whole space exhaustively unless the number of features is small. This study proposes a heuristic method that applies sensitivity analysis of neural networks to the wrapper method in order to generate rules with higher accuracy. First, it ranks all features using sensitivity analysis of neural networks. Then it uses the wrapper method to search the ordered feature space. In experiments on three datasets, we show that the suggested method can select a feature subset that improves the performance of the induction algorithm within a limited number of iterations.

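
The abstract above describes ranking features first and then running a wrapper over the ordered feature space. The sketch below illustrates one simple reading of that idea, under stated assumptions: the sensitivity scores are taken as given (the neural-network sensitivity analysis is not reproduced), the wrapper only evaluates nested prefixes of the ranking, and a leave-one-out 1-nearest-neighbour classifier stands in for the induction algorithm. All names and data are hypothetical.

```python
def loo_1nn_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `features`."""
    hits = 0
    for i, row in enumerate(rows):
        # Nearest other row, by squared Euclidean distance on the chosen features.
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[f] - rows[j][f]) ** 2 for f in features),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(rows)


def ordered_wrapper_search(ranking, evaluate):
    """Evaluate the nested subsets top-1, top-2, ... of a ranked feature list."""
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranking) + 1):
        subset = ranking[:k]
        score = evaluate(subset)
        if score > best_score:  # strict: prefer the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy data (assumed): feature 0 separates the classes, features 1 and 2 are noise.
rows = [
    (0.0, 5.0, -3.0), (0.1, -4.0, 9.0), (0.2, 3.0, -7.0),
    (1.0, -5.0, 8.0), (1.1, 4.0, -2.0), (1.2, -3.0, 6.0),
]
labels = [0, 0, 0, 1, 1, 1]

# Hypothetical sensitivity scores; in the paper these come from a neural network.
sensitivity = {0: 0.9, 1: 0.3, 2: 0.1}
ranking = sorted(sensitivity, key=sensitivity.get, reverse=True)

best_subset, best_score = ordered_wrapper_search(
    ranking, lambda subset: loo_1nn_accuracy(rows, labels, subset)
)
print(best_subset, best_score)
```

Searching only prefixes of the ranking needs one evaluation per feature, which is the kind of saving over exhaustive search that the abstract argues for.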

Improvement of Classification Accuracy on Success and Failure Factors in Software Reuse using Feature Selection (특징 선택을 이용한 소프트웨어 재사용의 성공 및 실패 요인 분류 정확도 향상)

  • Kim, Young-Ok;Kwon, Ki-Tae
    • KIPS Transactions on Software and Data Engineering / v.2 no.4 / pp.219-226 / 2013
  • Feature selection is one of the important issues in machine learning and pattern recognition. It is a technique for finding the subset of the source data that gives the best classification performance; that is, it extracts the subset most closely related to the purpose of the classification. In this paper, we experimented with selecting the best feature subset to improve classification accuracy when classifying success and failure factors in software reuse, and compared the results with existing studies. We found that the feature subset selected in this study showed better classification accuracy.

A Diagnostic Feature Subset Selection of Breast Tumor Based on Neighborhood Rough Set Model (Neighborhood 러프집합 모델을 활용한 유방 종양의 진단적 특징 선택)

  • Son, Chang-Sik;Choi, Rock-Hyun;Kang, Won-Seok;Lee, Jong-Ha
    • Journal of Korea Society of Industrial Information Systems / v.21 no.6 / pp.13-21 / 2016
  • Feature selection is one of the important issues in data mining and machine learning. It is a technique for finding, from the source data, the subset of features that provides the best classification performance. We propose a feature subset selection method using the neighborhood rough set model based on information granularity. To demonstrate its effectiveness, the proposed method was applied to select useful features associated with breast tumor diagnosis from 298 shape features extracted from 5,252 breast ultrasound images, comprising 2,745 benign and 2,507 malignant cases. Experimental results showed that 19 diagnostic features were strong predictors of breast cancer diagnosis, with an average classification accuracy of 97.6%.

Microblog User Geolocation by Extracting Local Words Based on Word Clustering and Wrapper Feature Selection

  • Tian, Hechan;Liu, Fenlin;Luo, Xiangyang;Zhang, Fan;Qiao, Yaqiong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.10 / pp.3972-3988 / 2020
  • Existing methods always rely on statistical features to extract local words for microblog user geolocation. The extracted words contain many non-local words, which lowers geolocation accuracy. Considering both the statistical and semantic features of local words, this paper proposes a microblog user geolocation method that extracts local words based on word clustering and wrapper feature selection. First, ordinary words without positional indications are filtered out based on statistical features. Second, a word clustering algorithm based on word vectors is proposed, and the remaining semantically similar words are clustered together according to the distance between their word vectors. Next, a wrapper feature selection algorithm based on sequential backward subset search is proposed, and the cluster subset with the best geolocation effect is selected. Words in the selected cluster subset are extracted as local words. Finally, a Naive Bayes classifier is trained on the local words to geolocate the microblog user. The proposed method is validated on two different types of microblog data, Twitter and Weibo. The results show that the proposed method outperforms two typical existing methods based on statistical features in terms of accuracy, precision, recall, and F1-score.
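
The abstract above selects word clusters with a wrapper based on sequential backward subset search. The sketch below shows the generic form of that search under stated assumptions: start from all clusters, repeatedly drop the one whose removal scores best, and remember the best subset seen. The `evaluate` function here is a toy stand-in with made-up integer weights; in the paper the score would be the geolocation performance of a classifier, which is not reproduced. All names are hypothetical.

```python
def sequential_backward_selection(items, evaluate):
    """Greedy backward search: drop one item per round, keep the best subset seen."""
    current = list(items)
    best_subset, best_score = list(current), evaluate(current)
    while len(current) > 1:
        # Try dropping each remaining item and keep the drop that scores best.
        candidates = [[x for x in current if x != drop] for drop in current]
        current = max(candidates, key=evaluate)
        score = evaluate(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-in for the wrapper's evaluation (assumed): each cluster adds or
# subtracts a fixed amount from the score, so noisy clusters hurt.
cluster_effect = {"c1": 3, "c2": -2, "c3": 4, "c4": -1}


def toy_score(subset):
    return sum(cluster_effect[c] for c in subset)


selected, score = sequential_backward_selection(["c1", "c2", "c3", "c4"], toy_score)
print(selected, score)
```

Each round evaluates every remaining cluster once, so the cost grows quadratically with the number of clusters rather than exponentially as in exhaustive subset search.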

Subset Selection Procedures Based on Some Robust Estimators

  • Song, Moon-Sub;Chung, Han-Yeong;Bae, Wha-Soo
    • Journal of the Korean Statistical Society / v.11 no.2 / pp.109-117 / 1982
  • In this paper, a preliminary study is performed on subset selection procedures based on trimmed means and on the Hodges-Lehmann estimator derived from the Wilcoxon test. The proposed procedures are compared with Gupta's rule through a small-sample Monte Carlo study. The results show that the procedures based on the robust estimators are successful in terms of efficiency and robustness.

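
The abstract above replaces the sample mean in a Gupta-type subset selection rule with robust location estimators. The sketch below illustrates the general shape of such a rule under stated assumptions: a population is kept when its estimate is within a margin of the largest estimate. The margin is a free input here, whereas in a real procedure it would be chosen to guarantee a required probability of correct selection; the data and function names are hypothetical and the paper's Monte Carlo study is not reproduced.

```python
from statistics import mean, median


def trimmed_mean(sample, alpha=0.1):
    """Mean after dropping the floor(alpha * n) smallest and largest values."""
    xs = sorted(sample)
    k = int(alpha * len(xs))
    return mean(xs[k:len(xs) - k])


def hodges_lehmann(sample):
    """One-sample Hodges-Lehmann estimate: median of the Walsh averages."""
    xs = list(sample)
    walsh = [
        (xs[i] + xs[j]) / 2
        for i in range(len(xs))
        for j in range(i, len(xs))
    ]
    return median(walsh)


def select_subset(samples, estimator, margin):
    """Keep every population whose estimate is within `margin` of the largest."""
    estimates = {name: estimator(xs) for name, xs in samples.items()}
    top = max(estimates.values())
    return sorted(name for name, est in estimates.items() if est >= top - margin)


# Toy data (assumed): population A has one gross outlier.
samples = {
    "A": [1, 2, 3, 4, 100],
    "B": [4, 5, 6, 7, 8],
    "C": [5, 6, 7, 8, 9],
}

print(select_subset(samples, mean, margin=1.5))
print(select_subset(samples, lambda xs: trimmed_mean(xs, alpha=0.2), margin=1.5))
print(select_subset(samples, hodges_lehmann, margin=1.5))
```

In this toy case the single outlier decides the mean-based selection, while both robust estimators ignore it, which is the kind of robustness the abstract refers to.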