• Title/Summary/Keyword: Selection procedure

A Variable Selection Procedure for K-Means Clustering

  • Kim, Sung-Soo
    • The Korean Journal of Applied Statistics / v.25 no.3 / pp.471-483 / 2012
  • One of the most important problems in cluster analysis is selecting the variables that truly define cluster structure while eliminating noisy variables that mask that structure. Brusco and Cradit (2001) present the VS-KM (variable-selection heuristic for K-means clustering) procedure, which selects true variables for K-means clustering based on the adjusted Rand index. The procedure starts with a fixed number of clusters and adds variables sequentially according to the adjusted Rand index. This paper presents an updated procedure that combines VS-KM with the automated K-means procedure of Kim (2009). The automated variable selection procedure recalculates the cluster number and initial cluster centers whenever a new variable is added, and adds variables based on the adjusted Rand index. Simulation results indicate that the proposed procedure is very effective at selecting true variables and eliminating noisy variables. The implemented R program can be obtained from the website "http://faculty.knou.ac.kr/sskim/nvarkm.r and vnvarkm.r". A rough sketch of the forward-selection idea follows this entry.
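
The following is a minimal R sketch of the kind of forward selection described above: variables are added greedily, scoring each candidate by the adjusted Rand index between its own univariate K-means partition and the partition on the variables selected so far. It is an illustration under those assumptions, not the authors' VS-KM algorithm or the nvarkm.r implementation, and the helper name forward_select_km is hypothetical.

```r
## Greedy forward variable selection for K-means, scored by the adjusted Rand
## index (illustrative sketch; not the exact VS-KM rule).
library(mclust)  # provides adjustedRandIndex()

forward_select_km <- function(x, k, n_select = 3, nstart = 25) {
  x <- as.matrix(x)
  # univariate K-means fit for every variable
  uni <- lapply(seq_len(ncol(x)), function(j)
    kmeans(x[, j, drop = FALSE], centers = k, nstart = nstart))
  # start from the variable whose univariate partition explains the most variance
  r2 <- sapply(uni, function(km) km$betweenss / km$totss)
  selected  <- which.max(r2)
  remaining <- setdiff(seq_len(ncol(x)), selected)
  while (length(selected) < n_select && length(remaining) > 0) {
    base_cl <- kmeans(x[, selected, drop = FALSE], centers = k,
                      nstart = nstart)$cluster
    # add the candidate whose own partition agrees most with the current one
    ari  <- sapply(remaining,
                   function(j) adjustedRandIndex(base_cl, uni[[j]]$cluster))
    best <- remaining[which.max(ari)]
    selected  <- c(selected, best)
    remaining <- setdiff(remaining, best)
  }
  selected
}

# example: forward_select_km(iris[, 1:4], k = 3, n_select = 2)
```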

A Bayes-P* Selection Procedure for Normal Means with Common Unknown Variance (분산이 미지인 정규모집단의 평균에 대한 베이즈-P* 선택방법에 관한 연구)

  • 김우철;전종우;한경수
    • The Korean Journal of Applied Statistics / v.3 no.2 / pp.79-89 / 1990
  • For selecting a subset of k normal populations containing the one with the largest mean, a Bayes-$P^*$ selection procedure is considered when the common variance is unknown. The performance of the Bayes-$P^*$ selection procedure is compared with a well-known classical procedure through a simulation study. Some frequentist characteristics of the Bayes-$P^*$ procedure are also studied.

Member Selection Procedure in the Steel Structural Design (강구조물설계에서 부재선정의 시스템화 방법론)

  • 이영호;김상철;김흥국;이병해
    • Proceedings of the Computational Structural Engineering Institute Conference / 1995.10a / pp.197-206 / 1995
  • In the structural design process, member selection involves managing complex data relationships and reflecting the knowledge of structural experts. It is difficult to build an effective system for this task with conventional programming techniques. A knowledge-based system is a software system capable of explicitly representing expert knowledge in the member selection process through member data and reasoning mechanisms. This study describes a useful methodology for structuring knowledge and representing the relationship between member data and knowledge, and shows the application of this methodology to member selection in steel structural design.

A Two-stage Selection Procedure for Exponential Populations

  • Han, Kyung-Soo;Kim, Woo-Chul
    • Journal of the Korean Statistical Society / v.16 no.1 / pp.37-44 / 1987
  • A two-stage selection procedure is considered for exponential populations with a common known scale parameter. The proposed procedure is designed along the lines of Tamhane and Bechhofer (1977), and the design constants needed to implement it are provided. Monte Carlo results show that the proposed procedure performs better than the single-stage procedure of Raghavachari and Starr (1970) in terms of the expected total sample size.

Cox proportional hazard model with L1 penalty

  • Hwang, Chang-Ha;Shim, Joo-Yong
    • Journal of the Korean Data and Information Science Society / v.22 no.3 / pp.613-618 / 2011
  • The proposed method is based on the penalized log partial likelihood of the Cox proportional hazards model with an L1 penalty. We use an iteratively reweighted least squares procedure to solve the L1-penalized log partial likelihood of the Cox proportional hazards model. This provides efficient computation, including variable selection, and leads to a generalized cross-validation function for model selection. Experimental results are presented to indicate the performance of the proposed procedure. A hedged sketch of an off-the-shelf L1-penalized Cox fit follows this entry.
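
In practice, an L1-penalized Cox partial likelihood of this kind can be fitted with the glmnet package, as in the sketch below on simulated data. Note that glmnet uses penalized coordinate descent with cross-validated partial-likelihood deviance rather than the IRLS and generalized cross-validation scheme proposed in the paper, so this only illustrates the type of fit and the resulting variable selection; the simulated data and variable names are illustrative.

```r
## L1-penalized Cox regression with glmnet (sketch on simulated data; the
## paper's IRLS/GCV algorithm is not reproduced here).
library(glmnet)

set.seed(1)
n <- 200; p <- 10
x      <- matrix(rnorm(n * p), n, p)              # covariates
time   <- rexp(n, rate = exp(0.5 * x[, 1]))       # only x1 drives the hazard
status <- rbinom(n, 1, 0.8)                       # 1 = event, 0 = censored
y      <- cbind(time = time, status = status)     # survival response for glmnet

cvfit <- cv.glmnet(x, y, family = "cox", alpha = 1)  # alpha = 1 -> L1 penalty
coef(cvfit, s = "lambda.min")                        # zero coefficients are dropped
```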

A Robust Subset Selection Procedure for Location Parameter Based on Hodges-Lehmann Estimators

  • Lee, Kang Sup
    • Journal of Korean Society for Quality Management / v.19 no.1 / pp.51-64 / 1991
  • This paper deals with a robust subset selection procedure based on Hodges-Lehmann estimators of location parameters. An improved formula for the estimated standard error of the Hodges-Lehmann estimator is considered. The degrees of freedom of the studentized Hodges-Lehmann estimator are also investigated, and it is suggested to use 0.8n instead of n-1. The proposed procedure is compared with other subset selection procedures and is shown to have good efficiency for heavy-tailed distributions. An illustrative sketch follows this entry.
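
The R sketch below illustrates the general shape of such a procedure: Hodges-Lehmann estimates (medians of Walsh averages), a crude standard error backed out of the Wilcoxon signed-rank confidence interval, the 0.8n degrees of freedom mentioned above, and a Gupta-type "keep everything within a studentized allowance of the best" rule. The standard-error stand-in and the allowance are illustrative assumptions; the paper's improved formula is not reproduced here, and the helper names are hypothetical.

```r
## Hodges-Lehmann estimates with a simple subset selection rule (illustrative).
hl_estimate <- function(x) {
  w <- outer(x, x, "+") / 2                    # Walsh averages
  median(w[upper.tri(w, diag = TRUE)])         # pseudomedian over pairs i <= j
}

hl_se <- function(x, level = 0.95) {
  # crude stand-in: back out an SE from the Wilcoxon signed-rank CI width
  ci <- wilcox.test(x, conf.int = TRUE, conf.level = level)$conf.int
  (ci[2] - ci[1]) / (2 * qt((1 + level) / 2, df = 0.8 * length(x)))
}

hl_subset_select <- function(samples, pstar = 0.95) {
  est  <- sapply(samples, hl_estimate)
  se   <- sapply(samples, hl_se)
  df   <- 0.8 * sapply(samples, length)        # df = 0.8n as suggested above
  best <- which.max(est)
  d    <- qt(pstar, df[best]) * sqrt(se[best]^2 + se^2)  # pairwise allowance
  which(est >= est[best] - d)                  # indices of retained populations
}

# example with heavy-tailed samples
set.seed(2)
samples <- list(rcauchy(30, 0), rcauchy(30, 0.5), rcauchy(30, 2))
hl_subset_select(samples)
```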

Selection of Geospatial Features for Location Guidance Map Generation

  • Kakinohana, Issei;Nie, Yoshinori;Nakamura, Morikazu;Miyagi, Hayao;Onaga, Kenji
    • Proceedings of the IEEK Conference / 2000.07b / pp.1107-1110 / 2000
  • This paper proposes a selection procedure for geospatial data in a location guidance map generation system. The selection procedure takes targets appointed by users as input and outputs the geospatial features needed for map generation. The procedure is embedded in a prototype object-oriented GIS, and we show sample maps generated by the system.

Variable Selection and Outlier Detection for Automated K-means Clustering

  • Kim, Sung-Soo
    • Communications for Statistical Applications and Methods / v.22 no.1 / pp.55-67 / 2015
  • An important problem in cluster analysis is selecting the variables that define cluster structure while eliminating noisy variables that mask that structure; in addition, outlier detection is a fundamental task for cluster analysis. Here we provide an automated K-means clustering process combined with variable selection and outlier identification. The automated K-means clustering procedure consists of three processes: (i) automatically calculating the cluster number and initial cluster centers whenever a new variable is added, (ii) identifying outliers for each cluster depending on the variables used, and (iii) selecting variables that define cluster structure in a forward manner. To select variables, we applied the VS-KM (variable-selection heuristic for K-means clustering) procedure (Brusco and Cradit, 2001). To identify outliers, we used a hybrid approach combining a clustering-based approach and a distance-based approach. Simulation results indicate that the proposed automated K-means clustering procedure is effective at selecting variables and identifying outliers. The implemented R program can be obtained at http://www.knou.ac.kr/~sskim/SVOKmeans.r. A sketch of the distance-based outlier step follows this entry.
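
The distance-based half of such an outlier step can be sketched in R as below: after a K-means fit, points unusually far from their own cluster centre are flagged. The median + 3*MAD cutoff is an illustrative choice and the function name km_outliers is hypothetical; the paper's hybrid clustering/distance rule and the SVOKmeans.r implementation are not reproduced here.

```r
## Flag per-cluster distance outliers after a K-means fit (illustrative sketch).
km_outliers <- function(x, k, nstart = 25) {
  x  <- as.matrix(x)
  km <- kmeans(x, centers = k, nstart = nstart)
  # Euclidean distance of each point to its own cluster centre
  d  <- sqrt(rowSums((x - km$centers[km$cluster, , drop = FALSE])^2))
  flag <- logical(nrow(x))
  for (g in seq_len(k)) {
    dg <- d[km$cluster == g]
    flag[km$cluster == g] <- dg > median(dg) + 3 * mad(dg)   # robust cutoff
  }
  list(cluster = km$cluster, distance = d, outlier = which(flag))
}

# example: km_outliers(scale(iris[, 1:4]), k = 3)
```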

On a Robust Subset Selection Procedure for the Slopes of Regression Equations

  • Song, Moon-Sup;Oh, Chang-Hyuck
    • Journal of the Korean Statistical Society / v.10 / pp.105-121 / 1981
  • The problem of selecting a subset containing the largest of several slope parameters of regression equations is considered. The proposed selection procedure is based on weighted median estimators for the regression parameters and the median of rescaled absolute residuals for the scale parameters. These estimators are compared with the classical least squares estimators in a simulation study. A Monte Carlo comparison is also made between the new procedure based on the weighted median estimators and the procedure based on the least squares estimators. The results show that the proposed procedure is quite robust with respect to the heaviness of the distribution tails. A sketch of a weighted-median slope estimate follows this entry.
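
One common weighted-median slope estimator takes the pairwise slopes (y_j - y_i)/(x_j - x_i) weighted by |x_j - x_i|; whether this matches the authors' exact definition is not clear from the abstract, so the R sketch below, including the simple "within d of the largest slope" rule and the helper names, is purely illustrative.

```r
## Weighted-median slope estimate and a naive subset rule (illustrative sketch).
weighted_median <- function(v, w) {
  o <- order(v)
  v <- v[o]; w <- w[o]
  v[which(cumsum(w) >= sum(w) / 2)[1]]                      # lower weighted median
}

wm_slope <- function(x, y) {
  ij <- which(upper.tri(diag(length(x))), arr.ind = TRUE)   # all pairs i < j
  dx <- x[ij[, 2]] - x[ij[, 1]]
  dy <- y[ij[, 2]] - y[ij[, 1]]
  keep <- dx != 0
  weighted_median(dy[keep] / dx[keep], abs(dx[keep]))
}

## keep every regression whose estimated slope is within d of the largest one
select_slopes <- function(datasets, d) {
  slopes <- sapply(datasets, function(dat) wm_slope(dat$x, dat$y))
  which(slopes >= max(slopes) - d)
}

# example with heavy-tailed errors
set.seed(3)
mk <- function(b) { x <- runif(40); list(x = x, y = b * x + rcauchy(40, 0, 0.1)) }
select_slopes(list(mk(1.0), mk(1.2), mk(2.0)), d = 0.3)
```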

Feature selection for text data via topic modeling (토픽 모형을 이용한 텍스트 데이터의 단어 선택)

  • Jang, Woosol;Kim, Ye Eun;Son, Won
    • The Korean Journal of Applied Statistics / v.35 no.6 / pp.739-754 / 2022
  • Text data usually consists of many variables, some of which are closely correlated. Such multicollinearity often results in inefficient or inaccurate statistical analysis. For supervised learning, one can select features by examining the relationship between the target variable and the explanatory variables. For unsupervised learning, however, target variables are absent, so such a feature selection procedure cannot be used. In this study, we propose a word selection procedure that employs topic models to find latent topics. We substitute topics for the target variables and select the terms that show high relevance for each topic. Applying the procedure to real data, we found that the proposed word selection procedure gives clear topic interpretations by removing high-frequency words prevalent across topics. In addition, we observed that, when the selected variables are fed to classifiers such as naïve Bayes classifiers and support vector machines, the proposed feature selection procedure gives results comparable to those obtained using class label information. A sketch of topic-model-based word selection follows this entry.
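
The word-selection idea can be sketched in R with the topicmodels package: fit LDA, then keep the terms most strongly associated with each topic. Ranking terms by a lift-style score (within-topic probability over corpus-wide probability) stands in for the relevance measure used in the paper, so the sketch below is illustrative rather than a reproduction of the proposed procedure; the AssociatedPress data is used only as a convenient example corpus.

```r
## Topic-model-based word selection with LDA (illustrative sketch).
library(tm)           # DocumentTermMatrix class for the example corpus
library(topicmodels)  # LDA(), posterior()

data("AssociatedPress", package = "topicmodels")
m <- as.matrix(AssociatedPress[1:500, ])             # small document-term matrix
m <- m[, colSums(m) > 0]                             # drop terms absent from this subset

lda <- LDA(m, k = 5, control = list(seed = 1))
phi <- posterior(lda)$terms                          # k x V topic-term probabilities
pw  <- colSums(m) / sum(m)                           # corpus-wide term frequencies

# keep the n highest-lift (phi / pw) terms per topic, which down-weights
# high-frequency words shared across many topics
select_words <- function(phi, pw, n = 15) {
  lift <- sweep(phi, 2, pw, "/")
  unique(as.vector(apply(lift, 1, function(row)
    colnames(phi)[order(row, decreasing = TRUE)[1:n]])))
}
head(select_words(phi, pw), 30)
```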