• Title/Abstract/Keywords: Selection procedure

1,056 search results (processing time: 0.027 seconds)

A Variable Selection Procedure for K-Means Clustering

  • Kim, Sung-Soo
    • 응용통계연구 / Vol. 25, No. 3 / pp. 471-483 / 2012
  • One of the most important problems in cluster analysis is selecting the variables that truly define cluster structure while eliminating noisy variables that mask it. Brusco and Cradit (2001) presented the VS-KM (variable-selection heuristic for K-means clustering) procedure for selecting true variables for K-means clustering. That procedure starts with a fixed number of clusters in K-means and adds variables sequentially based on an adjusted Rand index. This paper presents an updated procedure that combines VS-KM with the automated K-means procedure of Kim (2009). The resulting automated variable selection procedure recalculates the number of clusters and the initial cluster centers whenever a new variable is added, and adds a variable based on the adjusted Rand index. Simulation results indicate that the proposed procedure is very effective at selecting true variables and at eliminating noisy variables. The R implementation can be obtained from the website "http://faculty.knou.ac.kr/sskim/nvarkm.r and vnvarkm.r".
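
The adjusted Rand index mentioned in the abstract above can be illustrated with a short sketch. This is not the author's R code; it assumes the common Hubert-Arabie form of the index, and the function name `adjusted_rand_index` is chosen here for illustration only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert-Arabie form, assumed) for two partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items to form a pair")

    # Number of item pairs that fall together in each contingency cell,
    # in each cluster of the first partition, and in each cluster of the second.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # largest attainable agreement
    if max_index == expected:
        # Degenerate case (for example, both partitions are one big cluster):
        # treated here as perfect agreement, which is a convention, not a rule.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

The index equals 1 when two partitions agree up to relabeling and is close to 0 when agreement is at chance level, which is what makes it usable as a criterion for comparing the partitions produced by different variable subsets.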

A Bayes-P* Selection Procedure for Normal Means with Common Unknown Variance

  • 김우철;전종우;한경수
    • 응용통계연구 / Vol. 3, No. 2 / pp. 79-89 / 1990
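  • For the usual one-way layout model under normality, a Bayes-$P^*$ selection procedure is proposed as a subset selection method for comparing population means. Its relationship to existing procedures is examined, and its operating characteristics are studied through simulation.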


Member Selection Procedure in the Steel Structural Design

  • 이영호;김상철;김흥국;이병해
    • 한국전산구조공학회: Conference Proceedings / Proceedings of the 한국전산구조공학회 1995 Fall Conference / pp. 197-206 / 1995
  • In the structural design process, member selection involves managing complex data relationships and reflects the knowledge of structural experts. It is difficult to build an effective system for this task with conventional programming techniques. A knowledge-based system is a software system capable of supporting the explicit representation of expert knowledge in the member selection process through member data and reasoning mechanisms. This study describes a useful methodology for structuring knowledge and representing the relation between member data and knowledge. It also shows the application of this methodology to member selection in steel structural design.


A Two-stage Selection Procedure for Exponential Populations

  • Han, Kyung-Soo;Kim, Woo-Chul
    • Journal of the Korean Statistical Society / Vol. 16, No. 1 / pp. 37-44 / 1987
  • A two-stage selection procedure is considered for exponential populations with a common known scale parameter. The proposed procedure is designed along the lines of Tamhane and Bechhofer (1977). The design constants needed to implement the procedure are provided. Monte Carlo results show that the proposed procedure performs better than the single-stage procedure of Raghavachari and Starr (1970) in terms of the expected total sample size.


Cox proportional hazard model with L1 penalty

  • Hwang, Chang-Ha;Shim, Joo-Yong
    • Journal of the Korean Data and Information Science Society / Vol. 22, No. 3 / pp. 613-618 / 2011
  • The proposed method is based on the penalized log partial likelihood of the Cox proportional hazard model with an L1 penalty. We use the iteratively reweighted least squares procedure to solve the L1-penalized log partial likelihood function of the Cox proportional hazard model. It provides efficient computation, including variable selection, and leads to a generalized cross validation function for model selection. Experimental results are then presented to indicate the performance of the proposed procedure.
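
For orientation, the standard textbook form of an L1-penalized log partial likelihood for the Cox model can be written as follows. The notation is an assumption made here, not taken from the paper: $x_i$ is the covariate vector of subject $i$, $\delta_i$ is the event indicator, $R_i$ is the risk set at the event time of subject $i$, $\lambda \ge 0$ is the penalty parameter, and tied event times are ignored. The paper's exact objective may differ.

```latex
\ell_{\lambda}(\beta)
  = \sum_{i:\,\delta_i = 1}
    \left[ x_i^{\top}\beta
           - \log \sum_{j \in R_i} \exp\!\left( x_j^{\top}\beta \right) \right]
  - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
```

Maximizing this objective shrinks some coefficients $\beta_k$ exactly to zero for large enough $\lambda$, which is how an L1 penalty performs variable selection.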

A Robust Subset Selection Procedure for Location Parameter Based on Hodges-Lehmann Estimators

  • Lee, Kang Sup
    • 품질경영학회지 / Vol. 19, No. 1 / pp. 51-64 / 1991
  • This paper deals with a robust subset selection procedure based on Hodges-Lehmann estimators of location parameters. An improved formula for the estimated standard error of Hodges-Lehmann estimators is considered. The degrees of freedom of the studentized Hodges-Lehmann estimators are also investigated, and it is suggested to use 0.8n instead of n-1. The proposed procedure is compared with other subset selection procedures and is shown to have good efficiency for heavy-tailed distributions.
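
As a minimal illustration of the estimator named in the abstract above, the one-sample Hodges-Lehmann location estimate can be computed as the median of the Walsh averages. This sketch assumes the convention that includes each observation paired with itself (pairs with i ≤ j); the function name `hodges_lehmann` is hypothetical, and the paper's standard-error formula and selection rule are not reproduced here.

```python
from itertools import combinations_with_replacement
from statistics import median


def hodges_lehmann(sample):
    """One-sample Hodges-Lehmann estimate: median of all Walsh averages."""
    values = list(sample)
    if not values:
        raise ValueError("sample must not be empty")
    # Walsh averages (x_i + x_j) / 2 over all pairs with i <= j (assumed convention).
    walsh_averages = [
        (a + b) / 2 for a, b in combinations_with_replacement(values, 2)
    ]
    return median(walsh_averages)
```

For the sample 1, 2, 3, 4, 100 the estimate is 3, whereas the sample mean is 22, which shows the resistance to a single extreme observation that motivates using the estimator with heavy-tailed distributions.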


Selection of Geospatial Features for Location Guidance Map Generation

  • Kakinohana, Issei;Nie, Yoshinori;Nakamura, Morikazu;Miyagi, Hayao;Onaga, Kenji
    • 대한전자공학회: Conference Proceedings / 대한전자공학회 2000 ITC-CSCC -2 / pp. 1107-1110 / 2000
  • This paper proposes a procedure for selecting geospatial data for a location guidance map generation system. The selection procedure requires some targets appointed by users as input data and outputs generation. The procedure is embedded in a prototype of an object-oriented GIS. We show sample maps generated by the system.


Variable Selection and Outlier Detection for Automated K-means Clustering

  • Kim, Sung-Soo
    • Communications for Statistical Applications and Methods / Vol. 22, No. 1 / pp. 55-67 / 2015
  • An important problem in cluster analysis is the selection of variables that define cluster structure while eliminating noisy variables that mask it; in addition, outlier detection is a fundamental task in cluster analysis. Here we provide an automated K-means clustering process combined with variable selection and outlier identification. The automated K-means clustering procedure consists of three processes: (i) automatically calculating the number of clusters and the initial cluster centers whenever a new variable is added, (ii) identifying outliers for each cluster depending on the variables used, and (iii) selecting the variables that define cluster structure in a forward manner. To select variables, we applied the VS-KM (variable-selection heuristic for K-means clustering) procedure (Brusco and Cradit, 2001). To identify outliers, we used a hybrid approach combining a clustering-based approach and a distance-based approach. Simulation results indicate that the proposed automated K-means clustering procedure is effective at selecting variables and identifying outliers. The implemented R program can be obtained at http://www.knou.ac.kr/~sskim/SVOKmeans.r.

On a Robust Subset Selection Procedure for the Slopes of Regression Equations

  • Song, Moon-Sup;Oh, Chang-Hyuck
    • Journal of the Korean Statistical Society / Vol. 10 / pp. 105-121 / 1981
  • The problem of selecting a subset containing the largest of several slope parameters of regression equations is considered. The proposed selection procedure is based on weighted median estimators for the regression parameters and the median of rescaled absolute residuals for the scale parameters. These estimators are compared with the classical least squares estimators in a simulation study. A Monte Carlo comparison is also made between the new procedure based on the weighted median estimators and the procedure based on the least squares estimators. The results show that the proposed procedure is quite robust with respect to the heaviness of the distribution tails.


Feature selection for text data via topic modeling

  • 장우솔;김예은;손원
    • 응용통계연구 / Vol. 35, No. 6 / pp. 739-754 / 2022
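  • Text data generally contain many variables, and these variables are highly correlated, which can cause problems for the accuracy and efficiency of statistical analysis. To address this in supervised learning, where a target variable is given, words that explain the target variable well are sometimes selected and only those words are used in the analysis. In unsupervised learning, by contrast, no target variable is given, so such word selection procedures are difficult to apply. This study proposes a word selection procedure that uses a topic model to generate topics that can stand in for the target variable of supervised learning, and then selects the words most strongly associated with each topic. When the proposed procedure was applied to real text data, it removed words that appear frequently across many topics, which allowed topics to be identified more clearly. When it was applied to cluster analysis, it yielded clustering results with a high association between clusters and categories. We also confirmed that words selected with the topic model, without any information on the target variable, gave classification accuracy similar to that obtained with words selected using the target variable.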