Variable Selection in Normal Mixture Model Based Clustering under Heteroscedasticity

Kim, Seung-Gu

doi:10.5351/KJAS.2011.24.6.1213

The Korean Journal of Applied Statistics (응용통계연구)

Volume 24, Issue 6 / Pages 1213-1224 / 2011 / 1225-066X (pISSN) / 2383-5818 (eISSN)

The Korean Statistical Society (한국통계학회)

Kim, Seung-Gu (Department of Data and Information, Sangji University)

Received : 2011.09
Accepted : 2011.09
Published : 2011.12.31

Abstract

In high-dimensional settings, where the number of variables far exceeds the number of observations, noninformative variables must be removed before the observations can be clustered with a normal mixture model. Most model-based approaches to simultaneous variable selection and clustering assume homoscedasticity across clusters, and their models are estimated mainly by penalized likelihood. This paper proposes a different approach, based on a slightly modified normal mixture model, that drops the unrealistic homoscedasticity assumption while removing noninformative variables and clustering the observations at the same time. The validity of the model is explained, and an EM algorithm is derived to estimate its parameters. Simulation studies and an experiment on a real microarray dataset show the effectiveness of the proposed method.
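
To make the heteroscedastic setting concrete, the following is a minimal sketch of a standard EM algorithm for a normal mixture in which every cluster has its own diagonal covariance. It is not the modified model proposed in the paper and performs no variable selection. The function name `em_diag_gmm`, the diagonal-covariance restriction, the simulated data, and the starting means are all assumptions made for illustration.

```python
import math
import random


def em_diag_gmm(X, means, n_iter=50, var_floor=1e-6):
    """Generic EM for a normal mixture with cluster-specific diagonal covariances.

    Illustrative only: this is the textbook heteroscedastic mixture,
    not the modified model of the paper.
    """
    n, p, g = len(X), len(X[0]), len(means)
    means = [list(m) for m in means]
    variances = [[1.0] * p for _ in range(g)]  # one variance per cluster and variable
    weights = [1.0 / g] * g
    resp = [[0.0] * g for _ in range(n)]
    for _ in range(n_iter):
        # E-step: posterior cluster membership probabilities, in log space
        for i, x in enumerate(X):
            logp = []
            for k in range(g):
                lp = math.log(weights[k])
                for j in range(p):
                    v = variances[k][j]
                    lp -= 0.5 * (math.log(2 * math.pi * v) + (x[j] - means[k][j]) ** 2 / v)
                logp.append(lp)
            m = max(logp)
            probs = [math.exp(l - m) for l in logp]
            s = sum(probs)
            resp[i] = [q / s for q in probs]
        # M-step: weighted proportions, means, and per-cluster variances
        for k in range(g):
            nk = max(sum(resp[i][k] for i in range(n)), 1e-12)  # guard against empty clusters
            weights[k] = nk / n
            for j in range(p):
                mu = sum(resp[i][k] * X[i][j] for i in range(n)) / nk
                var = sum(resp[i][k] * (X[i][j] - mu) ** 2 for i in range(n)) / nk
                means[k][j] = mu
                variances[k][j] = max(var, var_floor)
    return weights, means, variances, resp


# Simulated data (an assumption): variable 0 separates the two clusters and has
# unequal variances; variable 1 has the same distribution in both clusters,
# so it carries no clustering information.
random.seed(0)
X = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(100)]
X += [[random.gauss(10.0, 2.0), random.gauss(0.0, 1.0)] for _ in range(100)]

weights, means, variances, resp = em_diag_gmm(X, means=[[0.0, 0.0], [10.0, 0.0]])
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
```

In this simulated example the fitted variance of variable 0 differs between the two clusters, which a homoscedastic model cannot represent, while the fitted means of variable 1 are nearly equal across clusters, which is what makes that variable noninformative for clustering.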
