Representing variables in the latent space

Latent-space representation of analysis variables

  • Received : 2017.05.08
  • Accepted : 2017.06.10
  • Published : 2017.08.31

Abstract

For multivariate datasets with a large number of variables, classical dimension reduction methods such as principal component analysis may not be effective for data visualization. The underlying reason is that the dimensionality of the space of variables is often larger than two or three, whereas visualization for the human eye is most effective in two or three dimensions. This paper proposes a working procedure that first partitions the variables into several "latent" clusters, then explores the individual data subsets, and finally integrates the findings. We use the R package "ClustOfVar" to partition variables around latent dimensions and the principal component biplot method to visualize within-cluster patterns. Additionally, we use the technique of embedding supplementary variables to examine the relationships between within-cluster variables and outside variables.

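In multivariate data where the number of variables p is large, conventional dimension reduction such as principal component analysis may not be effective. Effective visualization requires a reduced space of about two or three dimensions, yet the latent dimensionality of the observed units can be much larger than that. This paper proposes a two-stage procedure that partitions the analysis variables across multiple latent dimensions, explores each part with dimension reduction methods, and visualizes how the parts relate to one another. For the "latent-variable clustering of variables" that partitions the analysis variables into latent dimensions, we use the R package ClustOfVar. To visualize an individual variable cluster we use the principal component analysis biplot, and to visualize the relationships between an individual variable cluster and outside latent variables or external variables we use the technique of embedding supplementary variables.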
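As a rough illustration of the last two steps of the procedure (the within-cluster biplot and the embedding of supplementary variables), the following Python sketch uses NumPy rather than the R tools named in the abstract. It assumes the variable clusters have already been obtained (for instance with ClustOfVar) and works on synthetic data. The function names `standardize`, `pca_biplot`, and `embed_supplementary` are hypothetical and not taken from the paper. Supplementary variables are placed by their correlations with the component scores, which is one common convention and not necessarily the exact scaling used by the authors.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit (population) variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def pca_biplot(X, k=2):
    """Return (scores, var_coords) for a correlation-based PCA biplot.

    scores:     n x k component scores, scaled to unit variance
    var_coords: p x k correlations between each variable and each component
    """
    Z = standardize(X)
    n = Z.shape[0]
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :k] * np.sqrt(n)
    var_coords = Vt[:k].T * s[:k] / np.sqrt(n)
    return scores, var_coords


def embed_supplementary(scores, Y):
    """Place outside variables on the existing axes.

    Each supplementary variable is located by its correlations with the
    component scores; it plays no part in constructing the axes.
    """
    Zy = standardize(Y)
    return Zy.T @ scores / scores.shape[0]


# Synthetic data (hypothetical): one cluster of three variables driven by a
# shared latent factor, plus one outside variable related to that factor.
rng = np.random.default_rng(0)
n = 200
factor = rng.normal(size=n)
cluster = np.column_stack(
    [factor + 0.5 * rng.normal(size=n) for _ in range(3)]
)
outside = (0.8 * factor + 0.6 * rng.normal(size=n)).reshape(-1, 1)

scores, var_coords = pca_biplot(cluster, k=2)
sup_coords = embed_supplementary(scores, outside)

print("active variables:\n", np.round(var_coords, 2))
print("supplementary variable:\n", np.round(sup_coords, 2))
```

Under this convention, an active variable passed to `embed_supplementary` lands exactly on its own biplot coordinates, so active and supplementary variables can be read on the same scale.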

Keywords

References

  1. Benzecri, J. P. (1992). Correspondence Analysis Handbook, Marcel Dekker, New York.
  2. Chavent, M., Kuentz-Simonet, V., Liquet, B., and Saracco, J. (2012). ClustOfVar: an R package for the clustering of variables, Journal of Statistical Software, 50, 1-16.
  3. Chavent, M., Kuentz, V., Liquet, B., and Saracco, J. (2013). Package 'ClustOfVar'. R Foundation for Statistical Computing, URL https://cran.r-project.org/mirrors.html.
  4. Gabriel, K. R. (1971). The biplot graphic display of matrices with application to principal component analysis, Biometrika, 58, 453-467. https://doi.org/10.1093/biomet/58.3.453
  5. Vigneau, E. and Chen, M. (2015). Package 'ClustVarLV'. R Foundation for Statistical Computing, URL https://cran.r-project.org/mirrors.html.
  6. Vigneau, E., Chen, M., and Qannari, E. M. (2015). ClustVarLV: an R package for the clustering of variables around latent variables, The R Journal, 7, 134-148.
  7. Vigneau, E. and Qannari, E. M. (2003). Clustering of variables around latent components, Communications in Statistics - Simulation and Computation, 32, 1131-1150. https://doi.org/10.1081/SAC-120023882