Comparison of model selection criteria in graphical LASSO

  • 안형석 (Department of Statistics, University of Seoul);
  • 박창이 (Department of Statistics, University of Seoul)
  • Received : 2014.06.26
  • Accepted : 2014.07.18
  • Published : 2014.07.31

Abstract

Graphical models are an intuitive tool for modeling complex stochastic systems with many interrelated variables, such as those arising in bioinformatics or social networks, because the conditional independence between random variables can be visualized as a network. The graphical least absolute shrinkage and selection operator (LASSO) is known to be effective in avoiding overfitting when estimating Gaussian graphical models from high-dimensional data. In this paper, we consider model selection, a very important problem in graphical LASSO estimation. In particular, we compare several model selection criteria through simulations and analyze a real financial data set.
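
For orientation, the setting summarized in the abstract can be written out in a standard textbook form. This is only a sketch: the notation ($S$ for the sample covariance matrix, $\Theta$ for the precision matrix, $\lambda$ for the penalty parameter, $n$ observations, $p$ variables, $E_\lambda$ for the estimated edge set, $\gamma \in [0,1]$) is assumed here rather than taken from the paper, and BIC and extended BIC are shown merely as two common examples of selection criteria, not necessarily the exact set the authors compare.

```latex
% Graphical LASSO estimate for a given penalty \lambda (assumed notation)
\hat{\Theta}_\lambda
  = \arg\max_{\Theta \succ 0}
    \left\{ \log\det\Theta - \operatorname{tr}(S\Theta)
            - \lambda \lVert \Theta \rVert_1 \right\}

% Gaussian log-likelihood, up to an additive constant
\ell_n(\Theta) = \frac{n}{2}
    \left\{ \log\det\Theta - \operatorname{tr}(S\Theta) \right\}

% Two illustrative criteria; |E_\lambda| counts the nonzero
% off-diagonal pairs (edges) of \hat{\Theta}_\lambda
\mathrm{BIC}(\lambda)
  = -2\,\ell_n(\hat{\Theta}_\lambda) + |E_\lambda| \log n

\mathrm{EBIC}_\gamma(\lambda)
  = -2\,\ell_n(\hat{\Theta}_\lambda) + |E_\lambda| \log n
    + 4\gamma\,|E_\lambda| \log p
```

Under this formulation, model selection amounts to choosing the $\lambda$ that minimizes the chosen criterion over a grid of candidate values; setting $\gamma = 0$ in the extended BIC recovers the ordinary BIC.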
