
The EM algorithm for mixture regression with missing covariates


  • Kim, Hyungmin (Department of Statistics, Sungkyunkwan University) ;
  • Ham, Geonhee (Center for Public Opinion and Quantitative Research, The Asan Institute for Policy Studies) ;
  • Seo, Byungtae (Department of Statistics, Sungkyunkwan University)
  • Received : 2016.08.31
  • Accepted : 2016.10.22
  • Published : 2016.12.31

Abstract

Finite mixtures of regression models provide an effective tool to explore hidden functional relationships between a response variable and covariates. In practice, however, data are often not fully observed for various reasons. In this paper, we derive an expectation-maximization (EM) algorithm to obtain the maximum likelihood estimator when some covariates are missing at random in a finite mixture of regression models. We conduct simulation studies and provide real data examples to demonstrate the validity of the derived EM algorithm.

Mixture regression models are useful statistical models for identifying the relationship between a response variable and covariates and are used in many fields. In practice, however, covariates often contain missing values when a mixture regression model is fit to data, and the patterns of missingness also vary. For this setting, this paper proposes an EM algorithm to obtain the maximum likelihood estimator. The usefulness of the proposed EM algorithm is confirmed through simulation studies, and a case study illustrates how the proposed method can be applied and demonstrates its effectiveness.
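For orientation, the finite mixture of regressions model referred to in the abstract can be written in its standard Gaussian form as follows; this is a generic sketch with K components, not necessarily the paper's exact parameterization,

\[
f(y_i \mid \mathbf{x}_i) \;=\; \sum_{k=1}^{K} \pi_k \,\phi\!\left(y_i \mid \mathbf{x}_i^{\top}\boldsymbol{\beta}_k,\ \sigma_k^2\right),
\qquad \pi_k > 0,\quad \sum_{k=1}^{K}\pi_k = 1,
\]

where \(\phi(\cdot \mid \mu, \sigma^2)\) denotes the normal density. With fully observed data, the EM algorithm alternates an E-step that computes the component responsibilities

\[
\tau_{ik} \;=\; \frac{\pi_k\,\phi\!\left(y_i \mid \mathbf{x}_i^{\top}\boldsymbol{\beta}_k,\ \sigma_k^2\right)}
{\sum_{j=1}^{K}\pi_j\,\phi\!\left(y_i \mid \mathbf{x}_i^{\top}\boldsymbol{\beta}_j,\ \sigma_j^2\right)}
\]

with an M-step that updates each \(\pi_k\) by averaging the \(\tau_{ik}\) and updates \((\boldsymbol{\beta}_k, \sigma_k^2)\) by weighted least squares. When some covariates are missing at random, the E-step must additionally take expectations over the missing covariate values given the observed data, which generally requires a model for the covariate distribution; the algorithm derived in the paper works out this extension.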

