
Binary regression model using skewed generalized t distributions


  • Kim, Mijeong (Department of Statistics, Ewha Womans University)
  • Received : 2017.08.22
  • Accepted : 2017.09.05
  • Published : 2017.10.31

Abstract

We frequently encounter binary data in real life. Logistic, probit, cauchit, and complementary log-log models are often used to analyze binary data. Liu (2004) proposed the Robit model, in which the inverse of the cdf of Student's t distribution is used as the link function, and Kim et al. (2008) proposed a generalized t-link model to make binary regression more flexible. Because more flexible (for example, skewed) distributions allow more flexible link functions in generalized linear models, we propose a binary regression model, fitted by maximum likelihood, that uses the skewed generalized t distribution introduced by Theodossiou (1998). We implement the proposed models in R by combining the glm function from base R with the R sgt package, and we use them to analyze the Pima Indian data.

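Binary data are commonly encountered in everyday life. Logistic, probit, cauchit, and complementary log-log models are the usual choices for regression analysis of binary data. Other options include the Robit model of Liu (2004), which is based on the t distribution, and the generalized t-link model of Kim et al. (2008). Motivated by the observation that a flexible distribution yields a flexible regression model, this paper introduces a binary data regression model that uses the skewed generalized t distribution of Theodossiou (1998) and maximizes the likelihood function. We show how to carry out the proposed analysis in R by connecting the skewed generalized t distribution to the R glm function through the R sgt package. We also analyze the Pima Indian data.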
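The model sets P(Y = 1 | x) = F(x'β). Here F is the cdf of a skewed generalized t distribution with location 0, scale 1, skewness λ and shape parameters p and q, and β is estimated by maximizing the Bernoulli log-likelihood. The Python sketch below is a minimal illustration under stated assumptions. It is not the author's R implementation, which combines glm with the sgt package. It assumes the plain, unadjusted density f(x) = p / (2σ q^(1/p) B(1/p, q) [1 + |x - μ|^p / (q σ^p (1 + λ sign(x - μ))^p)]^(q + 1/p)), with no mean centering or variance rescaling. Its values therefore need not agree with other parameterizations, including whatever defaults the sgt package uses. The names sgt_cdf, neg_log_lik and fit_sgt_binary are hypothetical. The shape parameters are held fixed rather than estimated, and the data are simulated rather than the Pima Indian data.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betainc


def sgt_cdf(x, mu=0.0, sigma=1.0, lam=0.0, p=2.0, q=10.0):
    """CDF of an unadjusted skewed generalized t distribution (assumed form).

    Requires sigma > 0, -1 < lam < 1, p > 0, q > 0.
    """
    d = np.asarray(x, dtype=float) - mu
    # Assumed form: right of mu the scale is sigma*(1+lam), left of mu sigma*(1-lam).
    side_scale = np.where(d >= 0, sigma * (1 + lam), sigma * (1 - lam))
    u = (np.abs(d) / side_scale) ** p / q
    # Fraction of that side's mass lying between mu and x, given by the
    # regularized incomplete beta function I_t(1/p, q) with t = u / (1 + u).
    inc = betainc(1.0 / p, q, u / (1.0 + u))
    left_mass = (1 - lam) / 2   # P(X < mu)
    right_mass = (1 + lam) / 2  # P(X >= mu)
    return np.where(d >= 0, left_mass + right_mass * inc, left_mass * (1 - inc))


def neg_log_lik(beta, X, y, lam, p, q):
    """Negative Bernoulli log-likelihood with P(Y=1|x) = F_SGT(x'beta)."""
    prob = sgt_cdf(X @ beta, 0.0, 1.0, lam, p, q)
    prob = np.clip(prob, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.sum(y * np.log(prob) + (1 - y) * np.log1p(-prob))


def fit_sgt_binary(X, y, lam=0.0, p=2.0, q=10.0):
    """Maximum-likelihood estimate of beta with the shape parameters held fixed."""
    res = minimize(neg_log_lik, np.zeros(X.shape[1]),
                   args=(X, y, lam, p, q), method="BFGS")
    return res.x, res.fun


# Usage on simulated data (hypothetical example, not the Pima Indian data).
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.0])
y = (rng.uniform(size=n) < sgt_cdf(X @ true_beta, 0.0, 1.0, 0.3, 2.0, 5.0)).astype(float)
beta_hat, nll = fit_sgt_binary(X, y, lam=0.3, p=2.0, q=5.0)
print(beta_hat, nll)
```

Location and scale are fixed at 0 and 1 because the intercept and slopes in β absorb them. With λ = 0, p = 2 and q = ν/2, the link reduces to Student's t with ν degrees of freedom (a Robit-type link), up to a rescaling of β by √2. To choose the shape parameters, one option is to refit over a grid of (λ, p, q) values and keep the combination with the largest likelihood.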


References

  1. Arellano-Valle, R. B. and Bolfarine, H. (1995). On some characterizations of the t-distribution, Statistics & Probability Letters, 25, 79-85. https://doi.org/10.1016/0167-7152(94)00208-P
  2. Azzalini, A. and Dalla Valle, A. (1996). The multivariate skew-normal distribution, Biometrika, 83, 715-726. https://doi.org/10.1093/biomet/83.4.715
  3. Chen, M. H., Dey, D. K., and Shao, Q. M. (1999). A new skewed link model for dichotomous quantal response data, Journal of the American Statistical Association, 94, 1172-1186. https://doi.org/10.1080/01621459.1999.10473872
  4. Davis, C. (2015). The Skewed Generalized T Distribution Tree Package Vignette, Available from: https://cran.r-project.org/web/packages/sgt/vignettes/sgt.pdf
  5. Hansen, C., McDonald, J. B., and Newey, W. K. (2010). Instrumental variables estimation with flexible distributions, Journal of Business & Economic Statistics, 28, 13-25. https://doi.org/10.1198/jbes.2009.06161
  6. Kim, S., Chen, M. H., and Dey, D. K. (2008). Flexible generalized t-link models for binary response data, Biometrika, 95, 93-106. https://doi.org/10.1093/biomet/asm079
  7. Koenker, R. (2006). Parametric links for binary response, R News: The Newsletter of the R Project, 6(4), 32.
  8. Liu, C. (2004). Robit regression: a simple robust alternative to logistic and probit regression. In Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, 227-238.
  9. McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, Monographs on Statistics and Applied Probability, no. 37.
  10. McDonald, J. B. and Newey, W. K. (1988). Partially adaptive estimation of regression models via the generalized t distribution, Econometric Theory, 4, 428-457. https://doi.org/10.1017/S0266466600013384
  11. O'Hagan, A. and Leonard, T. (1976). Bayes estimation subject to uncertainty about parameter constraints, Biometrika, 63, 201-203. https://doi.org/10.1093/biomet/63.1.201
  12. Pregibon, D. (1982). Resistant fits for some commonly used logistic models with medical applications, Biometrics, 38, 485-498. https://doi.org/10.2307/2530463
  13. Stukel, T. A. (1988). Generalized logistic models, Journal of the American Statistical Association, 83, 426-431. https://doi.org/10.1080/01621459.1988.10478613
  14. Theodossiou, P. (1998). Financial data and the skewed generalized t distribution, Management Science, 44(12-part-1), 1650-1661. https://doi.org/10.1287/mnsc.44.12.1650
  15. UCI Machine Learning Repository, Available from: http://archive.ics.uci.edu/ml/index.php
  16. Wood, S. N. (2006). Generalized Additive Models: An Introduction with R, CRC Press, Boca Raton, FL.
  17. Yates, F. (1955). The use of transformations and maximum likelihood in the analysis of quantal experiments involving two treatments, Biometrika, 42, 382-403.