Independence tests using the coin package in R

  • Kim, Jinheum (김진흠) (Department of Applied Statistics, University of Suwon)
  • Lee, Jung-Dong (이정동) (Department of Applied Statistics, University of Suwon)
  • Received : 2014.06.30
  • Accepted : 2014.08.05
  • Published : 2014.09.30

Abstract

The distribution of a test statistic under a null hypothesis depends on the unknown distribution of the data and is therefore unknown as well. Conditional tests replace the unknown null distribution with the conditional null distribution, that is, the distribution of the test statistic given the observed data. This approach, known as the permutation test, was developed by Fisher (1935). A theoretical framework for permutation tests was given by Strasser and Weber (1999). The coin package developed by Hothorn et al. (2006, 2008) implements a unified approach to conditional inference via the generic function independence_test. Because convenience functions are available for the most prominent problems, users rarely need to call this highly flexible generic function directly. In this article we briefly review the underlying theory of Strasser and Weber (1999) and explain how to transform the data so that each test can be performed with the generic function independence_test. Finally, we illustrate the procedure with a few real data sets.

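The null distribution of a test statistic depends on the population distribution. When the population distribution is unknown, a permutation test replaces the null distribution with the conditional distribution of the test statistic. Strasser and Weber (1999) established a theory that unifies permutation tests, and Hothorn et al. (2006, 2008) implemented that theory in the coin package for R. In the coin package, conditional independence tests can be carried out through the generic function independence_test, but separate convenience functions are also provided for the most prominent independence tests. In this paper we introduce the permutation test approach of Strasser and Weber (1999) and explain how each of the 15 convenience functions in the coin package can be rewritten as a call to independence_test. We also use the resulting independence_test calls to compare, on real data, the p-values based on the asymptotic distribution, the permutation test, and the exact test.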
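The idea of a conditional null distribution, and the three kinds of p-value mentioned in the abstract, can be illustrated with a small two-sample example. The following is a minimal sketch in plain Python, not the coin API and not the authors' code: the data, the function names, and the choice of the group-1 sum as the test statistic are all assumptions made for illustration. The exact p-value enumerates every reassignment of observations to groups, the Monte Carlo p-value samples reassignments at random, and the asymptotic p-value uses a normal approximation with the conditional mean and variance of the statistic.

```python
import itertools
import math
import random

# Hypothetical data (not from the paper): integer responses for two groups.
group_a = [12, 23, 31, 50]
group_b = [48, 67, 74, 89]
values = group_a + group_b
n, n1 = len(values), len(group_a)

# Test statistic: sum of the responses in group A.
t_obs = sum(group_a)
# Conditional mean of the statistic given the observed responses.
mu = n1 * sum(values) / n


def is_extreme(t):
    """Two-sided criterion: at least as far from the conditional mean as t_obs."""
    return abs(t - mu) >= abs(t_obs - mu) - 1e-9


def exact_p_value():
    """Enumerate every way of assigning n1 of the n responses to group A."""
    extreme = total = 0
    for idx in itertools.combinations(range(n), n1):
        total += 1
        if is_extreme(sum(values[i] for i in idx)):
            extreme += 1
    return extreme / total


def monte_carlo_p_value(n_resample=20000, seed=1):
    """Approximate the conditional null distribution by random shuffles."""
    rng = random.Random(seed)
    pool = list(values)
    extreme = 0
    for _ in range(n_resample):
        rng.shuffle(pool)
        if is_extreme(sum(pool[:n1])):
            extreme += 1
    return extreme / n_resample


def asymptotic_p_value():
    """Normal approximation using the conditional mean and variance."""
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    # Variance of a sum of n1 draws without replacement from the n responses.
    var = n1 * (n - n1) / (n * (n - 1)) * ss
    z = (t_obs - mu) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


p_exact = exact_p_value()
p_mc = monte_carlo_p_value()
p_asym = asymptotic_p_value()
print(f"exact: {p_exact:.4f}  Monte Carlo: {p_mc:.4f}  asymptotic: {p_asym:.4f}")
```

With only eight observations the exact enumeration covers just 70 reassignments, so it is cheap here. For larger samples the enumeration grows combinatorially, which is why Monte Carlo and asymptotic p-values are usually computed instead.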

References

  1. Agresti, A. (2002). Categorical data analysis, Second Edition, Wiley, New York.
  2. Ansari, A. R. and Bradley, R. A. (1960). Rank-sum tests for dispersion. The Annals of Mathematical Statistics, 31, 1174-1189. https://doi.org/10.1214/aoms/1177705688
  3. Cochran, W. G. (1954). Some methods for strengthening the common chi-squared tests. Biometrics, 10, 417-451. https://doi.org/10.2307/3001616
  4. Fisher, R. A. (1935). The design of experiments, Oliver and Boyd, Edinburgh.
  5. Fleming, T. R., O'Fallon, J. R., O'Brien, P. C., and Harrington, D. P. (1980). Modified Kolmogorov-Smirnov test procedures with applications to arbitrarily censored data. Biometrics, 36, 607-625. https://doi.org/10.2307/2556114
  6. Fligner, M. A. and Killeen, T. J. (1976). Distribution-free two-sample tests for scale. Journal of the American Statistical Association, 71, 210-213. https://doi.org/10.1080/01621459.1976.10481517
  7. Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32, 675-701. https://doi.org/10.1080/01621459.1937.10503522
  8. Goodman, L. A. (1979). Simple models for the analysis of association in cross-classifications having ordered categories. Journal of the American Statistical Association, 74, 537-552. https://doi.org/10.1080/01621459.1979.10481650
  9. Hollander, M. and Wolfe, D. A. (1999). Nonparametric statistical methods, Second Edition, Wiley, New York.
  10. Hothorn, T., Hornik, K., van de Wiel, M. A. and Zeileis, A. (2006). A Lego system for conditional inference. The American Statistician, 60, 257-263. https://doi.org/10.1198/000313006X118430
  11. Hothorn, T., Hornik, K., van de Wiel, M. A. and Zeileis, A. (2008). Implementing a class of permutation tests: The coin package. Journal of Statistical Software, 28, 1-23.
  12. Kalbfleisch, J. D. and Prentice, R. L. (2002). The statistical analysis of failure time data, Second Edition, Wiley, New York.
  13. Kruskal, W. H. (1952). A nonparametric test for the several sample problem. The Annals of Mathematical Statistics, 23, 525-540. https://doi.org/10.1214/aoms/1177729332
  14. Kruskal, W. H. and Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47, 583-621. https://doi.org/10.1080/01621459.1952.10483441
  15. Mann, H. B. and Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18, 50-60. https://doi.org/10.1214/aoms/1177730491
  16. Mantel, N. and Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719-748.
  17. Maxwell, A. E. (1970). Comparing the classification of subjects by two independent judges. British Journal of Psychiatry, 116, 651-655. https://doi.org/10.1192/bjp.116.535.651
  18. Mood, A. M. (1950). Introduction to the theory of statistics, McGraw-Hill, New York.
  19. Muller, J. and Hothorn, T. (2004). Maximally selected two-sample statistics as a new tool for the identification and assessment of habitat factors with an application to breeding bird communities in oak forests. European Journal of Forest Research, 123, 219-228. https://doi.org/10.1007/s10342-004-0035-5
  20. Pearson, K. (1922). On the chi-square test of goodness of fit. Biometrika, 14, 186-191.
  21. Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15, 72-101. https://doi.org/10.2307/1412159
  22. Strasser, H. and Weber, C. (1999). On the asymptotic theory of permutation statistics. Mathematical Methods of Statistics, 8, 220-250.
  23. Stuart, A. A. (1955). A test for homogeneity of the marginal distributions in a two-way classification. Biometrika, 42, 412-416. https://doi.org/10.1093/biomet/42.3-4.412
  24. Van der Waerden, B. L. (1952). Order tests for two-sample problem and their power 1. Indagationes Mathematicae, 14, 453-458.
  25. Van der Waerden, B. L. (1953a). Order tests for two-sample problem and their power 2. Indagationes Mathematicae, 15, 303-310.
  26. Van der Waerden, B. L. (1953b). Order tests for two-sample problem and their power 3. Indagationes Mathematicae, 15, 311-316.
  27. Westenberg, J. (1948). Significance test for median and interquartile range in samples from continuous populations of any form. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen, 51, 252-261.
  28. Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1, 80-83. https://doi.org/10.2307/3001968

Cited by

  1. Permutation p-values for specific-category kappa measure of agreement vol.27, pp.4, 2016, https://doi.org/10.7465/jkdi.2016.27.4.899