http://dx.doi.org/10.7465/jkdi.2016.27.4.899

Permutation p-values for specific-category kappa measure of agreement  

Um, Yonghwan (Division of Industrial and Management Engineering, Sungkyul University)
Publication Information
Journal of the Korean Data and Information Science Society / v.27, no.4, 2016, pp. 899-910
Abstract
Asymptotic tests are often unsuitable for analyzing sparse ordered contingency tables, because asymptotic p-values may either overestimate or underestimate the true p-values. In this paper, we describe permutation procedures that compute exact or resampling p-values for weighted specific-category agreement in ordered $k{\times}k$ contingency tables. We use the weighted specific-category kappa proposed by Kvålseth to measure the extent to which two independent raters agree on specific categories. We carried out comparison studies among exact, resampling, and asymptotic p-values using $3{\times}3$ contingency data (real and artificial data sets) and $4{\times}4$ artificial contingency data.
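The resampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the procedure from the paper: as a stand-in for Kvålseth's weighted specific-category kappa (whose weights are not given here), it uses the ordinary unweighted kappa obtained by collapsing the table to "category $i$ versus the rest", $\kappa_i = 2(p_{ii} - p_{i+}p_{+i}) / (p_{i+} + p_{+i} - 2p_{i+}p_{+i})$. The $3{\times}3$ table, the function names, the seed, and the number of resamples are all hypothetical choices made for illustration. Shuffling one rater's labels keeps both sets of marginal totals fixed, which is the null model of independent raters.

```python
import random


def pairs_from_table(table):
    """Expand a k x k table of counts into a list of (rater A, rater B) pairs."""
    pairs = []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            pairs.extend([(i, j)] * count)
    return pairs


def category_kappa(pairs, category):
    """Unweighted kappa for one category after collapsing to 'category vs. rest'.

    Assumption: this is a simple stand-in, not the weighted statistic
    studied in the paper.
    """
    n = len(pairs)
    p_ii = sum(1 for a, b in pairs if a == category and b == category) / n
    p_row = sum(1 for a, _ in pairs if a == category) / n
    p_col = sum(1 for _, b in pairs if b == category) / n
    denom = p_row + p_col - 2 * p_row * p_col
    if denom == 0:
        return 0.0  # degenerate margins: kappa is undefined, report 0
    return 2 * (p_ii - p_row * p_col) / denom


def resampling_p_value(pairs, category, n_resamples=5000, seed=0):
    """One-sided resampling p-value: share of shuffles with kappa >= observed."""
    rng = random.Random(seed)
    observed = category_kappa(pairs, category)
    ratings_a = [a for a, _ in pairs]
    ratings_b = [b for _, b in pairs]
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(ratings_b)  # permute rater B; both margins stay fixed
        shuffled = list(zip(ratings_a, ratings_b))
        if category_kappa(shuffled, category) >= observed:
            hits += 1
    return observed, hits / n_resamples


# Hypothetical 3 x 3 agreement table (rows: rater A, columns: rater B).
table = [[4, 1, 0],
         [1, 3, 1],
         [0, 1, 4]]
pairs = pairs_from_table(table)
kappa, p_value = resampling_p_value(pairs, category=0)
print(f"kappa for category 0 = {kappa:.3f}, resampling p-value = {p_value:.4f}")
```

An exact p-value would instead enumerate every table with the observed margins and sum the probabilities of those whose statistic is at least as large as the observed one; the resampling version above approximates that sum by Monte Carlo, so its precision depends on the number of resamples.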
Keywords
Contingency tables; permutation; p-values; weighted specific-category agreement
Citations & Related Records
1 Agresti, A. (2002). Categorical data analysis, 2nd Ed., Wiley, New York.
2 Berry, K. J., Johnston, J. E. and Mielke, P. W. (2006). Exact and resampling probability values for measures associated with ordered R by C contingency tables. Psychological Reports, 99, 231-238.
3 Cicchetti, D. V. and Allison, T. (1971). A new procedure for assessing reliability of scoring EEG sleep recordings. The American Journal of EEG Technology, 11, 101-109.
4 Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
5 Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213-220.
6 Feinstein, A. R. and Cicchetti, D. V. (1990). High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43, 543-549.
7 Fisher, R. A. (1935). The design of experiments, Oliver & Boyd, Edinburgh.
8 Fleiss, J. L. (1981). Statistical methods for rates and proportions, 2nd Ed., Wiley, New York.
9 Fleiss, J. L. and Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 2, 113-117.
10 Good, P. I. (2000). Permutation tests: A practical guide to resampling methods for testing hypotheses, 2nd Ed., Springer-Verlag, New York.
11 Good, P. I. (2001). Resampling methods : A practical guide to data analysis, 2nd Ed., Birkhauser, Massachusetts.
12 Han, K. D. and Park, Y. G. (2012). A simulation study of rater agreement measures. Journal of the Korean Data & Information Science Society, 23, 25-37.
13 Holmes, C. B. (1979). Sample size in psychological research. Perceptual and Motor Skills, 49, 283-288.
14 Kim, J. and Lee, J. D. (2014). Independence tests using coin package in R. Journal of the Korean Data & Information Science Society, 25, 1039-1055.
15 Holmes, C. B. (1990). The honest truth about lying with statistics, Thomas, Springfield, Illinois.
16 Johnston, J. E., Berry, K. J. and Mielke, P. W. (2007). Permutation tests: Precision in estimating probability values. Perceptual and Motor Skills, 105, 915-920.
17 Johnston, J. E., Berry, K. J. and Mielke, P. W. (2008). Resampling permutation probability values for weighted kappa. Psychological Reports, 103, 467-475.
18 Kraemer, H. C. (1983). Kappa coefficient. In Encyclopedia of Statistical Sciences 4, Wiley, New York, 352-354.
19 Kvålseth, T. O. (1989). Note on Cohen's kappa. Psychological Reports, 65, 223-226.
20 Kvålseth, T. O. (2003). Weighted specific-category kappa measure of interobserver agreement. Psychological Reports, 93, 1283-1290.
21 Mielke, P. W. and Berry, K. J. (2001). Permutation methods: A distance function approach, Springer-Verlag, New York.
22 Oleckno, W. A. (2008). Epidemiology : Concepts and methods, Waveland Press, Inc., Illinois.
23 Patefield, W. M. (1981). Algorithm AS 159: An efficient method of generating random R ${\times}$ C tables with given row and column totals. Journal of the Royal Statistical Society C, 30, 91-97.
24 Shoukri, M. M. (2004). Measures of interobserver agreement, CRC Press, Florida.
25 Zhao, X. (2011). When to use Cohen's κ, if ever? International Communication Association 2011 Conference.
26 Spitzer, R. L., Cohen, J., Fleiss, J. L. and Endicott, J. (1967). Quantification of agreement in psychiatric diagnosis. Archives of General Psychiatry, 17, 83-87.
27 Upton, G. and Cook, I. (2002). Oxford dictionary of statistics, Oxford University Press, United Kingdom.