http://dx.doi.org/10.7465/jkdi.2012.23.1.025

A simulation study of rater agreement measures  

Han, Kyung-Do (Department of Biostatistics, The Catholic University of Korea)
Park, Yong-Gyu (Department of Biostatistics, The Catholic University of Korea)
Publication Information
Journal of the Korean Data and Information Science Society, v.23, no.1, 2012, pp. 25-37
Abstract
Many statistics, such as Cohen's (1960) $\kappa$, Scott's (1955) $\pi$, and Park and Park's (2007) H, have been proposed as measures of agreement to represent inter-rater reliability. This study compared the bias, standard error (SE), mean squared error (MSE), and coefficient of variation (CV) of these measures for nominal and ordinal categories under balanced marginal distributions, and for nominal categories under the two paradoxical situations. In all cases, AC1 and H had smaller SE and CV than the other measures.
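The comparison described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' simulation design: the function names, the sample size, the number of replications, and the cell probabilities are all assumptions chosen for illustration. It uses the standard textbook formulas for $\kappa$, $\pi$, and AC1 (the latter in the form usually attributed to Gwet); H is omitted because its formula is not given on this page. CV is taken here as SE divided by the mean estimate, which may differ from the definition used in the paper.

```python
import random
import statistics


def agreement_measures(table):
    """Return (kappa, pi, ac1) for a square two-rater table of counts or proportions."""
    q = len(table)
    n = sum(sum(r) for r in table)
    p_o = sum(table[i][i] for i in range(q)) / n                      # observed agreement
    row = [sum(table[i]) / n for i in range(q)]                       # rater A marginals
    col = [sum(table[i][j] for i in range(q)) / n for j in range(q)]  # rater B marginals
    avg = [(r + c) / 2 for r, c in zip(row, col)]                     # pooled marginals

    pe_kappa = sum(r * c for r, c in zip(row, col))   # chance agreement, Cohen
    pe_pi = sum(a * a for a in avg)                   # chance agreement, Scott
    pe_ac1 = sum(a * (1 - a) for a in avg) / (q - 1)  # chance agreement, AC1

    # Degenerate tables with chance agreement equal to 1 are not handled here.
    return tuple((p_o - p_e) / (1 - p_e) for p_e in (pe_kappa, pe_pi, pe_ac1))


def summarize(estimates, true_value):
    """Bias, SE, MSE and CV of simulated estimates (CV = SE / mean is an assumption)."""
    mean = statistics.fmean(estimates)
    se = statistics.stdev(estimates)
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return {"bias": mean - true_value, "se": se, "mse": mse, "cv": se / mean}


def simulate(cell_probs, n, reps, seed=0):
    """Draw `reps` multinomial tables of size n and compute the measures on each."""
    rng = random.Random(seed)
    q = len(cell_probs)
    cells = [(i, j) for i in range(q) for j in range(q)]
    weights = [cell_probs[i][j] for i, j in cells]
    draws = []
    for _ in range(reps):
        table = [[0] * q for _ in range(q)]
        for i, j in rng.choices(cells, weights=weights, k=n):
            table[i][j] += 1
        draws.append(agreement_measures(table))
    return draws


# "High agreement but low kappa": 80% observed agreement, skewed marginals.
kappa, pi, ac1 = agreement_measures([[80, 10], [10, 0]])
print(f"kappa={kappa:.3f} pi={pi:.3f} AC1={ac1:.3f}")  # kappa=-0.111 pi=-0.111 AC1=0.756

# Balanced marginals (hypothetical cell probabilities): compare sampling behaviour.
probs = [[0.4, 0.1], [0.1, 0.4]]
truth = agreement_measures(probs)
draws = simulate(probs, n=50, reps=200)
for k, name in enumerate(("kappa", "pi", "AC1")):
    print(name, summarize([d[k] for d in draws], truth[k]))
```

The first table shows the first paradox in miniature: observed agreement is 0.8, yet $\kappa$ and $\pi$ are negative while AC1 stays high. The second part only hints at how bias, SE, MSE, and CV could be estimated by simulation; a faithful replication would need the paper's actual settings.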
Keywords
H; kappa; paradox of kappa; simulation
References
1 Cicchetti, D. V. and Allison, T. (1971). A new procedure for assessing reliability of scoring EEG sleep recordings. American Journal of EEG Technology, 11, 101-109.
2 Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
3 Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213-220.
4 Feinstein, A. R. and Cicchetti, D. V. (1990). High agreement but low kappa: 1. The problems of two paradoxes. Journal of Clinical Epidemiology, 43, 543-549.
5 Fleiss, J. L. and Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613-619.
6 Gwet, K. (2001). Handbook of inter-rater reliability, STATAXIS Publishing company, Gaithersburg.
7 Holley, J. W. and Guilford, J. P. (1964). A note on the G index of agreement. Educational and Psychological Measurement, 24, 749-753.
8 Janson, S. and Vegelius, J. (1979). On generalizations of the G index and the PHI coefficient to nominal scales. Multivariate Behavioral Research, 14, 255-269.
9 Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19, 321-325.
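10 Kwon, N. Y., Kim, J. G. and Park, Y. G. (2009). The weighted measure of agreement $H_w$ and a new paradox of $\kappa$ (in Korean). The Korean Journal of Applied Statistics, 22, 1073-1085.
11 Kim, J. G., Park, M. H. and Park, Y. G. (2009). The measure of agreement H in $m \times m$ contingency tables (in Korean). Communications of the Korean Statistical Society, 16, 753-762.
12 Park, M. H. and Park, Y. G. (2007). A proposal for a new measure of agreement to resolve the two paradoxes of Cohen's kappa (in Korean). The Korean Journal of Applied Statistics, 20, 117-132.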
13 Agresti, A. (2002). Categorical data analysis, Wiley, New York.