[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7465/jkdi.2012.23.1.025

A simulation study of rater agreement measures

Han, Kyung-Do (Department of Biostatistics, The Catholic University of Korea)
Park, Yong-Gyu (Department of Biostatistics, The Catholic University of Korea)

Publication Information

Journal of the Korean Data and Information Science Society, v.23, no.1, 2012, pp. 25-37

Abstract

Many statistics, such as Cohen's (1960) $\kappa$, Scott's (1955) $\pi$, and Park and Park's (2007) H, have been proposed as measures of agreement to represent inter-rater reliability. This study compared the bias, standard error (SE), mean squared error (MSE), and coefficient of variation (CV) of these measures of agreement for nominal and ordinal categories under balanced marginal distributions, and for nominal categories under the two paradoxical situations of kappa. In all cases, AC1 and H had smaller SE and CV than the other measures.
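To make the chance-corrected measures named in the abstract concrete, the following is a minimal sketch that computes Cohen's kappa, Scott's pi, and Gwet's AC1 from a square contingency table of two raters. It uses the commonly cited textbook definitions of the three chance-agreement terms; the function name `agreement_measures` and both example tables are hypothetical and are not taken from the paper. The measure H and the paper's simulation design are not reproduced, since this record does not give their details.

```python
import numpy as np


def agreement_measures(table):
    """Compute agreement measures for a square two-rater contingency table.

    Returns a dict with the observed agreement and the chance-corrected
    measures kappa (Cohen), pi (Scott), and AC1 (Gwet).
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                  # convert counts to cell proportions
    q = p.shape[0]                   # number of categories
    row = p.sum(axis=1)              # marginal distribution of rater A
    col = p.sum(axis=0)              # marginal distribution of rater B
    mean = (row + col) / 2.0         # average marginal per category
    po = np.trace(p)                 # observed proportion of agreement

    # Chance-agreement terms under each measure's assumptions.
    pe = {
        "kappa": np.sum(row * col),
        "pi": np.sum(mean ** 2),
        "ac1": np.sum(mean * (1.0 - mean)) / (q - 1),
    }

    result = {"observed": po}
    for name, chance in pe.items():
        result[name] = (po - chance) / (1.0 - chance)
    return result


# Hypothetical table with skewed marginals: high observed agreement,
# yet kappa and pi are low (the "high agreement but low kappa" paradox).
skewed = [[80, 10],
          [5, 5]]

# Hypothetical table with balanced marginals and similar observed agreement.
balanced = [[45, 5],
            [5, 45]]

for label, tbl in (("skewed", skewed), ("balanced", balanced)):
    m = agreement_measures(tbl)
    print(label, {k: round(float(v), 3) for k, v in m.items()})
```

With the skewed table the observed agreement is 0.85, but kappa and pi are both roughly 0.3 while AC1 stays near 0.8. With the balanced table all three measures coincide at 0.8. This is the kind of marginal-dependence behavior that the paradoxical situations in the abstract refer to.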

Keywords

H; kappa; paradox of kappa; simulation

References

1	Cicchetti, D. V. and Allison, T. (1971). A new procedure for assessing reliability of scoring EEG sleep recordings. American Journal of EEG Technology, 11, 101-109.
2	Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
3	Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213-220.
4	Feinstein, A. R. and Cicchetti, D. V. (1990). High agreement but low kappa: 1. The problems of two paradoxes. Journal of Clinical Epidemiology, 43, 543-549.
5	Fleiss, J. L. and Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613-619.
6	Gwet, K. (2001). Handbook of inter-rater reliability, STATAXIS Publishing company, Gaithersburg.
7	Holley, J. W. and Guilford, J. P. (1964). A note on the G index of agreement. Educational and Psychological Measurement, 24, 749-753.
8	Janson, S. and Vegelius, J. (1979). On generalizations of the G index and the PHI coefficient to nominal scales. Multivariate Behavioral Research, 14, 255-269.
9	Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19, 321-325.
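10	Kwon, N. Y., Kim, J. G. and Park, Y. G. (2009). Weighted agreement measure $H_w$ and a new paradox of $\kappa$. The Korean Journal of Applied Statistics, 22, 1073-1085. (In Korean.)
11	Kim, J. G., Park, M. H. and Park, Y. G. (2009). The agreement measure H for $m{\times}m$ contingency tables. Communications of the Korean Statistical Society, 16, 753-762. (In Korean.)
12	Park, M. H. and Park, Y. G. (2007). A proposal of a new measure of agreement to resolve the two paradoxes of Cohen's kappa. The Korean Journal of Applied Statistics, 20, 117-132. (In Korean.)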
13	Agresti, A. (2002). Categorical data analysis, Wiley, New York.
