References
- Berry, K. J. and Mielke, P. W. (1988). A generalization of Cohen's kappa agreement measure to interval measurement and multiple raters, Educational and Psychological Measurement, 48, 921-933. https://doi.org/10.1177/0013164488484007
- Brennan, R. L. and Prediger, D. J. (1981). Coefficient kappa: Some uses, misuses, and alternatives, Educational and Psychological Measurement, 41, 687-699. https://doi.org/10.1177/001316448104100307
- Cohen, J. (1960). A coefficient of agreement for nominal scales, Educational and Psychological Measurement, 20, 37-46. https://doi.org/10.1177/001316446002000104
- Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit, Psychological Bulletin, 70, 213-220. https://doi.org/10.1037/h0026256
- Conger, A. J. (1980). Integration and generalization of kappas for multiple raters, Psychological Bulletin, 88, 322-328. https://doi.org/10.1037/0033-2909.88.2.322
- Feinstein, A. R. and Cicchetti, D. V. (1990). High agreement but low kappa: I. The problems of two paradoxes, Journal of Clinical Epidemiology, 43, 543-549. https://doi.org/10.1016/0895-4356(90)90158-L
- Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters, Psychological Bulletin, 76, 378-382. https://doi.org/10.1037/h0031619
- Gwet, K. L. (2008a). Computing inter-rater reliability and its variance in the presence of high agreement, British Journal of Mathematical and Statistical Psychology, 61, 29-48. https://doi.org/10.1348/000711006X126600
- Gwet, K. L. (2008b). Variance estimation of nominal-scale inter-rater reliability with random selection of raters, Psychometrika, 73, 407-430. https://doi.org/10.1007/s11336-007-9054-8
- Gwet, K. L. (2010). Handbook of Inter-Rater Reliability, 2nd edn. Advanced Analytics, LLC.
- Janson, H. and Olsson, U. (2001). A measure of agreement for interval or nominal multivariate observations, Educational and Psychological Measurement, 61, 277-289. https://doi.org/10.1177/00131640121971239
- Janson, H. and Olsson, U. (2004). A measure of agreement for interval or nominal multivariate observations by different sets of judges, Educational and Psychological Measurement, 64, 62-70. https://doi.org/10.1177/0013164403260195
- Park, M. H. and Park, Y. G. (2007). A new measure of agreement to resolve the two paradoxes of Cohen's kappa, The Korean Journal of Applied Statistics, 20, 117-132. https://doi.org/10.5351/KJAS.2007.20.1.117
- Quenouille, M. H. (1949). Approximate tests of correlation in time-series, Journal of the Royal Statistical Society, Series B (Methodological), 11, 68-84.
- Randolph, J. J. (2005). Free-marginal multirater kappa: An alternative to Fleiss' fixed-marginal multirater kappa, Paper presented at the Joensuu University Learning and Instruction Symposium.
- Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding, Public Opinion Quarterly, 19, 321-325. https://doi.org/10.1086/266577