http://dx.doi.org/10.5351/KJAS.2007.20.1.117

A New Measure of Agreement to Resolve the Two Paradoxes of Cohen's Kappa  

Park, Mi-Hee (Department of Biostatistics, The Catholic University of Korea)
Park, Yong-Gyu (Department of Biostatistics, The Catholic University of Korea)
Publication Information
The Korean Journal of Applied Statistics, v.20, no.1, 2007, pp. 117-132
Abstract
In a $2\times2$ table showing binary agreement between two raters, Cohen's $\kappa$, a chance-corrected measure of agreement, is known to exhibit two paradoxes: $\kappa$ is highly sensitive to the raters' classification probabilities (marginal probabilities), and it fails to satisfy the conditions required of a chance-corrected measure of agreement. However, $\kappa$ and other established measures take reasonable and similar values when each marginal distribution is close to 0.5. The objectives of this paper are to present a new measure of agreement, H, which resolves the paradoxes of $\kappa$ by adjusting for unbalanced marginal distributions, and to compare the proposed measure with established measures through some examples.
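For readers unfamiliar with the first paradox, the following minimal sketch (not part of the paper; the counts and the helper name cohen_kappa are illustrative assumptions) computes Cohen's $\kappa$ from a $2\times2$ table and shows how unbalanced marginal distributions pull $\kappa$ down even though the observed agreement is the same in both tables.

```python
import numpy as np

def cohen_kappa(table):
    """Cohen's (1960) kappa for a square agreement table of counts."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_o = np.trace(table) / n                                     # observed agreement
    p_e = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2    # chance-expected agreement from marginals
    return (p_o - p_e) / (1 - p_e)

# Balanced marginals: observed agreement 0.80, kappa = 0.60
balanced = [[40, 10],
            [10, 40]]

# Unbalanced marginals: observed agreement still 0.80, but kappa drops sharply
unbalanced = [[78,  2],
              [18,  2]]

print(round(cohen_kappa(balanced), 3))    # 0.6
print(round(cohen_kappa(unbalanced), 3))  # about 0.107
```

Both illustrative tables have observed agreement $p_o = 0.80$, but the unbalanced marginals inflate the chance-expected agreement $p_e$ (0.776 versus 0.5), driving $\kappa$ from 0.60 down to about 0.11; this sensitivity to the marginal distributions is what the proposed measure H is designed to adjust for.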
Keywords
Measure of agreement; chance-expected agreement; paradox; balanced marginal distribution; harmonic mean
References
1 Holley, J. W. and Guilford, J. P. (1964). A note on the G index of agreement, Educational and Psychological Measurement, 24, 749-753
2 Janson, S. and Vegelius, J. (1979). On generalizations of the G index and the PHI coefficient to nominal scales, Multivariate Behavioral Research, 14, 255-269
3 Lantz, C. A. and Nebenzahl, E. (1996). Behavior and interpretation of the kappa statistic: Resolution of the two paradoxes, Journal of Clinical Epidemiology, 49, 431-434
4 Aickin, M. (1990). Maximum likelihood estimation of agreement in the constant predictive probability model, and its relation to Cohen's kappa, Biometrics, 46, 293-302
5 Bennett, E. M., Alpert, R. and Goldstein, A. C. (1954). Communications through limited response questioning, Public Opinion Quarterly, 18, 303-308
6 Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1995). Discrete Multivariate Analysis, MIT Press, Cambridge, Mass.
7 Byrt, T., Bishop, J. and Carlin, J. B. (1993). Bias, prevalence and kappa, Journal of Clinical Epidemiology, 46, 423-429
8 Cicchetti, D. V. and Feinstein, A. R. (1990). High agreement but low kappa: 2. Resolving the paradoxes, Journal of Clinical Epidemiology, 43, 551-558
9 Cohen, J. (1960). A coefficient of agreement for nominal scales, Educational and Psychological Measurement, 20, 37-46
10 Ferger, W. F. (1931). The nature and use of the harmonic mean, Journal of the American Statistical Association, 26, 36-40
11 Gwet, K. (2001). Handbook of Inter-Rater Reliability, STATAXIS Publishing Company, Gaithersburg
12 Feinstein, A. R. and Cicchetti, D. V. (1990). High agreement but low kappa: 1. The problems of two paradoxes, Journal of Clinical Epidemiology, 43, 543-549
13 Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding, Public Opinion Quarterly, 19, 321-325
14 Brennan, R. L. and Prediger, D. (1981). Coefficient kappa: Some uses, misuses, and alternatives, Educational and Psychological Measurement, 41, 687-699
15 Andres, A. M. and Marzo, P. F. (2004). Delta: A new measure of agreement between two raters, The British Journal of Mathematical and Statistical Psychology, 57, 1-19