Statistical Methods for the Analysis of Inter-Reader Agreement Among Three or More Readers

  • Kyunghwa Han (Department of Radiology, Research Institute of Radiological Science, and Center for Clinical Imaging Data Science, Yonsei University College of Medicine)
  • Leeha Ryu (Department of Biostatistics and Computing, Yonsei University Graduate School)
  • Received : 2023.10.01
  • Accepted : 2023.12.05
  • Published : 2024.04.01

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2021R1I1A1A01059893).
