Statistical Methods for the Analysis of Inter-Reader Agreement Among Three or More Readers

  • Kyunghwa Han (Department of Radiology, Research Institute of Radiological Science, and Center for Clinical Imaging Data Science, Yonsei University College of Medicine)
  • Leeha Ryu (Department of Biostatistics and Computing, Yonsei University Graduate School)
  • Received: 2023.10.01
  • Accepted: 2023.12.05
  • Published: 2024.04.01

Abstract

Keywords

Funding Information

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2021R1I1A1A01059893).
