Acknowledgement
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2021R1I1A1A01059893).