Analysis of Rater Reliability for Proposing Rater Calibration Training


  • Kim, Jooah (Department of Dental Education, Yonsei University College of Dentistry) ;
  • Shin, Yooseok (Department of Conservative Dentistry and Oral Science Research Center, Yonsei University College of Dentistry) ;
  • Seo, Jeong Taeg (Department of Dental Education and Department of Oral Biology, Yonsei University College of Dentistry)
  • Received : 2020.02.05
  • Accepted : 2020.02.17
  • Published : 2020.04.30

Abstract

This study analyzed changes in rater reliability during the practical skills evaluation of students at Yonsei University College of Dentistry and, on that basis, argues for the importance of rater calibration training in the practical evaluation of dental students. Nine professors from the Department of Conservative Dentistry, Yonsei University College of Dentistry, graded Class II restoration cases twice in 2017 and once in 2018. The intraclass correlation coefficient (ICC), a statistic used to assess the consistency of three or more raters, was calculated for each round. ICC values increased as the raters accumulated grading experience and participated in rater calibration meetings, indicating that rater reliability is related to grading experience and to feedback from calibration meetings. Consistent with previous findings that grading experience and rater calibration training can produce meaningful changes in rater behavior, we propose conducting rater calibration training to ensure rater reliability.
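The abstract does not specify which ICC model was used, so the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater, in the Shrout and Fleiss taxonomy), computed from the standard two-way ANOVA mean squares over a cases × raters score matrix. The score matrix in the example is hypothetical, not data from the study.

```python
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `scores` has one row per graded case and one column per rater."""
    n, k = scores.shape
    grand = scores.mean()
    ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()    # between cases
    ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()    # between raters
    ss_err = ((scores - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    # Shrout & Fleiss formula for ICC(2,1)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical example: 5 restoration cases graded by 3 raters.
scores = np.array([
    [9.0, 8.0, 9.0],
    [6.0, 7.0, 7.0],
    [5.0, 5.0, 6.0],
    [8.0, 8.0, 7.0],
    [4.0, 5.0, 4.0],
])
print(f"ICC(2,1) = {icc_2_1(scores):.3f}")
```

Higher values indicate greater absolute agreement among raters; by the Cicchetti (1994) guidelines cited in the references, values above 0.75 are commonly read as excellent.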

References

  1. Ministry of Health and Welfare. Introduction of the National Practical Examination for Dentists [press release]; 2017.
  2. Cicchetti D. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment 1994;6(4):284-290. https://doi.org/10.1037/1040-3590.6.4.284
  3. Kim KY. Methods for assessing the agreement of repeatedly measured data in dental research. Journal of the Korean Dental Association 2016;54(11):880-896.
  4. Kong KA. Evaluation of test methods: method comparison and reliability assessment. Ewha Med J 2017;40(1):9-16. https://doi.org/10.12771/emj.2017.40.1.9
  5. Park CE, Kim HJ. Measuring inter-rater reliability in systematic reviews. Hanyang Med Rev 2015;35:44-49. https://doi.org/10.7599/hmr.2015.35.1.44
  6. Fleiss J. Design and analysis of clinical experiments. New York, USA: Wiley; 1986.
  7. Hallgren K. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial. Tutor Quant Methods Psychol 2012;8(1):23-34. https://doi.org/10.20982/tqmp.08.1.p023
  8. Choi YH. Rating Performance of EFL Teachers in Writing Assessment: Comparison of Experienced and Novice Raters. Journal of Research in Curriculum and Instruction 2013;17(1):199-215. https://doi.org/10.24231/RICI.2013.17.1.199
  9. Cumming A, Kantor R, Powers D. Decision making while rating ESL/EFL writing tasks: A descriptive framework. Modern Language Journal 2002;86:67-96. https://doi.org/10.1111/1540-4781.00137
  10. Barrett S. The impact of training on rater variability. International Education Journal 2001;2:49-58.
  11. Schoonen R. Generalizability of writing scores: An application of structural equation modeling. Language Testing 2005;22:1-30. https://doi.org/10.1191/0265532205lt295oa