• Title/Summary/Keyword: rater variability


A FACETS Analysis of Rater Characteristics and Rater Bias in Measuring L2 Writing Performance

  • Shin, You-Sun
    • English Language & Literature Teaching / v.16 no.1 / pp.123-142 / 2009
  • The present study used multi-faceted Rasch measurement to explore the characteristics and bias patterns of non-native raters when they scored L2 writing tasks. Three raters scored 254 writing tasks written by Korean university students on two topics adapted from the TOEFL Test of Written English (TWE). The written products were assessed using a five-category rating scale (Content, Organization, Language in Use, Grammar, and Mechanics). Rater severity differed across rating categories but not across task types. Overall, the raters scored Grammar most harshly and Organization most leniently. The results also revealed several bias patterns related to rating categories and task types. In the rater-task bias interactions, each rater showed recurring bias patterns across the two writing tasks. The rater-category bias analysis showed that all three raters exhibited bias patterns across every rating category, although their ratings were relatively consistent. The study has implications for rater training and task selection in L2 writing assessment.

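The FACETS analysis above rests on the many-facet Rasch model. The abstract does not give the exact specification, so the following is a sketch of the common rating-scale form of the model (after Linacre), with the facets assumed to match this design: $B_n$ is the ability of writer $n$, $A_m$ the difficulty of task $m$, $D_i$ the difficulty of rating category $i$ (Content, Organization, and so on), $C_j$ the severity of rater $j$, and $F_k$ the difficulty of receiving score $k$ rather than $k-1$.

```latex
\ln\!\left(\frac{P_{nmijk}}{P_{nmij(k-1)}}\right) = B_n - A_m - D_i - C_j - F_k
```

Here $P_{nmijk}$ is the probability that rater $j$ awards score $k$ to writer $n$ on task $m$ in category $i$. A more severe rater (larger $C_j$) lowers the log-odds of a higher score. Bias analyses such as the rater-task and rater-category interactions reported above extend this model with an interaction term for the relevant pair of facets.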

Analysis of Visual Sensibility Evaluation of Naturally Colored Organic Cotton: Identification of Reliability and Proper Scouring Method

  • Park, Jang-Woon;Chang, Yoon;Hong, Won-Gi;Lee, Myung-Eun;Han, Ah-Reum;Chae, Young-Joo;Cho, Gil-Soo;You, Hee-Cheon
    • Journal of the Ergonomics Society of Korea / v.30 no.2 / pp.311-317 / 2011
  • Objective: The present study aimed to identify (1) the intra- and inter-rater reliabilities of a visual sensibility evaluation protocol and (2) the effects of color and scouring method on the visual sensibility of naturally colored organic cotton (NaCOC). Thirty female participants in their 20s and 30s took part in the visual sensibility evaluation of NaCOC. Background: Interest in NaCOC is growing rapidly alongside the social trend toward eco-friendly living and well-being. Method: Three color sets (ivory, green, and coyote-brown) of NaCOC specimens, comprising one untreated and four treated specimens (Na₂CO₃, NaOH, enzyme, and boiling water), were examined. The visual sensibility evaluation used a test-retest method with nine pairs of bipolar visual sensibility adjectives (bright-dark, clear-murky, heavy-light, vivid-subdued, warm-cool, fresh-stale, strong-weak, showy-plain, and luxurious-cheap). Results: Regarding the reliability of the evaluation protocol, inter-rater variability (average SD = 1.06) was more than 1.4 times the intra-rater variability (average SD = 0.74). However, neither reliability measure showed a systematic pattern of change. Lastly, ANOVA and post-hoc analysis showed that the preferred scouring method for a given adjective pair varied significantly depending on NaCOC color. Application: Both the reliability of the visual sensibility evaluation protocol and the analysis of the proper scouring method for NaCOC can inform the design of affective textiles.
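The abstract compares average inter-rater and intra-rater standard deviations but does not say how they were aggregated. A minimal sketch, assuming (hypothetically) that intra-rater SD is the SD of each participant's test and retest ratings of a specimen, averaged over participants and specimens, and that inter-rater SD is the SD across participants for each specimen and session, averaged over specimens and sessions. The data layout and function names are illustrative, not the study's. For the reported averages, 1.06 / 0.74 ≈ 1.43, consistent with "more than 1.4 times".

```python
from statistics import mean, stdev

# Hypothetical layout (not the study's): ratings[p][s] is the
# (test, retest) pair that participant p gave specimen s on one
# bipolar adjective scale.
Ratings = list[list[tuple[float, float]]]


def intra_rater_sd(ratings: Ratings) -> float:
    """Mean SD of each participant's test vs. retest rating of a specimen."""
    return mean(stdev(pair) for person in ratings for pair in person)


def inter_rater_sd(ratings: Ratings) -> float:
    """Mean SD across participants, computed per specimen and per session."""
    n_specimens = len(ratings[0])
    sds = [
        stdev([person[s][session] for person in ratings])
        for s in range(n_specimens)
        for session in (0, 1)  # 0 = test, 1 = retest
    ]
    return mean(sds)
```

With this layout, a larger inter-rater SD than intra-rater SD means participants disagree with each other more than each disagrees with their own earlier rating.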

A Novel Fundus Image Reading Tool for Efficient Generation of a Multi-dimensional Categorical Image Database for Machine Learning Algorithm Training

  • Park, Sang Jun;Shin, Joo Young;Kim, Sangkeun;Son, Jaemin;Jung, Kyu-Hwan;Park, Kyu Hyung
    • Journal of Korean Medical Science / v.33 no.43 / pp.239.1-239.12 / 2018
  • Background: We described a novel multi-step retinal fundus image reading system for providing high-quality, large-scale data for machine learning algorithms, and assessed grader variability in the large-scale dataset generated with this system. Methods: A 5-step retinal fundus image reading tool was developed that rates image quality, presence of abnormality, findings with location information, diagnoses, and clinical significance. Each image was evaluated by 3 different graders, and agreement among graders was evaluated for each decision. Results: A total of 234,242 readings of 79,458 images were collected from 55 licensed ophthalmologists over 6 months. Of these images, 34,364 were graded as abnormal by at least one rater. Among them, all three raters agreed on abnormality in 46.6%, and 69.9% were rated as abnormal by two or more raters. The rate of agreement by at least two raters on a given finding was 26.7%-65.2%, and the rate of complete agreement among all three raters was 5.7%-43.3%. For diagnoses, agreement by at least two raters was 35.6%-65.6%, and complete agreement was 11.0%-40.0%. Agreement on findings and diagnoses was higher when restricted to images with prior complete agreement on abnormality. Retinal and glaucoma specialists showed higher agreement on findings and diagnoses within their respective subspecialties. Conclusion: This novel reading tool for retinal fundus images generated a large-scale dataset with a high level of information, which can be used in the future development of machine learning-based algorithms for automated identification of abnormal conditions and of clinical decision support systems. These results emphasize the importance of addressing grader variability in algorithm development.
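The abnormality agreement rates above are proportions over the images that at least one grader called abnormal. A minimal sketch of that calculation, assuming each image carries exactly three binary abnormality calls; the function name and data layout are hypothetical and not part of the described reading tool.

```python
# Hypothetical layout: one (grader1, grader2, grader3) tuple of abnormality
# calls per image, where True means "abnormal".
Calls = tuple[bool, bool, bool]


def abnormality_agreement(images: list[Calls]) -> dict[str, float]:
    """Agreement among three graders, restricted to images that at least
    one grader flagged as abnormal."""
    flagged = [calls for calls in images if any(calls)]
    if not flagged:
        raise ValueError("no image was flagged as abnormal")
    n = len(flagged)
    # All three graders called the image abnormal.
    all_three = sum(1 for calls in flagged if all(calls))
    # At least two of the three graders called it abnormal.
    two_or_more = sum(1 for calls in flagged if sum(calls) >= 2)
    return {"all_three": all_three / n, "two_or_more": two_or_more / n}
```

Applied to the study's 34,364 flagged images, "all_three" and "two_or_more" would correspond to the reported 46.6% and 69.9%.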

Interpretation of Complete Tumor Response on MRI Following Chemoradiotherapy of Rectal Cancer: Inter-Reader Agreement and Associated Factors in Multi-Center Clinical Practice

  • Hae Young Kim;Seung Hyun Cho;Jong Keon Jang;Bohyun Kim;Chul-min Lee;Joon Seok Lim;Sung Kyoung Moon;Soon Nam Oh;Nieun Seo;Seong Ho Park
    • Korean Journal of Radiology / v.25 no.4 / pp.351-362 / 2024
  • Objective: To measure inter-reader agreement and identify associated factors in interpreting complete response (CR) on magnetic resonance imaging (MRI) following chemoradiotherapy (CRT) for rectal cancer. Materials and Methods: This retrospective study involved 10 readers from seven hospitals, with experience ranging from 80 to 10,210 cases, and 149 patients who underwent surgery after CRT for rectal cancer. Using MRI-based tumor regression grading (mrTRG) and the methods employed in their daily practice, the readers independently assessed mrTRG; CR on T2-weighted images (T2WI), denoted mrCRT2W; and CR on all images, including diffusion-weighted images (DWI), denoted mrCRoverall. The readers described their interpretation patterns and how they used DWI. Inter-reader agreement was measured using multi-rater kappa, and associated factors were analyzed using multivariable regression. The correlation between the sensitivity and specificity of each reader was analyzed using the Spearman coefficient. Results: The mrCRT2W and mrCRoverall rates varied widely among readers, ranging from 18.8% to 40.3% and from 18.1% to 34.9%, respectively. Nine readers used DWI as a supplementary sequence; it modified their T2WI interpretations in 2.7% of cases (36/1341 [149 patients × 9 readers]), mostly (33/36) changing mrCRT2W to non-mrCRoverall. The kappa values for mrTRG, mrCRT2W, and mrCRoverall were 0.56 (95% confidence interval: 0.49, 0.62), 0.55 (0.52, 0.57), and 0.54 (0.51, 0.57), respectively. Non-use of rectal gel, larger initial tumor size, and higher initial cT stage were significantly associated with higher inter-reader agreement in assessing mrCRoverall (P ≤ 0.042). Strong negative correlations were observed between the sensitivity and specificity of individual readers (coefficient, -0.718 to -0.963; P ≤ 0.019). Conclusion: Inter-reader agreement was moderate for assessing CR on post-CRT MRI. Readers' varying standards for MRI interpretation (i.e., a threshold effect), along with the use of rectal gel, initial tumor size, and initial cT stage, were significant factors associated with inter-reader agreement.
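The abstract reports a "multi-rater kappa" without naming the variant. Fleiss' kappa is one common multi-rater kappa for categorical ratings, so the sketch below assumes it; it is an illustration of the statistic, not the study's analysis code. It compares the observed share of agreeing reader pairs per patient with the agreement expected by chance from the pooled category proportions.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for N subjects each rated by the same n raters.

    counts[i][j] is the number of raters who put subject i in category j.
    """
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("need at least one subject")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    # Observed agreement for each subject: share of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Expected chance agreement from the pooled category proportions.
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)
```

For a binary CR versus non-CR call by the 10 readers, each row would have two columns summing to 10.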

Evaluation of Moxibustion-related Health Information on Korean Internet News Coverage

  • Kang, O-Seok;Park, Hi-Joon;Kim, Song-Yi;Lee, Hye-Jung;Chae, Youn-Byoung
    • Journal of Acupuncture Research / v.26 no.2 / pp.189-199 / 2009
  • Objectives: Despite the substantial amount of internet newspaper coverage of complementary and alternative medicine (CAM) in recent years, little is known about the quality of its health information. This study evaluated the health information in newspaper articles on moxibustion available in Korea through Korean-language search engines and websites. Methods: We identified 454 news stories about moxibustion published between 1 January 2006 and 31 December 2008. Of these, 34 internet news articles containing health information on moxibustion were selected and rated against eight categories by two reviewers. Results: The inter-rater reliability between the two reviewers was 0.69, indicating a moderately high level of agreement. The overall rating score for the 34 articles was 35.7 ± 17.2 for the statement criteria and 12.9 ± 17.0 for the satisfaction criteria. Although scores may have improved recently, they remain generally low. Scores differed significantly according to the type of evidence source cited, including anecdotes. Conclusions: There is substantial variability in news reporting practices on moxibustion. There is an urgent need to improve CAM-related information, including information on moxibustion, in the Korean news media.

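The abstract reports an inter-rater reliability of 0.69 for the two reviewers without naming the statistic. If their ratings are treated as categorical, Cohen's kappa is one common choice; the sketch below assumes that statistic, which is not necessarily the one used in the study. It corrects the raw agreement rate for the agreement expected by chance from each reviewer's category frequencies.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who each rated the same items once."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_o - p_e) / (1 - p_e)
```

Because it discounts chance agreement, kappa is usually lower than the raw percentage of matching ratings.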