• Title/Summary/Keyword: rater severity


A Study on the Features of Writing Rater in TOPIK Writing Assessment (한국어능력시험(TOPIK) 쓰기 평가의 채점 특성 연구)

  • Ahn, Su-hyun; Kim, Chung-sook
    • Journal of Korean language education / v.28 no.1 / pp.173-196 / 2017
  • Writing is a subjective, performance-based activity, and writing ability is multifaceted and complex. To understand examinees' writing ability accurately and provide valid writing scores, raters must first be competent in assessment. This study is therefore significant as foundational research on rater characteristics in the TOPIK writing assessment. A total of 150 scripts from the 47th TOPIK were randomly selected and rated independently by 20 raters. The many-facet Rasch model was used to generate individualized feedback reports on each rater's relative severity and consistency with respect to particular categories of the rating scale. The analysis was carried out with the FACETS ver. 3.71.4 program. Overfitting and misfitting raters had considerable difficulty distinguishing between assessment factors and interpreting the criteria. Writing raters in general appear to be confused when interpreting the assessment criteria, and overfitting and misfitting raters in particular interpret the criteria arbitrarily. The main cause of overfit and misfit is confusion about the assessment factors and criteria when finding a basis for scoring. More rater training and further research grounded in these characteristics of writing assessment are therefore needed. The study is significant in that it comprehensively examined the assessment characteristics of writing raters and visually confirmed the patterns of rating error in writing assessment.
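
For orientation, the many-facet Rasch model mentioned in this abstract is commonly written in the log-odds form below. The abstract does not state the exact specification that was run in FACETS, so the choice of facets (examinee, rating criterion, rater) and all symbols here are assumptions for illustration only.

```latex
% Assumed three-facet rating scale formulation (not taken from the paper):
%   P_{nijk}     probability that rater j gives examinee n category k on criterion i
%   P_{nij(k-1)} probability of the adjacent lower category k-1
%   B_n          ability of examinee n
%   D_i          difficulty of criterion i
%   C_j          severity of rater j
%   F_k          threshold between categories k-1 and k
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Under this formulation, a larger C_j lowers the log-odds of the higher category for every examinee, which is what "relative severity" refers to; all facets are placed on the same logit scale.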

A FACETS Analysis of Rater Characteristics and Rater Bias in Measuring L2 Writing Performance

  • Shin, You-Sun
    • English Language & Literature Teaching / v.16 no.1 / pp.123-142 / 2009
  • The present study used multi-faceted Rasch measurement to explore the characteristics and bias patterns of non-native raters when they scored L2 writing tasks. Three raters scored 254 writing tasks written by Korean university students on two topics adapted from the TOEFL Test of Written English (TWE). The written products were assessed using a five-category rating scale (Content, Organization, Language in Use, Grammar, and Mechanics). The raters only showed a difference in severity with regard to rating categories but not in task types. Overall, the raters scored Grammar most harshly and Organization most leniently. The results also indicated several bias patterns of ratings with regard to the rating categories and task types. In rater-task bias interactions, each rater showed recurring bias patterns in their rating between two writing tasks. Analysis of rater-category bias interaction showed that the three raters revealed biased patterns across all the rating categories though they were relatively consistent in their rating. The study has implications for the importance of rater training and task selection in L2 writing assessment.


An Analysis on Rater Error in Holistic Scoring for Performance Assessments of Middle School Students' Science Investigation Activities (중학생 과학탐구활동 수행평가 시 총체적 채점에서 나타나는 채점자간 불일치 유형 분석)

  • Kim, Hyung-Jun; Yoo, June-Hee
    • Journal of The Korean Association For Science Education / v.32 no.1 / pp.160-181 / 2012
  • The purpose of this study is to understand raters' errors in rating performance assessments of science inquiry. To this end, 60 middle school students performed a scientific inquiry on sound propagation, and 4 trained raters rated their activity sheets. In the generalizability analysis for the person, task, and rater design, the estimated variance components for rater, rater by person, and rater by task were about 25%. Two of the 4 raters were more severe than the other two, and their severity was stable. The four raters' ratings agreed with each other in 51 of the 240 cases. Through raters' conferences, the error in each of the 189 disagreed cases was identified as one of three types: different salience, severity, and overlooking. Error type 1, different salience, accounted for 38% of the disagreed cases; the tasks and assessment components that raters treated as salient differed among the raters. Error type 2, severity, accounted for 25%, and error type 3, overlooking, accounted for 31%. Error type 2 seemed to occur when students' responses were on the border between two levels. Error type 3 seemed to occur when raters overlooked an important part of students' responses because they were absorbed in their own salience. To reduce these rater errors, raters' conferences on the salience of tasks and assessment components are needed before holistic scoring of complex tasks. Raters also need to recognize their own severity and make an effort to keep it consistent. Multiple raters are needed to prevent errors from being overlooked. Further studies on raters' tendencies and the sources of differing interpretations of the rubric are suggested.

Reliability of the Emergency Severity Index Version 4 Performed by Trained Triage Nurse (중증도 분류 간호사에 의한 응급환자 중증도 분류 신뢰도 측정 연구: Emergency Severity Index Version 4를 중심으로)

  • Choi, Hee Kang; Choi, Min Jin; Kim, Ju Won; Lee, Ji Yeon; Shin, Sun Hwa; Lee, Hyun Jung
    • Journal of Korean Critical Care Nursing / v.5 no.2 / pp.61-71 / 2012
  • Purpose: The aim of this study was to measure the inter-rater reliability of the Emergency Severity Index (ESI) version 4 among triage nurses. Methods: This study was carried out from August 11, 2010 to September 7, 2010 in a regional emergency department. Data were collected by ten triage nurses who had been trained in ESI v.4. Two research nurses, serving as the reference, and ten triage nurses independently assigned ESI version 4 levels to the patients. We calculated the weighted kappa between the triage nurses and research nurses to evaluate the consistency of the ESI v.4. Results: A total of 233 patients were enrolled in this study. The classification of ESI levels was as follows: level 1 (0.4%), level 2 (21.0%), level 3 (67.8%), level 4 (9.4%), and level 5 (1.3%). Inter-rater reliability by weighted kappa was 0.79 (95% confidence interval = 0.74-0.83) and the agreement rate was 87.1%. The under-triage rate by triage nurses was 6.0% and the over-triage rate was 6.9%. Conclusion: In this study, inter-rater reliability between triage nurses and research nurses was good in a single Korean ED.

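
The statistics reported in the ESI abstract above (weighted kappa, agreement rate, under- and over-triage rates) can be sketched as follows. This is a minimal illustration, not the authors' analysis: the abstract does not say whether linear or quadratic weights were used, so `weighting` is an assumed parameter, and the function names and sample ratings are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weighting="linear"):
    """Cohen's weighted kappa for two raters on an ordered scale.

    `weighting` ("linear" or "quadratic") is an assumption; the abstract
    does not specify which disagreement weights were used.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weighting == "linear" else 2
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # disagreement weight
            num += w * observed[i][j]            # observed disagreement
            den += w * row[i] * col[j]           # chance-expected disagreement
    if den == 0.0:
        return 1.0  # both raters used one identical category throughout
    return 1.0 - num / den


def triage_rates(nurse_levels, reference_levels):
    """Return (agreement, under-triage, over-triage) proportions.

    ESI level 1 is the most urgent, so a nurse level numerically higher
    than the reference counts as under-triage.
    """
    n = len(nurse_levels)
    pairs = list(zip(nurse_levels, reference_levels))
    agree = sum(a == b for a, b in pairs) / n
    under = sum(a > b for a, b in pairs) / n
    over = sum(a < b for a, b in pairs) / n
    return agree, under, over


# Hypothetical ratings for eight patients (not data from the study).
nurse = [3, 3, 2, 4, 3, 2, 3, 5]
reference = [3, 3, 2, 3, 3, 3, 3, 5]
print(weighted_kappa(nurse, reference, categories=[1, 2, 3, 4, 5]))
print(triage_rates(nurse, reference))
```

The three proportions returned by `triage_rates` always sum to 1, which matches the way the abstract's figures (87.1%, 6.0%, 6.9%) add up to 100%.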

The Validity and Reliability of Reflux Symptom(RSI) Index and Reflux Finding Score(RFS) (역류증상지수와 역류소견점수의 타당성과 신뢰도)

  • Lee, Byung-Joo; Wang, Soo-Geun; Lee, Jin-Choon
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.18 no.2 / pp.96-101 / 2007
  • Laryngopharyngeal reflux (LPR) is the retrograde movement of gastric contents into the larynx, pharynx, and upper aero-digestive tract. LPR differs from gastroesophageal reflux in that it is often not associated with heartburn and regurgitation symptoms. Otolaryngological manifestations of acid reflux include a wide range of pharyngeal and laryngeal symptoms. Belafsky et al. developed a useful self-administered tool, the reflux symptom index (RSI), for assessing the degree of LPR symptoms. Patients are asked to use a 0 to 5 point scale to grade the following symptoms: 1) hoarseness or voice problems; 2) throat clearing; 3) excess throat mucus or postnasal drip; 4) difficulty swallowing; 5) coughing after eating or lying down; 6) breathing difficulties; 7) troublesome or annoying cough; 8) sensation of something sticking or a lump in the throat; 9) heartburn, chest pain, indigestion or stomach acid coming up. An RSI score greater than 13 is considered abnormal. As there was no validated instrument to document the physical findings and severity of LPR, Belafsky et al. developed an eight-item clinical severity scale for judging laryngoscopic findings, the reflux finding score (RFS). They rated eight LPR-associated findings on a scale from 0 to 4: subglottic edema, ventricular obliteration, erythema/hyperemia, vocal-fold edema, diffuse laryngeal edema, posterior commissure hypertrophy, granuloma/granulation tissue, and thick endolaryngeal mucus. An RFS score greater than 7 was found to suggest LPR-associated laryngitis. Although both indices (RSI and RFS) are widely used, there is some controversy about their validity (sensitivity and specificity) and reliability (intra-rater and inter-rater) in LPR diagnosis and treatment. We discuss the validity and reliability of the RSI and RFS through a literature review.

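
The scoring rules described in the RSI/RFS abstract above can be sketched as a small calculation. This is an illustrative sketch that follows only what the abstract states (nine RSI items scored 0 to 5 with a cutoff of 13; eight RFS items scored 0 to 4 with a cutoff of 7); the function names are hypothetical and the published instruments may restrict individual item values further.

```python
RSI_CUTOFF = 13  # an RSI total greater than 13 is considered abnormal
RFS_CUTOFF = 7   # an RFS total greater than 7 suggests LPR-associated laryngitis


def scale_total(item_scores, n_items, max_item):
    """Validate item scores against the stated ranges and return their sum."""
    if len(item_scores) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(item_scores)}")
    for score in item_scores:
        if not 0 <= score <= max_item:
            raise ValueError(f"item score {score} outside 0..{max_item}")
    return sum(item_scores)


def rsi_is_abnormal(item_scores):
    """Nine symptom items, each graded 0 to 5 by the patient."""
    return scale_total(item_scores, n_items=9, max_item=5) > RSI_CUTOFF


def rfs_suggests_lpr(item_scores):
    """Eight laryngoscopic findings, each rated 0 to 4 (per the abstract)."""
    return scale_total(item_scores, n_items=8, max_item=4) > RFS_CUTOFF


# Hypothetical patient: the RSI total is 14, so it exceeds the cutoff of 13.
print(rsi_is_abnormal([2, 2, 2, 2, 2, 1, 1, 1, 1]))
```

Both cutoffs are strict inequalities, so a total exactly equal to 13 (RSI) or 7 (RFS) is not flagged.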

Multiple Average Ratings of Auditory Perceptual Analysis for Dysphonia

  • Choi, Seong-Hee; Choi, Hong-Shik
    • Phonetics and Speech Sciences / v.1 no.4 / pp.165-170 / 2009
  • This study compared single ratings with average ratings from multiple presentations of the same stimulus for measuring the voice quality of dysphonia, using a 7-point equal-appearing interval (EAI) rating scale. The overall severity of voice quality for 46 /a/ vowel stimuli (23 from speakers with dysphonia, 23 from controls) was rated by 3 experienced speech-language pathologists (mean experience 19 years; range = 7 to 40 years). For the average ratings, each stimulus was rated five times in random order, and averages were taken over two to five of those ratings. Although inter-rater reliability was higher for average ratings than for single ratings, there were no significant differences in rating scores between single and multiple average ratings judged by experienced listeners. This suggests that auditory perceptual ratings by well-trained listeners show relatively good agreement across repeated judgments of the same stimulus. Larger variations in perceptual ratings were observed for moderate voices than for mild or severe voices, even in the average ratings.


A Pilot Study of Evaluating the Reliability and Validity of Pattern Identification Tool for Insomnia and Analyzing Correlation with Psychological Tests (불면증 변증도구 신뢰도와 타당도 평가 및 심리검사와의 상관성에 대한 초기연구)

  • Jeong, Jin-Hyung; Lee, Ji-Yoon; Kim, Ju-Yeon; Kim, Si-Yeon; Kang, Wee-Chang; Lim, Jung Hwa; Kim, Bo Kyung; Jung, In Chul
    • Journal of Oriental Neuropsychiatry / v.31 no.1 / pp.1-12 / 2020
  • Objectives: The purpose of this study was to evaluate the reliability and validity of the instrument on pattern identification for insomnia (PIT-Insomnia) and verify the correlation between the PIT-Insomnia and psychological tests. Methods: Two evaluators performed pattern identification on participants who met the insomnia disorder diagnostic criteria of the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and scored over 15 on the Insomnia Severity Index (ISI), once manually and twice using the PIT-Insomnia, to measure inter-rater and test-retest reliability. We also administered the following surveys: the Pittsburgh Sleep Quality Index (PSQI), the Korean version of Beck's depression inventory (K-BDI), the Korean version of the State-Trait Anxiety Inventory (STAI-K), the Korean Symptom checklist-95 (KSCL-95), and the EuroQol-5 dimension (EQ-5D), to measure concurrent validity and the correlation between the PIT-Insomnia and psychological tests. Results: 1. The test-retest reliability analysis of the pattern identification results showed moderate agreement, and the test-retest reliability analysis of each pattern identification score showed poor to moderate agreement. 2. The inter-rater reliability analysis of the manual pattern identification results showed slight agreement; when the analysis was performed with calibration, it showed fair agreement. 3. The concordance analysis between the manual results and the PIT-Insomnia showed poor agreement; when the analysis was performed with calibration, it showed fair agreement. 4. The concordance analysis between the PIT-Insomnia and the PSQI showed a positive linear correlation. 5. The concordance analysis between the PIT-Insomnia and the PSQI, K-BDI, STAI-K, KSCL-95, and EQ-5D showed the following: non-interaction between the heart and kidney had a positive linear correlation with the K-BDI and the anxiety item of the KSCL-95; dual deficiency of the heart-spleen had a positive linear correlation with the somatization and paranoia items of the KSCL-95; heart deficiency with timidity had a positive linear correlation with the stress vulnerability and paranoia items of the KSCL-95; phlegm-fire harassing the heart had a positive linear correlation with the K-BDI and the paranoia item of the KSCL-95; depressed liver qi transforming into fire had a positive linear correlation with the anxiety and paranoia items of the KSCL-95; and all patterns had a negative linear correlation with the EQ-5D. Conclusions: The PIT-Insomnia shows moderate reliability and reflects the severity of insomnia, since it has some concurrent validity with the PSQI. There are some correlations between the PIT-Insomnia and specific psychological tests, so we suggest that it can be used appropriately in clinical settings.