• Title/Summary/Keyword: rater consistency

A Study on the Features of Writing Rater in TOPIK Writing Assessment (한국어능력시험(TOPIK) 쓰기 평가의 채점 특성 연구)

  • Ahn, Su-hyun;Kim, Chung-sook
    • Journal of Korean language education, v.28 no.1, pp.173-196, 2017
  • Writing is a subjective, performative activity, and writing ability is multifaceted and complex. To understand examinees' writing ability accurately and provide effective writing scores, raters must first be competent in assessment. This study is therefore significant as fundamental research on rater characteristics in the TOPIK writing assessment. A total of 150 scripts from examinees of the 47th TOPIK were randomly selected and rated independently by 20 raters. The many-facet Rasch model was used to generate individualized feedback reports on each rater's relative severity and consistency with respect to particular categories of the rating scale. The analysis was run in FACETS ver. 3.71.4. Overfitting and misfitting raters had considerable difficulty distinguishing between assessment factors and interpreting the criteria. Writing raters appear to be quite confused when interpreting the assessment criteria, and overfitting and misfitting raters in particular interpret the criteria arbitrarily. The main cause of overfit and misfit is confusion about the assessment factors and criteria when finding a basis for scoring. More rater training, and further research grounded in these characteristics of writing assessment, are therefore needed. The study is significant in that it comprehensively examined the assessment characteristics of writing raters and visually confirmed the patterns of error in writing assessment.
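
For orientation, the many-facet Rasch model is commonly written in the rating-scale form below. The abstract does not say which parameterization was specified in FACETS, so this is the textbook form and the symbol names are assumptions, not the authors' notation.

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
% P_{nijk}: probability that examinee n receives category k on criterion i from rater j
% B_n: ability of examinee n
% D_i: difficulty of assessment criterion i
% C_j: severity of rater j
% F_k: difficulty of the step from category k-1 to category k
```

Rater consistency under this model is usually summarized by infit and outfit mean-square statistics, whose expected value is 1. Values well below 1 are conventionally read as overfit (ratings more predictable than the model expects) and values well above 1 as misfit. The cut-offs applied in the study are not given in the abstract.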

Analysis of Evaluator Reliability for the Raters' Calibration Training (채점자 조정(calibration) 교육 제안을 위한 평가자 신뢰도 분석)

  • Kim, Jooah;Shin, Yooseok;Seo, Jeong Taeg
    • The Journal of the Korean dental association, v.58 no.5, pp.284-291, 2020
  • This study analyzed changes in rater reliability across the student practical evaluation process conducted at Yonsei University College of Dentistry, and on that basis argues for the value of rater calibration training in the practical evaluation of dental students. Nine professors from the Department of Conservative Dentistry, Yonsei University College of Dentistry, analyzed the results of class II restoration cases twice in 2017 and once in 2018. The intraclass correlation coefficient (ICC), a statistic used to determine the consistency of raters with three or more scores, was also calculated. ICC values increased as raters participated in rater calibration meetings and gained grading experience. This shows that rater reliability is related to grading experience and to feedback from calibration meetings. Given previous findings that grading experience and rater calibration training can cause a meaningful change in rater behavior, we propose conducting rater calibration training to ensure evaluator reliability.
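
As an illustration of the statistic mentioned above, the following is a minimal sketch of one common ICC form, ICC(2,1) (two-way random effects, absolute agreement, single rater), computed from the two-way ANOVA mean squares. The abstract does not say which ICC form the authors used, so the choice of ICC(2,1), the function name `icc_2_1`, and the example scores are all assumptions.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per rated case, one column per rater.
    """
    n = len(scores)       # number of rated cases
    k = len(scores[0])    # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for the two-way layout without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-case mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical data: 6 cases (rows) graded by 4 raters (columns).
example_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example_scores), 3))
```

Computing the same quantity separately for each grading session would give the kind of before-and-after comparison the abstract describes, though the study's actual procedure may differ.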

Reliability of the Joint Neutral Position and Measurement Methods of the Ankle Joint Complex Range of Motion (발목관절 복합체의 가동범위 측정을 위한 중립위치와 측정방법의 신뢰도)

  • Hong, Wan-Sung;Kim, Gi-Won
    • The Journal of Korean Physical Therapy, v.23 no.4, pp.45-51, 2011
  • Purpose: To determine the correct methods for measuring the neutral position and range of motion of the ankle joint complex and to evaluate rater reliability. The impact of training on rater reliability was also assessed. Methods: The subjects were eleven healthy women, who were evaluated by two physical therapists; one physical therapist recorded the results. A standard goniometer was used as the measurement tool. The neutral position and the active range of motion of the ankle and subtalar joints were measured. Intra-rater and inter-rater reliability were analyzed with intraclass correlation coefficients. Results: Intra-rater and inter-rater reliability ranged from high to medium for the neutral position of the ankle joint complex. Intra-rater reliability for dorsiflexion and plantarflexion measurements was medium, while inter-rater reliability was high. For the range of motion of the subtalar joint, intra-rater and inter-rater reliability were low and medium, respectively. Intra-rater reliability increased with formal training in the measurement techniques and was reduced when the raters had not undertaken the training. Conclusion: The results obtained with the measurement tools and joint position measurements indicate the consistency of repeated measurements made by the same observers. Repeating the same measurement technique under the same conditions during training increased the rater reliability of formally trained raters.

The Reliability of the Pediatric Functional Muscle Testing in Children with Developmental Delay

  • Seo, Hye-Jung;Kim, Joong-Hwi
    • The Journal of Korean Physical Therapy, v.27 no.4, pp.183-189, 2015
  • Purpose: The aim of this study was to examine the test-retest and inter-rater reliability of the pediatric functional muscle testing (PFMT) when applied to children with developmental delay. Methods: Sixteen children with developmental delay (seven females, nine males) participated in this study. For inter-rater reliability, each child was scored on the PFMT on the same day by two pediatric physical therapists with more than 8 years of clinical experience. For test-retest reliability, one therapist tested the children again within 10 days. The second measurement was made from a video recording of the first measurement. The intraclass correlation coefficient (ICC) was calculated to determine the test-retest and inter-rater reliability of the PFMT, and Cronbach's alpha was used to measure internal consistency. Results: 1) The test-retest ICC for the infant action month score and the right-side score of the PFMT ranged from 0.53 to 1.00 and from 0.63 to 0.99, respectively. 2) The inter-rater ICC for the infant action month score and the right-side score of the PFMT ranged from 0.66 to 1.00 and from 0.64 to 1.00, respectively. 3) Cronbach's alpha was 0.93, indicating excellent internal consistency. Conclusion: The test-retest and inter-rater reliability of the PFMT was relatively high, except for a few items. The PFMT can therefore be a useful tool for measuring muscle strength in children with developmental delay, given some modifications.
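
The internal-consistency figure reported above is a Cronbach's alpha. A minimal sketch of the standard formula follows, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), using sample variances. The function name `cronbach_alpha` and the item scores are hypothetical and are not PFMT data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a score table.

    item_scores: one row per child, one column per test item.
    """
    k = len(item_scores[0])  # number of items
    # Sample variance of each item across children.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each child's total score.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 4 children (rows) on 3 items (columns).
example_items = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
print(round(cronbach_alpha(example_items), 3))
```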

Construction of the Mobility to Participation Assessment Scale for Stroke (MPASS) and Testing Its Validity and Reliability in Persons With Stroke in Thailand

  • Nawarat, Jiraphat;Chaipinyo, Kanda
    • Journal of Preventive Medicine and Public Health, v.55 no.4, pp.334-341, 2022
  • Objectives: This study was conducted to develop the Mobility to Participation Assessment Scale for Stroke (MPASS) and assess its content validity, internal consistency, inter-rater and intra-rater reliability, and convergent validity in people with stroke living in the community. Methods: The MPASS was developed using published data on mobility-related activity and participation timing in elderly individuals, and then reviewed by community physical therapists. Content validity was established by reaching consensus among experienced physical therapists in a focus group. The MPASS was scored for 32 participants with stroke (mean age 61.75±4.92 years) by 3 individual testers. Reliability was examined using the intraclass correlation coefficient (ICC), internal consistency using the Cronbach alpha coefficient (α), and convergent validity using the Pearson correlation coefficient (r) to compare the MPASS to the Modified Rivermead Mobility Index as a referent test of mobility. Results: The MPASS consists of 8 items, and its scoring system provides information on the ability of people with stroke to reach a movement level enabling them to live in society, including interactions with other people and safe living in the community. The inter-rater and intra-rater reliability were excellent (ICC, 0.948; 95% confidence interval [CI], 0.893 to 0.982 and ICC, 0.967; 95% CI, 0.933 to 0.989, respectively). Internal consistency was good (α=0.877). The convergent validity was moderate (r=0.646; p<0.001). Conclusions: The newly developed MPASS showed acceptable construct validity and high reliability. The MPASS is suitable for use in people with stroke, especially those who have been discharged and live in the community with the ability to initiate sitting.
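
Convergent validity here rests on a Pearson correlation between two sets of scores for the same participants. A minimal sketch of that coefficient follows. The function name `pearson_r` and the paired scores are hypothetical and are not the study's MPASS or Modified Rivermead Mobility Index data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y):
        raise ValueError("score lists must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-product and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired scores for four participants on two scales.
new_scale = [1, 2, 3, 4]
reference_scale = [1, 3, 2, 4]
print(round(pearson_r(new_scale, reference_scale), 3))
```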

Translation and Validation of the Korean Version Revised Nottingham Sensory Assessment (한국판 수정된 노팅엄 감각평가의 신뢰도 타당도 연구)

  • Ji, Eun-Kyu;Lee, Sang-Heon
    • The Journal of the Korea Contents Association, v.20 no.9, pp.511-519, 2020
  • The aim of this study was to translate and validate the revised Nottingham Sensory Assessment (rNSA) in the Republic of Korea. A cross-sectional study was conducted to translate the rNSA into Korean using a modified forward/backward translation procedure. Inter-rater and intra-rater reliability, internal consistency, and concurrent validity were investigated to validate the Korean version of the rNSA. The Korean version of the rNSA showed excellent inter-rater reliability (r=0.92-1.00) and intra-rater reliability (r=0.93-1.00). Significant correlations were found between sensory assessment results of the Korean version of the rNSA and the Korean Fugl-Meyer Assessment Sensory subscales (K-FMA-S) (r=0.96). Cronbach's α for internal consistency ranged from 0.73 to 0.90 for the Korean version of the rNSA and from 0.70 to 0.88 for the K-FMA-S. These results indicate that the psychometric properties of the Korean version of the rNSA meet the standard level and that the instrument is feasible in clinical practice for assessing sensory function following stroke in the Republic of Korea.

Inter-Rater Reliability of Stroke Rehabilitation Assessment of Movement for Patients With Stroke (뇌졸중 환자 평가를 위한 Stroke Rehabilitation Assessment of Movement의 측정자간 신뢰도)

  • Yun, Sung-Joon;Weon, Jong-Hyuck;Lee, Chung-Hwi
    • Physical Therapy Korea, v.17 no.3, pp.48-58, 2010
  • This study aimed to determine the inter-rater reliability of the Stroke Rehabilitation Assessment of Movement (STREAM) translated into Korean. The STREAM is a new clinical measurement tool for evaluating the recovery of voluntary movement and basic mobility following stroke. A direct-observation reliability study was conducted on 20 patients who had had strokes and were in a rehabilitation setting. Subjects were assessed by two physical therapists. The reliability of the STREAM scores was demonstrated by weighted kappa statistics for inter-rater agreement on individual item scores, which ranged from .83 to 1.0, and by intraclass correlation coefficients of .99 for the total score and .96 to .99 for the subscale scores. The internal consistency of the STREAM scores was demonstrated by Cronbach's alphas greater than .99 on the subscales and overall. These high levels of reliability support the use of the Korean translation of the STREAM for measuring motor recovery following stroke.
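
Item-level agreement above is reported as weighted kappa. The sketch below computes Cohen's weighted kappa for two raters on an ordinal item, in the form 1 minus the ratio of weighted observed to weighted chance disagreement. The abstract does not state whether linear or quadratic weights were used, so both are offered as an assumption. The function name `weighted_kappa` and the ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters scoring the same subjects.

    categories: ordered list of possible scores, lowest to highest.
    weights: "linear" or "quadratic" disagreement weights (an assumption here).
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same subjects")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed proportion of subjects in each (rater A, rater B) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    row_marg = [sum(observed[i]) for i in range(k)]
    col_marg = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weights == "linear" else 2
    weighted_obs = 0.0
    weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # 0 on the diagonal, 1 at the extremes
            weighted_obs += w * observed[i][j]
            weighted_exp += w * row_marg[i] * col_marg[j]  # chance disagreement
    return 1 - weighted_obs / weighted_exp


# Hypothetical ratings of 6 patients on one 3-point item (0, 1, 2).
therapist_1 = [0, 0, 1, 1, 2, 2]
therapist_2 = [0, 1, 1, 1, 2, 1]
print(round(weighted_kappa(therapist_1, therapist_2, [0, 1, 2]), 3))
```

The function is undefined when both raters use only a single category, because the chance-disagreement term is then zero.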

Development of the Korean Handwriting Assessment for Children Using Digital Image Processing

  • Lee, Cho Hee;Kim, Eun Bin;Lee, Onseok;Kim, Eun Young
    • KSII Transactions on Internet and Information Systems (TIIS), v.13 no.8, pp.4241-4254, 2019
  • The efficiency and accuracy of handwriting measurement could be improved by adopting digital image processing. This study developed a computer-based Korean Handwriting Assessment tool. Second graders participated in this study by performing writing tasks of consonants, vowels, words, and sentences. We extracted boundary parameters for each letter using digital image processing and calculated the variables of size, size coefficient of variation (CV), misalignment, inter-letter space, inter-word space, and ratio of inter-letter space to inter-word space. Children were also administered traditional handwriting and visuomotor tests. Digital variables from image processing were correlated with the scores from these existing tests. Using these correlations, we established a three-point scoring system that computed test scores for each variable. We analyzed inter-rater reliability between the computer rater and a human rater, and test-retest reliability between the first and second performances. Validity was examined by analyzing the relationship between the Korean Handwriting Assessment and the existing handwriting and visuomotor tests. We propose the Korean Handwriting Assessment for measuring size, size consistency, misalignment, inter-letter space, inter-word space, and space ratio using digital image processing. The tool proved reliable and valid, and it is expected to be useful for assessing children's handwriting.
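
Two of the variables listed above are simple ratios, sketched below for illustration. The abstract does not define how letter size or spacing was measured from the extracted boundaries. The sketch therefore assumes per-letter sizes and gap widths are already available as numbers, uses the sample standard deviation, and averages gaps before taking the ratio. The function names and values are hypothetical.

```python
from statistics import fmean, stdev


def size_cv(letter_sizes):
    """Size coefficient of variation: sample SD of letter sizes over their mean."""
    return stdev(letter_sizes) / fmean(letter_sizes)


def space_ratio(inter_letter_gaps, inter_word_gaps):
    """Ratio of mean inter-letter space to mean inter-word space."""
    return fmean(inter_letter_gaps) / fmean(inter_word_gaps)


# Hypothetical measurements in pixels for one writing sample.
letter_sizes = [8, 10, 12]
letter_gaps = [2, 2, 2]
word_gaps = [8, 8]
print(round(size_cv(letter_sizes), 3), round(space_ratio(letter_gaps, word_gaps), 3))
```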

Development and Application of an Online Scoring System for Constructed Response Items (서답형 문항 온라인 채점 시스템의 개발과 적용)

  • Cho, Jimin;Kim, Kyunghoon
    • The Journal of Korean Association of Computer Education, v.17 no.2, pp.39-51, 2014
  • In high-stakes tests for large groups, how efficiently students' responses are distributed to raters and how systematically scoring procedures are managed are important to the overall success of the testing program. In scoring constructed response items, it is important to understand whether individual raters are making consistent judgments on the responses and whether these judgments are similar across all raters, in order to establish measures of rater reliability. The purpose of this study was to design, develop, and pilot-test an online scoring system for constructed response items administered in a paper-and-pencil test to large groups, and to verify the system's reliability. We show that, in contrast to conventional scoring methods, this online system provided information on the scoring process of individual raters, including intra-rater and inter-rater consistency. We found the system to be especially effective for obtaining reliable and valid scores for constructed response items.

An Analysis on Reliabilities of Scoring Methods and Rubric Ratings Number for Performance Assessments of Middle School Students' Science Investigation Activities (중학생 과학탐구활동 수행평가 시 채점 방식 및 척도의 수에 따른 신뢰도 분석)

  • Kim, Hyung-Jun;Yoo, June-Hee
    • Journal of The Korean Association For Science Education, v.30 no.2, pp.275-290, 2010
  • This study analyzed the reliabilities of the holistic and analytic scoring methods in performance assessments of middle school students' science investigation activities. The reliabilities of 2-, 3-, and 4~7-level rubric ratings for the analytic scoring method were compared to determine the optimal number of rubric levels. Two trained raters rated four activity sheets from 60 students using the two rating methods and the three kinds of rubric ratings. Internal consistency reliabilities of the holistic scoring method were higher than those of the analytic scoring method, while intra-rater reliabilities of the analytic scoring method were higher than those of the holistic scoring method. Internal consistency reliabilities and intra-rater reliabilities of the 3-level rubric ratings showed patterns similar to those of the 4~7-level rubric ratings. In addition, students' discriminations, item difficulties, and item-response curves showed that the 3-level rubric ratings were reliable. These results suggest that the holistic scoring method could be adopted to increase internal consistency reliabilities, with intra-rater reliabilities improved through rater conferences. Also, 3-level rubric ratings would be enough for good reliability when the analytic scoring method is adopted.