• Title/Summary/Keyword: 채점자 교육 (rater training)

An Analysis on Reliabilities of Scoring Methods and Rubric Ratings Number for Performance Assessments of Middle School Students' Science Investigation Activities (중학생 과학탐구활동 수행평가 시 채점 방식 및 척도의 수에 따른 신뢰도 분석)

  • Kim, Hyung-Jun;Yoo, June-Hee
    • Journal of The Korean Association For Science Education / v.30 no.2 / pp.275-290 / 2010
  • This study analyzed the reliabilities of the holistic and analytic scoring methods in performance assessments of middle school students' science investigation activities. Reliabilities of 2-, 3-, and 4~7-level rubric ratings for the analytic scoring method were compared to find the optimal number of rating levels. Two trained raters rated four activity sheets of 60 students using the two scoring methods and the three kinds of rubric ratings. Internal consistency reliabilities of the holistic scoring method were higher than those of the analytic scoring method, while intra-rater reliabilities of the analytic scoring method were higher than those of the holistic scoring method. Internal consistency and intra-rater reliabilities of the 3-level rubric ratings showed patterns similar to those of the 4~7-level rubric ratings. Moreover, student discrimination, item difficulties, and item-response curves showed that the 3-level rubric ratings were reliable. These results suggest that the holistic scoring method could be adopted to increase internal consistency reliability, with intra-rater reliability improved through rater conferences. Also, 3-level rubric ratings would be enough for good reliability when the analytic scoring method is adopted.

Item Analysis, Standard Scores, and Test Equating for University-Specific Entrance Examinations (대학별고사를 위한 문항분석, 표준점수, 검사동등화)

  • 성태제
    • Communications for Statistical Applications and Methods / v.1 no.1 / pp.206-214 / 1994
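  • This paper introduces basic concepts of educational measurement theory in order to point out problems with item analysis, the standard score system, and test equating that arise from the university-specific entrance examinations reinstated in the 1994 academic year. To guarantee the validity and reliability of these examinations, writing high-quality items must come first, and for that purpose item analysis should use item response theory rather than the classical test theory used previously, because under item response theory the estimated item characteristics do not vary with the characteristics of the examinee group. When items are essay-type, inter-rater and intra-rater reliability must not be overlooked. University-specific examinations that offer a variety of elective subjects use raw scores, standard scores, or test equating methods for admission decisions, but this violates educational measurement theory. Human abilities in different subjects cannot be compared relative to one another; standard scores and test equating are methods for the relative comparison of the same ability. In particular, test equating presupposes the same trait, equity, population invariance, and symmetry. Comparing examinees' different abilities through a standard score system is possible in the sense that the different abilities are expressed as scores, but once one asks what those scores mean, it also violates the basic philosophy of educational evaluation.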

Analysis on the Characteristics and Criteria Development in Performing Science Inquiry Tasks for Elementary School Students (초등학생 과학 탐구과제 수행 특성 분석 및 채점기준 개발)

  • Ham, Eun Hye;Lee, You-kyung;Park, So-Young;Park, Hyejin;Lee, Sunghye
    • Journal of The Korean Association For Science Education / v.42 no.2 / pp.239-252 / 2022
  • This study aims to develop performance criteria based on characteristics observed in science inquiry tasks for elementary school students. First, performance characteristics were listed by observing the science inquiry activity reports of 70 fifth-grade elementary school students. Second, checklist-type scoring criteria were developed in connection with a theoretical framework of the scientific inquiry process and relevant competencies. Third, using the developed scoring criteria, 11 raters scored 350 students' reports. The main findings are as follows. First, the scoring data fit the many-faceted Rasch model well, and the 22 scoring criteria differentiated reasonably well among various levels of proficiency. Second, at the low performance level, the observable characteristics were answering questions explicitly required by the task or observing objects or phenomena using pre-learned scientific concepts, while at the high performance level they were exploring data beyond those given or reflecting on one's own experimental process. Based on the results, the usefulness of analyzing students' performance characteristics for developing scoring criteria and further research directions are discussed.

Analysis of Assessment Types, Scoring Methods and Reliability of Science Performance Assessment in Middle and High School (중등학교 과학 수행평가의 평가 유형과 채점 방식 및 신뢰도 분석)

  • Lee, Ki-Young;An, Hui-Soo
    • Journal of The Korean Association For Science Education / v.25 no.2 / pp.173-183 / 2005
  • This study asked which assessment types and scoring methods of science performance assessment (SPA) were being used in middle and high schools, and how reliable (generalizable) the SPA scores were. To answer these questions, SPA data obtained from seven schools were classified by assessment type and scoring method, and reliability was then analyzed on the basis of this classification by applying generalizability theory. The classification showed that the SPAs of the seven schools fell into two types: a paper-pencil type and a task type. The paper-pencil type consisted solely of answer (content)-restricted essay-type tests. The task type had two parts: process assessment and outcome assessment. Two scoring arrangements were found across the seven schools: in one, a single teacher scored all essay-type items and performance tasks; in the other, two teachers scored the performance tasks assigned to them. No cases were found of scoring assigned essay-type items or of cross-scoring by two or more teachers. The findings of the reliability analysis are as follows: (1) The effect of essay-type items on the SPA score was larger than that of performance tasks. (2) The person-by-rater interaction effect in scoring performance tasks differed remarkably among the seven schools. (3) Most generalizability (reliability) coefficients of SPA for the seven schools were smaller than the acceptable generalizability coefficient (0.80). Therefore, the numbers of items, tasks, and raters should be increased to approach the acceptable generalizability level.
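
The link between the number of raters and the generalizability coefficient can be illustrated with a minimal sketch. It assumes the simplest fully crossed person × rater design with one score per cell, which is much simpler than the item, task, and rater facets in the study above; the function names and the sample scores are hypothetical and are not taken from the paper.

```python
def estimate_variance_components(scores):
    """Estimate person and residual variance for a crossed person x rater design.

    scores[p][r] is the score rater r gave to person p (one score per cell).
    Returns (var_person, var_residual), where the residual confounds the
    person-by-rater interaction with random error.
    """
    n_p, n_r = len(scores), len(scores[0])
    if n_p < 2 or n_r < 2:
        raise ValueError("need at least 2 persons and 2 raters")
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(scores[p][r] for p in range(n_p)) / n_p for r in range(n_r)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_person = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_rater = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_resid = ss_total - ss_person - ss_rater

    ms_person = ss_person / (n_p - 1)
    ms_resid = ss_resid / ((n_p - 1) * (n_r - 1))

    # Negative estimates are truncated at zero, a common convention.
    var_person = max((ms_person - ms_resid) / n_r, 0.0)
    return var_person, ms_resid


def g_coefficient(var_person, var_residual, n_raters):
    """Relative generalizability coefficient when averaging over n_raters raters."""
    return var_person / (var_person + var_residual / n_raters)


# Hypothetical scores: 4 students, each rated by 2 raters.
sample = [[1, 3], [3, 3], [5, 7], [7, 7]]
var_p, var_res = estimate_variance_components(sample)
for n in (1, 2, 4):
    print(n, round(g_coefficient(var_p, var_res, n), 3))
```

In this sketch the coefficient can only rise as raters are added, because the residual variance is divided by the number of raters, which is the general reason a decision study recommends more items, tasks, or raters when the coefficient falls short of a target such as 0.80.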

An Application of Generalizability Theory to Self-introduction Letter and Teacher's Recommendation Letter Used in Identification of Mathematical Gifted Students by Observations and Nominations (관찰.추천에 의한 수학영재 선발 시 사용되는 자기소개서와 교사추천서 평가에 대한 일반화가능도 이론의 활용)

  • Kim, Sung-Chan;Kim, Sung-Yeun;Han, Ki-Soon
    • Communications of Mathematical Education / v.26 no.3 / pp.251-271 / 2012
  • The purpose of this study is 1) to determine the error sources and the effect of each error source, 2) to investigate optimal measurement conditions for holistic and analytic scoring methods, and 3) to compare reliability values between Cronbach's alpha and the generalizability coefficient for the self-introduction letter and the teacher's recommendation letter, based on generalizability theory, in the identification of mathematically gifted students by observation and nomination. Data were collected from a university-affiliated science education institute for the gifted located in a capital city for the 2011 academic year. Scores from two raters using holistic and analytic scoring methods in both assessment types were used. The results were as follows. First, for both assessment types, error sources for persons were relatively large regardless of scoring method. However, error sources for raters had a greater impact under holistic scoring than under analytic scoring. Second, to set optimal measurement conditions for the self-introduction letter and the teacher's recommendation letter with the number of raters fixed at 2 under holistic scoring, at least 5 and 10 content domains were needed, respectively. In addition, under analytic scoring, the teacher's recommendation letter needed more than 3 items when the number of content domains was fixed at 4, and the self-introduction letter needed more than 8 items when the number of content domains was fixed at 6. Third, Cronbach's alpha, which reflects only a single source of error, was higher than the generalizability coefficient regardless of assessment type and scoring method. Hence we recommend that the generalizability coefficient, which is based on multiple error sources such as raters, content domains, and items, be considered in order to maintain a satisfactory level of reliability in both assessment types.
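
For reference, Cronbach's alpha as compared in this abstract can be computed from a persons × items score table. The following is a minimal sketch using the standard formula; the function name and sample data are hypothetical and are not the authors' data or code.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for scores[p][i], the score of person p on item i.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances (n - 1 in the denominator).
    """
    n_items = len(scores[0])
    if len(scores) < 2 or n_items < 2:
        raise ValueError("need at least 2 persons and 2 items")
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 4 applicants scored on 2 items.
print(round(cronbach_alpha([[1, 1], [2, 3], [3, 2], [4, 4]]), 3))
```

Because this formula treats the items as the only source of error, it cannot separate rater or content-domain effects, which is consistent with the abstract's observation that alpha came out higher than the generalizability coefficient.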

Reliability of Standardized Patients as Raters in Objective Structured Clinical Examination (객관 구조화 절차 기술 평가에서 채점자로서의 표준화환자의 신뢰도)

  • Son, Hee-Jeong;Moon, Joong-Bum;Lee, Hyang-Ah;Roh, Hye-Rin
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.1 / pp.318-326 / 2011
  • The purpose of this study is to investigate whether standardized patients (SPs) can serve as reliable examiners in the Objective Structured Clinical Examination (OSCE). Four SPs and four faculty members with more than 2 years of experience in OSCE scoring were selected. For each assignment, 2 faculty members and 2 SPs were designated as raters. The SPs were trained to assess 2 technical skills, male Foley catheter insertion and wound dressing, for 8 hours (4 hours / day, each topic). The training covered the definition, method, cautions, and complications of each procedural skill, and used theoretical lectures, video learning, faculty demonstrations, and practical training on mannequins. On the day before the OSCE, the 8 raters were standardized for an hour through simulated OSCE scoring using previous videos. Each assessment consisted of 14 checklist items and 1 global rating. The time allotted for each assignment was 5 minutes, and the evaluation time was 2 minutes per student. The evaluations from the faculty and the SPs were compared and analyzed with the GENOVA program. The overall generalizability coefficient (G coefficient) was 0.839 from two cases of OASTS. The reliability of the raters was high, at 0.946. The inter-rater agreement between the faculty group and the SP group was 0.949 for the checklist and 0.908 for the global rating. Therefore, SPs can serve as raters in the OSCE for procedural skills if they are given appropriate training.

Research on the Syntactic-Semantic Analysis System on Compound Sentence for Descriptive-type Grading (서술형 문항 채점을 위한 복합문 구문의미분석 시스템에 대한 연구)

  • Kang, WonSeog
    • The Journal of Korean Association of Computer Education / v.21 no.6 / pp.105-115 / 2018
  • Descriptive-type questions are appropriate for evaluating deep thinking ability, but they are not easy to grade. Because graders produce different scores even under the same grading criteria, an objective evaluation system is needed. Such a system requires Korean language analysis, and since descriptive-type answers are written in compound sentences, it has to analyze compound sentences. This paper develops a Korean syntactic-semantic analysis system for compound sentences and evaluates its performance. The system selects the modifiee of each word phrase using syntactic-semantic constraints and a semantic dictionary. The 93% accuracy rate shows that the system is effective. The system will be useful in descriptive-type grading and Korean language processing.

The Automated Scoring of Kinematics Graph Answers through the Design and Application of a Convolutional Neural Network-Based Scoring Model (합성곱 신경망 기반 채점 모델 설계 및 적용을 통한 운동학 그래프 답안 자동 채점)

  • Jae-Sang Han;Hyun-Joo Kim
    • Journal of The Korean Association For Science Education / v.43 no.3 / pp.237-251 / 2023
  • This study explores the possibility of automated scoring for scientific graph answers by designing an automated scoring model using convolutional neural networks and applying it to students' kinematics graph answers. The researchers prepared 2,200 answers, which were divided into 2,000 training data and 200 validation data. Additionally, 202 student answers were divided into 100 training data and 102 test data. First, in the process of designing the automated scoring model and validating its performance, the model was optimized for graph image classification using the answer dataset prepared by the researchers. Next, the model was trained on various types of training datasets and used to score the student test dataset. The performance of the model improved as the training data increased in amount and diversity. Finally, compared to human scoring, the accuracy was 97.06%, the kappa coefficient was 0.957, and the weighted kappa coefficient was 0.968. On the other hand, for answer types that were not included in the training data, scoring was almost identical among human scorers, but the automated scoring model performed inaccurately.
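
The agreement statistics named in this abstract (accuracy, kappa, weighted kappa) can be sketched as follows. This is an illustrative implementation of Cohen's kappa, not the authors' code; the abstract does not say which weighting scheme was used, so the "linear" and "quadratic" options here are assumptions, and the function names and sample scores are hypothetical.

```python
from collections import Counter


def accuracy(rater_a, rater_b):
    """Proportion of answers on which the two score lists agree exactly."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b, weights=None):
    """Cohen's kappa between two score lists.

    weights=None gives unweighted kappa; "linear" or "quadratic" gives
    weighted kappa, with disagreement penalties based on the distance between
    the rank positions of the sorted score categories (assumed ordinal).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(rater_a)
    categories = sorted(set(rater_a) | set(rater_b))
    rank = {c: i for i, c in enumerate(categories)}
    observed = Counter(zip(rater_a, rater_b))
    marginal_a, marginal_b = Counter(rater_a), Counter(rater_b)

    def penalty(i, j):
        if weights is None:
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return float(abs(i - j))
        if weights == "quadratic":
            return float((i - j) ** 2)
        raise ValueError("weights must be None, 'linear', or 'quadratic'")

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for a in categories:
        for b in categories:
            w = penalty(rank[a], rank[b])
            observed_disagreement += w * observed[(a, b)] / n
            expected_disagreement += w * marginal_a[a] * marginal_b[b] / (n * n)
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical scores for six answers (0, 1, or 2 points each).
human = [0, 1, 2, 2, 1, 0]
model = [0, 1, 2, 1, 1, 0]
print(accuracy(human, model))
print(cohen_kappa(human, model))
print(cohen_kappa(human, model, weights="quadratic"))
```

Kappa corrects raw accuracy for the agreement expected by chance, and the weighted variant penalizes a one-point disagreement less than a two-point one, which is why it can exceed the unweighted value on ordinal scores.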

Automated Scoring System for Korean Short-Answer Questions Using Predictability and Unanimity (기계학습 분류기의 예측확률과 만장일치를 이용한 한국어 서답형 문항 자동채점 시스템)

  • Cheon, Min-Ah;Kim, Chang-Hyun;Kim, Jae-Hoon;Noh, Eun-Hee;Sung, Kyung-Hee;Song, Mi-Young
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.527-534 / 2016
  • The emerging information society requires talent for creative thinking based on problem-solving skills and comprehensive thinking rather than simple memorization. Accordingly, the Korean curriculum has shifted toward creative thinking by increasing short-answer questions that can gauge students' overall thinking. However, scoring results are somewhat inconsistent because scoring short-answer questions depends on the subjective judgment of human raters. To alleviate this problem, automated scoring systems using machine learning have been used as scoring tools overseas. Linguistically, Korean and English differ greatly in sentence structure, so automated scoring systems built for English cannot be applied to Korean. In this paper, we introduce an automated scoring system for Korean short-answer questions using predictability and unanimity. We also verify the practicality of the system through the correlation between its results and those of human raters. In the experiments, the proposed system is evaluated on constructed-response items in Korean language, social studies, and science from the National Assessment of Educational Achievement. Pearson correlation coefficients and the kappa coefficient were used for the analysis. The results showed a strong positive correlation, with all correlation coefficients at 0.7 or higher. Thus, the scoring results of the proposed system are similar to those of human raters, and the automated scoring system is found to be useful as a scoring tool.
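
The Pearson correlation check described in this abstract can be sketched as below. The function name, the sample scores, and the use of 0.7 as a cut-off in the usage line are illustrative assumptions; the paper's own data and tooling are not shown in the abstract.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical human and automated scores for four answers.
human_scores = [1, 2, 3, 4]
system_scores = [1, 3, 2, 4]
r = pearson_r(human_scores, system_scores)
print(r >= 0.7)  # compare against the 0.7 level mentioned in the abstract
```

Pearson correlation measures whether the two sets of scores rise and fall together, not whether they are equal, so it is usually reported alongside an agreement statistic such as kappa, as the abstract does.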

Performance Comparison of Automated Scoring System for Korean Short-Answer Questions (한국어 서답형 문항 자동채점 시스템의 성능 개선)

  • Cheon, Min-Ah;Kim, Chang-Hyun;Kim, Jae-Hoon;Noh, Eun-Hee;Sung, Kyung-Hee;Song, Mi-Young;Park, Jong-Im;Kim, Yuhyang
    • 한국어정보학회:학술대회논문집 / 2016.10a / pp.181-185 / 2016
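  • In recent curricula, the assessment of students' abilities has been shifting toward more short-answer (constructed-response) items, which can gauge students' comprehensive thinking rather than simple memorization. However, short-answer items take considerable time and cost to score, and it is difficult to guarantee the consistency and reliability of scoring results because they depend on the rater's subjective judgment. To address this, research on applying automated scoring systems to short-answer items is under way in Korea, drawing on overseas cases. This paper briefly introduces the 2015 automated scoring system, whose language processing functions and automated scoring performance were improved on the basis of a performance analysis of the 'Korean sentence-level short-answer item automated scoring system' developed in 2014, and compares the performance of the two systems. Short-answer items from the 2014 National Assessment of Educational Achievement were used for the performance analysis. In the experiments, the improved system's average exact agreement and average accuracy increased by 9.4 and 8.9 percentage points, respectively, over the existing system. Since the purpose of an automated scoring system is to shorten scoring time as far as possible while securing the consistency and reliability of scoring criteria, the revised 2015 system can be judged to have improved performance.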