• Title/Summary/Keyword: scoring

Analysis of Assessment Types, Scoring Methods and Reliability of Science Performance Assessment in Middle and High School (중등학교 과학 수행평가의 평가 유형과 채점 방식 및 신뢰도 분석)

  • Lee, Ki-Young;An, Hui-Soo
    • Journal of The Korean Association For Science Education / v.25 no.2 / pp.173-183 / 2005
  • In this study, we investigated which assessment types and scoring methods of science performance assessment (SPA) are used in middle and high schools, and how reliable (generalizable) the resulting SPA scores are. To answer these questions, SPA data obtained from seven schools were classified by assessment type and scoring method, and reliability was then analyzed by applying generalizability theory. The classification showed that the SPA used in the seven schools fell into two types: a paper-and-pencil type and a task type. The paper-and-pencil type consisted solely of answer(content)-restricted essay items, whereas the task type had two parts: process assessment and outcome assessment. Two scoring methods were found across the seven schools: in one, a single teacher scored all essay items and performance tasks; in the other, two teachers each scored their assigned performance tasks. No school divided the essay items among teachers, and none used cross scoring by two or more teachers. The reliability analysis yielded the following findings: (1) Essay items had a larger effect on SPA scores than performance tasks did. (2) The person-by-rater interaction effect in scoring performance tasks differed markedly among the seven schools. (3) Most of the generalizability (reliability) coefficients of SPA for the seven schools were below the acceptable generalizability coefficient (0.80). Therefore, the numbers of items, tasks, and raters should be increased to approach an acceptable level of generalizability.
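
The decision-study (D-study) logic behind this recommendation can be sketched in Python. The sketch assumes a fully crossed person × item × rater (p × i × r) random design, which may not match every school's data ("items" here could stand for essay items or performance tasks), and the variance components are invented for illustration, not estimates from this study. The relative G coefficient divides person variance by itself plus relative error variance, and each error term shrinks as the number of items or raters grows:

```python
def relative_g_coefficient(var_p, var_pi, var_pr, var_pir_e, n_items, n_raters):
    """Relative G coefficient for a fully crossed p x i x r random-effects design.

    var_p:     universe-score (person) variance
    var_pi:    person-by-item interaction variance
    var_pr:    person-by-rater interaction variance
    var_pir_e: residual variance (person x item x rater interaction plus error)
    """
    if n_items < 1 or n_raters < 1:
        raise ValueError("n_items and n_raters must be at least 1")
    # Relative error: each interaction term is averaged over the planned
    # numbers of items and/or raters, so it shrinks as they increase.
    relative_error = (
        var_pi / n_items
        + var_pr / n_raters
        + var_pir_e / (n_items * n_raters)
    )
    return var_p / (var_p + relative_error)


# Hypothetical variance components (NOT taken from the study).
COMPONENTS = dict(var_p=0.30, var_pi=0.20, var_pr=0.05, var_pir_e=0.40)

if __name__ == "__main__":
    # D-study: how the coefficient changes as items and raters are added.
    for n_items, n_raters in [(2, 1), (4, 1), (4, 2), (8, 2)]:
        g = relative_g_coefficient(**COMPONENTS, n_items=n_items, n_raters=n_raters)
        print(f"items={n_items} raters={n_raters} G={g:.3f}")
```

With these made-up components, two items scored by one rater give G ≈ 0.46, and only eight items scored by two raters reach 0.80, which illustrates why adding items, tasks, and raters raises generalizability.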

Scoring Korean Written Responses Using English-Based Automated Computer Scoring Models and Machine Translation: A Case of Natural Selection Concept Test (영어기반 컴퓨터자동채점모델과 기계번역을 활용한 서술형 한국어 응답 채점 -자연선택개념평가 사례-)

  • Ha, Minsu
    • Journal of The Korean Association For Science Education / v.36 no.3 / pp.389-397 / 2016
  • This study tests the efficacy of using English-based automated computer scoring models together with machine translation to score Korean college students' written responses to natural selection concept items. To this end, I collected written responses from 128 pre-service biology teachers on a four-item instrument (512 written responses in total). Machine translation software (i.e., Google Translate) translated both the original responses and spell-corrected versions of them. The automated computer scoring models (i.e., EvoGrader) then judged the presence or absence of five scientific ideas and three naïve ideas in both sets of translated responses. The computer-scored results (4,096 predictions) were compared with expert-scored results. No significant differences were found between computer scoring and expert scoring in either the average scores or the statistical results based on those averages. The Pearson correlation coefficients between computer scoring and expert scoring of each student's composite score were 0.848 for scientific ideas and 0.776 for naïve ideas. The inter-rater reliability (Cohen's kappa) between computer scoring and expert scoring exceeded 0.8 for linguistically simple concepts (e.g., variation, competition, and limited resources). These findings indicate that combining English-based automated computer scoring models with machine translation is a promising method for scoring Korean college students' written responses to natural selection concept items.
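
Cohen's kappa, the inter-rater index reported here, corrects raw agreement for the agreement expected by chance. A minimal Python sketch for presence/absence codes follows; the two code lists are made-up examples, not data from the study:

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items (e.g. 1 = idea present, 0 = absent)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters gave the same code.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over codes.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in counts_a.keys() | counts_b.keys()
    )
    if p_chance == 1:
        # Both raters used one identical code for every item: perfect agreement.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


computer = [1, 1, 0, 1, 0, 0, 1, 0]  # hypothetical computer codes for one idea
expert = [1, 1, 0, 1, 0, 1, 1, 0]    # hypothetical expert codes for the same responses
print(cohen_kappa(computer, expert))  # prints 0.75
```

For the per-student composite-score correlations, Python 3.10 and later also provide `statistics.correlation`, which computes Pearson's r.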

Scoring Methods of Polysomnography for Diagnosis of Sleep Apnea in Adolescents (청소년에서 수면 무호흡 진단을 위한 수면 다원 검사의 판독 방법)

  • Lee, Keu Sung;Sheen, Seung Soo;Lee, Il Jae;Choi, Byung-Joo;Choi, Ji Ho;Park, Do-Yang;Kim, Han Tai;Kim, Hyun Jun
    • Korean Journal of Otorhinolaryngology-Head and Neck Surgery / v.61 no.11 / pp.593-599 / 2018
  • Background and Objectives: In both the 2007 and 2012 American Academy of Sleep Medicine (AASM) scoring manuals, either the pediatric or the adult respiratory scoring guidelines have been used to evaluate adolescents. We compared the polysomnography scoring methods of these manuals, applying both the pediatric and the adult scoring rules to the diagnosis of sleep apnea in adolescents. Subjects and Method: A total of 106 Korean subjects aged 13 to 18 years were enrolled. All subjects underwent overnight polysomnography in a sleep laboratory, and the data were scored according to both the pediatric and adult guidelines of the 2007 and 2012 AASM scoring manuals. Results: Both the pediatric and adult apnea-hypopnea indices (AHI) obtained with the 2012 method were significantly higher than those obtained with the 2007 method. The difference in AHI between pediatric and adult scoring was markedly smaller with the 2012 AASM method than with the 2007 method. Even so, under the 2012 method there was significant discordance in sleep apnea diagnosis between the pediatric and adult scoring rules. Conclusion: Under the 2012 method, both pediatric and adult rules were used to diagnose adolescent sleep apnea. However, diagnoses made with the pediatric and adult guidelines of the 2012 AASM manual were significantly discordant, probably because of the different AHI cut-off values for diagnosing sleep apnea in pediatric (≥1) and adult (≥5) patients. Further studies are needed to determine a more reasonable cut-off value for diagnosing sleep apnea in adolescents.
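
The discordance comes down to simple arithmetic: the AHI is the number of apneas plus hypopneas per hour of sleep, and the same adolescent can fall on different sides of the pediatric (≥1) and adult (≥5) cut-offs. The Python sketch below assumes the event counts have already been scored under each rule set; the AASM event definitions themselves are not modeled, and all numbers are hypothetical:

```python
PEDIATRIC_CUTOFF = 1.0  # AHI >= 1 event/h (pediatric cut-off, as stated above)
ADULT_CUTOFF = 5.0      # AHI >= 5 events/h (adult cut-off, as stated above)


def apnea_hypopnea_index(apneas, hypopneas, total_sleep_time_min):
    """Apneas plus hypopneas per hour of sleep."""
    if total_sleep_time_min <= 0:
        raise ValueError("total sleep time must be positive")
    return (apneas + hypopneas) / (total_sleep_time_min / 60)


def has_sleep_apnea(ahi, cutoff):
    return ahi >= cutoff


# Hypothetical adolescent: event counts differ because pediatric and adult
# rules define scorable events differently.
sleep_min = 360
pediatric_ahi = apnea_hypopnea_index(apneas=5, hypopneas=10, total_sleep_time_min=sleep_min)
adult_ahi = apnea_hypopnea_index(apneas=3, hypopneas=6, total_sleep_time_min=sleep_min)

pediatric_dx = has_sleep_apnea(pediatric_ahi, PEDIATRIC_CUTOFF)
adult_dx = has_sleep_apnea(adult_ahi, ADULT_CUTOFF)
status = "discordant" if pediatric_dx != adult_dx else "concordant"
print(pediatric_ahi, adult_ahi, pediatric_dx, adult_dx, status)
```

Here the pediatric AHI of 2.5 meets the pediatric cut-off while the adult AHI of 1.5 falls short of the adult one; note that an AHI of 2.5 would also be classified differently by the two cut-offs even if both rule sets produced the same event count.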

The Automated Scoring of Kinematics Graph Answers through the Design and Application of a Convolutional Neural Network-Based Scoring Model (합성곱 신경망 기반 채점 모델 설계 및 적용을 통한 운동학 그래프 답안 자동 채점)

  • Jae-Sang Han;Hyun-Joo Kim
    • Journal of The Korean Association For Science Education / v.43 no.3 / pp.237-251 / 2023
  • This study explores the feasibility of automatically scoring scientific graph answers by designing a scoring model based on convolutional neural networks (CNNs) and applying it to students' kinematics graph answers. The researchers prepared 2,200 answers, divided into 2,000 for training and 200 for validation. In addition, 202 student answers were divided into 100 for training and 102 for testing. First, while designing the automated scoring model and validating its performance, the researchers optimized it for graph image classification using their own prepared answer dataset. Next, the model was trained on various types of training datasets and then used to score the student test dataset. The model's performance improved as the training data increased in quantity and diversity. Compared with human scoring, the model achieved an accuracy of 97.06%, a kappa coefficient of 0.957, and a weighted kappa coefficient of 0.968. However, for answer types not included in the training data, the automated model scored inaccurately even though the human scorers' scores were almost identical.
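
The three agreement figures reported here (accuracy, kappa, weighted kappa) can all be computed from paired human and model scores. The abstract does not say which weighting scheme was used, so the Python sketch below offers unweighted, linear, and quadratic weights as options; the score lists are invented:

```python
def accuracy(human, model):
    return sum(h == m for h, m in zip(human, model)) / len(human)


def weighted_kappa(human, model, categories, weighting="linear"):
    """Cohen's kappa with disagreement weights over ordered score categories.

    weighting: "none" (plain kappa), "linear", or "quadratic".
    """
    k = len(categories)
    if k < 2 or len(human) != len(model) or not human:
        raise ValueError("need >= 2 categories and equal-length, non-empty score lists")
    index = {c: i for i, c in enumerate(categories)}
    n = len(human)

    def disagreement(i, j):
        if weighting == "none":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)  # distance between ordered categories, scaled to 0..1
        if weighting == "linear":
            return d
        if weighting == "quadratic":
            return d * d
        raise ValueError(f"unknown weighting: {weighting!r}")

    # Observed joint proportions, then each scorer's marginal proportions.
    joint = [[0.0] * k for _ in range(k)]
    for h, m in zip(human, model):
        joint[index[h]][index[m]] += 1 / n
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    observed = sum(disagreement(i, j) * joint[i][j] for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        return 1.0  # both scorers used a single identical category
    return 1 - observed / expected


human = [0, 1, 2, 2, 1, 0]  # hypothetical human scores
model = [0, 1, 2, 1, 1, 0]  # hypothetical CNN scores
print(accuracy(human, model), weighted_kappa(human, model, [0, 1, 2], "linear"))
```

Weighted kappa penalizes a one-point miss less than a two-point miss, which is why it can exceed plain kappa on the same data, as in the figures reported above.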

Exploring automatic scoring of mathematical descriptive assessment using prompt engineering with the GPT-4 model: Focused on permutations and combinations (프롬프트 엔지니어링을 통한 GPT-4 모델의 수학 서술형 평가 자동 채점 탐색: 순열과 조합을 중심으로)

  • Byoungchul Shin;Junsu Lee;Yunjoo Yoo
    • The Mathematical Education / v.63 no.2 / pp.187-207 / 2024
  • In this study, we explored the feasibility of automatically scoring descriptive assessment items with GPT-4-based ChatGPT by comparing its scoring results with those of teachers. For this purpose, three descriptive items on permutations and combinations for first-year high school students were selected from the website of KICE (Korea Institute for Curriculum and Evaluation). Items 1 and 2 each had only one problem-solving strategy, whereas Item 3 had multiple strategies. Two teachers, each with more than eight years of teaching experience, graded the answers of 204 students, and their grades were compared with those of GPT-4-based ChatGPT. Scoring prompts were constructed using techniques such as few-shot chain-of-thought (Few-Shot-CoT), self-consistency (SC), structured prompts, and iterative prompting, and were then entered into GPT-4-based ChatGPT. For Items 1 and 2, the teachers' scores and GPT-4's scores were strongly correlated. For Item 3, which involved multiple problem-solving strategies, GPT-4-based ChatGPT first classified the student answers by strategy using dedicated prompts; scoring prompts tailored to each strategy type were then applied, and these results were also strongly correlated with the teachers' scores. These results confirm the potential of GPT-4 models with prompt engineering to assist teachers in scoring; the limitations of the study and directions for future research are also presented.
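
The two-stage procedure used for Item 3 (classify the strategy, then score with a strategy-specific prompt), combined with self-consistency voting over several sampled scores, can be sketched in Python. The `model` callable is a hypothetical stand-in for whatever client sends a prompt to GPT-4 and returns its reply; no real API is called, and the prompt wording, the `SCORE:` reply format, and the sample count are assumptions, not the authors' prompts:

```python
import itertools
import re
from collections import Counter
from typing import Callable, Dict

# Hypothetical: any function that sends a prompt to an LLM and returns its text reply.
LLM = Callable[[str], str]


def classify_strategy(model: LLM, answer: str, strategies: Dict[str, str]) -> str:
    """Stage 1: ask the model which solution strategy the answer uses."""
    options = "\n".join(f"- {name}: {desc}" for name, desc in strategies.items())
    prompt = (
        "Classify the student's solution strategy. Reply with the strategy name only.\n"
        f"Strategies:\n{options}\n\nStudent answer:\n{answer}"
    )
    reply = model(prompt).strip()
    if reply not in strategies:
        raise ValueError(f"unrecognized strategy: {reply!r}")
    return reply


def score_once(model: LLM, answer: str, rubric: str) -> int:
    """Stage 2: one chain-of-thought scoring pass that ends in 'SCORE: <n>'."""
    prompt = (
        "Grade the answer step by step against the rubric and worked examples, "
        "then end with a line 'SCORE: <integer>'.\n"
        f"Rubric:\n{rubric}\n\nStudent answer:\n{answer}"
    )
    scores = re.findall(r"SCORE:\s*(\d+)", model(prompt))
    if not scores:
        raise ValueError("reply contained no SCORE line")
    return int(scores[-1])  # use the final score if the reasoning mentions several


def score_answer(model: LLM, answer: str, strategies: Dict[str, str],
                 rubrics: Dict[str, str], samples: int = 5) -> int:
    """Classify, then take the most frequent score over several sampled passes."""
    strategy = classify_strategy(model, answer, strategies)
    votes = Counter(score_once(model, answer, rubrics[strategy]) for _ in range(samples))
    # Ties go to the score sampled first (Counter keeps first-encountered order).
    return votes.most_common(1)[0][0]


# Usage with a fake model that returns canned replies instead of calling GPT-4.
_canned = itertools.cycle(["... SCORE: 3", "... SCORE: 2", "... SCORE: 3"])


def fake_model(prompt: str) -> str:
    return "permutation" if prompt.startswith("Classify") else next(_canned)


demo_score = score_answer(
    fake_model,
    answer="(student answer text)",
    strategies={"permutation": "counts ordered arrangements",
                "combination": "counts unordered selections"},
    rubrics={"permutation": "(rubric and few-shot examples A)",
             "combination": "(rubric and few-shot examples B)"},
)
print(demo_score)
```

Five sampled passes return 3, 2, 3, 3, 2, so the majority score is 3. In a real setup, the few-shot worked examples would live in each strategy's rubric text.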

Developing an Automated English Sentence Scoring System for Middle-school Level Writing Test by Using Machine Learning Techniques (기계학습을 이용한 중등 수준의 단문형 영어 작문 자동 채점 시스템 구현)

  • Lee, Gyoung Ho;Lee, Kong Joo
    • Journal of KIISE / v.41 no.11 / pp.911-920 / 2014
  • In this paper, we introduce an automatic scoring system for middle-school-level English writing tests based on machine learning techniques. We discuss the overall process and the features used to build the automatic English writing scoring system. To evaluate how well a student's answer is elaborated, we introduce a "concept answer", which represents the abstract meaning of a text. Multiple machine learning algorithms are used to score the English writing, and we propose a decision process called "optimal combination" that optimally combines their outputs into a single final output to improve scoring performance. Finally, we evaluate the performance of the whole automated English writing scoring system in experiments with actual test data.
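
The abstract does not describe how the "optimal combination" step works. As one possible illustration, swapped in here and not the authors' method, the Python sketch below uses a behavior-knowledge-space (BKS) style lookup: on validation data it records, for every tuple of classifier outputs, the gold score seen most often, and it falls back to a majority vote for tuples never seen. The data are invented:

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def fit_lookup_combiner(outputs: Sequence[Sequence[Hashable]],
                        gold: Sequence[Hashable]) -> Dict[Tuple, Hashable]:
    """outputs[k][i] is classifier k's predicted score for validation item i."""
    if any(len(clf) != len(gold) for clf in outputs):
        raise ValueError("every classifier must score every validation item")
    table = defaultdict(Counter)
    for i, label in enumerate(gold):
        key = tuple(clf[i] for clf in outputs)  # joint output pattern for item i
        table[key][label] += 1
    # Map each observed output pattern to the gold score seen most often with it.
    return {key: counts.most_common(1)[0][0] for key, counts in table.items()}


def combine(table: Dict[Tuple, Hashable], predictions: Tuple) -> Hashable:
    """Final single score for one answer, given each classifier's prediction."""
    if predictions in table:
        return table[predictions]
    # Unseen output pattern: fall back to a simple majority vote.
    return Counter(predictions).most_common(1)[0][0]


validation_outputs = [
    [1, 1, 0, 0],  # hypothetical classifier A
    [1, 0, 0, 1],  # hypothetical classifier B
    [0, 1, 0, 1],  # hypothetical classifier C
]
validation_gold = [1, 1, 0, 1]
table = fit_lookup_combiner(validation_outputs, validation_gold)
print(combine(table, (0, 1, 1)), combine(table, (0, 0, 1)))
```

A lookup table can learn that a particular disagreement pattern is usually right, which a plain vote cannot, but it needs enough validation data to populate the patterns that matter.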

Computerized Image Analysis of Micronucleated Reticulocytes in Mouse Bone Marrow (컴퓨터 이미지 분석법을 이용한 마우스 골수세포에서 소핵의 계수)

  • 권정;홍미영;고우석;정문구;이미가엘
    • Toxicological Research / v.18 no.4 / pp.369-374 / 2002
  • The present study was performed to validate an automated image analysis system (Loats Automated Micronucleus Scoring System) for the mouse bone marrow micronucleus assay by comparing it with conventional microscopic scoring. Two studies were conducted to provide slides for comparing micronucleated polychromatic erythrocyte (MNPCE) values collected manually with those collected by the automated system. Test article A served as an example of a compound that does not induce micronuclei, and test article B was used as a micronucleus-inducing agent to elicit a positive response. Cyclophosphamide was included as a positive control in both studies. Bone marrow samples were collected from male ICR mice 24 h after administration of test article A or B. The cells were fixed with absolute methanol and stained with May-Grünwald and Giemsa. The number of MNPCEs was determined by analyzing 1,000 PCEs per bone marrow sample. In addition to micronucleus scoring, an index of bone marrow toxicity, the PCE ratio (percentage of PCEs among total erythrocytes), was determined for each sample. Automated and manual scoring gave similar results when the MNPCE incidence induced by a test article was below 10. However, when the MNPCE incidence exceeded 10, as with cyclophosphamide treatment, only manual scoring enumerated micronucleated PCEs effectively. Conversely, computer-assisted image analysis was superior for determining the PCE ratio. Taken together, these results suggest that the automated image analysis needs improvement before it can be as sensitive as manual scoring for routine micronucleus counting, an improvement worth pursuing because automated scoring offers greater objectivity and higher throughput.
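
Both endpoints are simple ratios. A minimal Python sketch, with invented counts, of the MNPCE frequency per 1,000 PCEs scored and of the PCE ratio, taking total erythrocytes as PCEs plus normochromatic erythrocytes (NCEs):

```python
def mnpce_per_1000(mnpce_count, pce_scored=1000):
    """Micronucleated PCEs per 1,000 PCEs scored."""
    if pce_scored <= 0:
        raise ValueError("pce_scored must be positive")
    return 1000 * mnpce_count / pce_scored


def pce_ratio_percent(pce_count, nce_count):
    """PCE ratio: PCEs as a percentage of all erythrocytes (PCEs + NCEs)."""
    total = pce_count + nce_count
    if total <= 0:
        raise ValueError("no erythrocytes counted")
    return 100 * pce_count / total


# Hypothetical sample scored both manually and by the automated system.
manual = mnpce_per_1000(4)
automated = mnpce_per_1000(3)
ratio = pce_ratio_percent(480, 520)
print(manual, automated, ratio)
```

A falling PCE ratio signals bone marrow toxicity, which is why it is reported alongside the micronucleus frequency.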

Proposal on the Severity Scoring System of Rhinitis ; Comparison, Analysis and Establishment (비염의 평가 방법에 대한 제언 ; 비교 분석 및 설립)

  • Hwang Sun-Yi;Hwang Min-Bo;Lim Jin-Ho;Jee Seon-Young;Kim Sang-Chan;Baek Jung-Han;Lee Sang-Gon
    • Journal of Physiology & Pathology in Korean Medicine / v.20 no.1 / pp.235-244 / 2006
  • There is much confusion in the field of rhinitis about how best to measure disease severity objectively. We therefore aimed to establish a new, adequate scoring system for rhinitis based on a comparative analysis of various existing scoring systems. We searched Entrez PubMed (1995 to 2005) and the KISS and KStudy databases for literature on severity scoring systems for rhinitis. Results and Conclusions: The desirable properties of a severity scoring system were validity, sensitivity to change, and ease of use. The essential items of severity scoring systems were subjective symptoms. Severity criteria were divided into subjective symptoms, complications, and quality of life. The intensity items are nasal obstruction, rhinorrhea, sneezing, itching, postnasal drip, nasal mucosal swelling, nasal mucosal color, and complications. The subjective-symptom item is difficulty in daily life. The most significant items of the severity scoring system are nasal symptoms. The maximum total score is 30; we assigned about 80% of the total score to nasal symptoms and the remaining 20% to complications and difficulty in daily life.
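
The 80/20 allocation of a 30-point maximum works out to 24 points for nasal symptoms and 6 points for complications and difficulty in daily life. The Python sketch below rescales each group's raw ratings to those caps; the 0 to 3 rating scale per item and the rescaling itself are assumptions for illustration, not the scoring rules proposed in the paper:

```python
NASAL_POINTS = 24  # about 80% of the 30-point maximum
OTHER_POINTS = 6   # about 20%: complications and difficulty in daily life
ITEM_MAX = 3       # hypothetical 0-3 rating per item


def severity_score(nasal_ratings, other_ratings, item_max=ITEM_MAX):
    """Total severity on a 0-30 scale from per-item ratings (dicts of item -> rating)."""
    if not nasal_ratings or not other_ratings:
        raise ValueError("both item groups need at least one rating")
    for rating in (*nasal_ratings.values(), *other_ratings.values()):
        if not 0 <= rating <= item_max:
            raise ValueError(f"rating {rating} outside 0..{item_max}")
    # Each group's share of its maximum possible rating, scaled to its point cap.
    nasal = NASAL_POINTS * sum(nasal_ratings.values()) / (item_max * len(nasal_ratings))
    other = OTHER_POINTS * sum(other_ratings.values()) / (item_max * len(other_ratings))
    return nasal + other


NASAL_ITEMS = ["nasal obstruction", "rhinorrhea", "sneezing", "itching",
               "postnasal drip", "nasal mucosal swelling", "nasal mucosal color"]
worst = severity_score(dict.fromkeys(NASAL_ITEMS, 3),
                       {"complications": 3, "difficulty in daily life": 3})
mildest = severity_score(dict.fromkeys(NASAL_ITEMS, 0),
                         {"complications": 0, "difficulty in daily life": 0})
print(worst, mildest)
```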

Design and Implementation of an Automatic Scoring Model Using a Voting Method for Descriptive Answers (투표 기반 서술형 주관식 답안 자동 채점 모델의 설계 및 구현)

  • Heo, Jeongman;Park, So-Young
    • Journal of the Korea Society of Computer and Information / v.18 no.8 / pp.17-25 / 2013
  • In this paper, we propose a model that automatically scores students' answers to descriptive problems using a voting method. To reduce model construction cost, the proposed model does not build a separate scoring model for each problem type. To obtain features useful for scoring descriptive answers, the model extracts feature values from the results of comparing the student's answer with the answer key. To improve the precision of the scoring result, the model collects the scoring results produced by several machine-learning-based classifiers and selects the final result by unanimous vote. Experimental results show that the single classifier C4.5 achieves a precision of 83.00%, while the proposed model raises precision to 90.57% by using three machine-learning-based classifiers: C4.5, ME (maximum entropy), and SVM.
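
A minimal Python sketch of unanimous voting, assuming "precision" means the share of correct scores among the answers the ensemble actually scores (answers without unanimous agreement are left undecided, for example for a human grader). The classifier outputs are invented and merely stand in for C4.5, ME, and SVM:

```python
from typing import Hashable, List, Optional, Sequence


def unanimous_vote(predictions: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the shared label if every classifier agrees, otherwise None (abstain)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    first = predictions[0]
    return first if all(p == first for p in predictions) else None


def precision_and_coverage(per_answer: List[Sequence[Hashable]], gold: Sequence[Hashable]):
    """Precision over decided answers, and the fraction of answers that were decided."""
    decided = [(unanimous_vote(p), g) for p, g in zip(per_answer, gold)]
    decided = [(label, g) for label, g in decided if label is not None]
    precision = sum(label == g for label, g in decided) / len(decided) if decided else 0.0
    coverage = len(decided) / len(gold)
    return precision, coverage


# (C4.5, ME, SVM) predictions per answer: hypothetical values.
per_answer = [("correct", "correct", "correct"), ("wrong", "correct", "wrong"),
              ("partial", "partial", "partial"), ("correct", "correct", "correct")]
gold = ["correct", "wrong", "partial", "wrong"]
precision, coverage = precision_and_coverage(per_answer, gold)
print(precision, coverage)
```

Requiring unanimity trades coverage for precision: the second answer is left undecided, so precision is computed over three answers rather than four.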

Research on the E-Commerce Credit Scoring Model Using the Gaussian Density Function

  • Xiao, Qiang;He, Rui-chun;Zhang, Wei
    • Journal of Information Processing Systems / v.11 no.2 / pp.173-183 / 2015
  • Current e-commerce credit scoring models are simplistic, and a "credit brushing" phenomenon (artificially inflating reputation through fake transactions) has emerged in e-commerce. This phenomenon distorts consumers' judgment and hinders the rapid development of e-commerce. This paper proposes an e-commerce credit evaluation model based on a Gaussian density function. Through density testing and analysis of anomalies in e-commerce credit ratings, the model identifies abnormal points in credit scoring; these points are then recalculated with a nonlinear credit scoring algorithm. The model can thus effectively improve current e-commerce credit scores and enhance their accuracy.
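
The density test at the core of this model can be sketched in Python: fit a Gaussian to a credit-related feature, compute each observation's density, and flag observations whose density falls below a threshold ε. The feature (a shop's daily count of positive ratings), the data, and ε are hypothetical, and the paper's nonlinear rescoring step is not reproduced:

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(values):
    """Maximum-likelihood mean and variance of a one-dimensional feature."""
    mu = fmean(values)
    var = pvariance(values, mu)
    if var == 0:
        raise ValueError("feature has zero variance")
    return mu, var


def gaussian_density(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def flag_anomalies(values, epsilon):
    """Indices of observations whose density under the fitted Gaussian is below epsilon."""
    mu, var = fit_gaussian(values)
    return [i for i, x in enumerate(values) if gaussian_density(x, mu, var) < epsilon]


# Hypothetical daily positive-rating counts; the last day looks like credit brushing.
daily_positive_ratings = [10, 12, 11, 9, 10, 11, 95]
flagged = flag_anomalies(daily_positive_ratings, epsilon=1e-3)
print(flagged)
```

Because the suspicious day is included in the fit, it inflates the variance and blunts the test; fitting on a trusted history window instead would make the detector more sensitive.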