• Title/Summary/Keyword: Automatic Scoring Model


Exploring automatic scoring of mathematical descriptive assessment using prompt engineering with the GPT-4 model: Focused on permutations and combinations (프롬프트 엔지니어링을 통한 GPT-4 모델의 수학 서술형 평가 자동 채점 탐색: 순열과 조합을 중심으로)

  • Byoungchul Shin;Junsu Lee;Yunjoo Yoo
    • The Mathematical Education / v.63 no.2 / pp.187-207 / 2024
  • In this study, we explored the feasibility of automatically scoring descriptive assessment items with GPT-4 based ChatGPT by comparing its scoring results with those of teachers. For this purpose, three descriptive items from the permutation and combination unit for first-year high school students were selected from the KICE (Korea Institute for Curriculum and Evaluation) website. Items 1 and 2 each had only one problem-solving strategy, while Item 3 had more than two strategies. Two teachers, each with over eight years of educational experience, graded answers from 204 students, and their scores were compared with the results from GPT-4 based ChatGPT. Scoring prompts were constructed using techniques such as Few-Shot-CoT, SC, structured prompting, and iterative prompting, and were then entered into GPT-4 based ChatGPT. The scoring results for Items 1 and 2 showed a strong correlation between the teachers' and GPT-4's scoring. For Item 3, which involved multiple problem-solving strategies, the student answers were first classified by strategy using prompts entered into GPT-4 based ChatGPT. Scoring prompts tailored to each strategy type were then applied, and these results also showed a strong correlation with the teachers' scoring. These results confirm the potential of GPT-4 models with prompt engineering to assist teachers' scoring; the limitations of this study and directions for future research are also presented.

Performance Evaluation of Nonkeyword Modeling and Postprocessing for Vocabulary-independent Keyword Spotting (가변어휘 핵심어 검출을 위한 비핵심어 모델링 및 후처리 성능평가)

  • Kim, Hyung-Soon;Kim, Young-Kuk;Shin, Young-Wook
    • Speech Sciences / v.10 no.3 / pp.225-239 / 2003
  • In this paper, we develop a keyword spotting system using a vocabulary-independent speech recognition technique and investigate several non-keyword modeling and post-processing methods to improve its performance. To model non-keyword speech segments, monophone clustering and a Gaussian Mixture Model (GMM) are considered. For post-processing, we employ likelihood ratio scoring to verify the recognition results, with filler models, anti-subword models, and N-best decoding results considered as the alternative hypothesis. We also examine different methods of constructing anti-subword models. We evaluate the performance of our system on an automatic telephone exchange service task. The results show that GMM-based non-keyword modeling yields better performance than monophone clustering. In the post-processing experiment, the method using an anti-keyword model based on Kullback-Leibler distance and the N-best decoding method perform better than the other methods, and more than 50% of keyword recognition errors could be eliminated at a keyword rejection rate of 5%.
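
The likelihood ratio verification mentioned in this abstract can be illustrated with a minimal sketch. It assumes that the recognizer already provides a log-likelihood for the detected keyword and one for an alternative hypothesis (for example a filler or anti-subword model), and that the ratio is normalized by segment length in frames. The function names, the frame normalization, and the threshold value are hypothetical and are not taken from the paper.

```python
def log_likelihood_ratio(keyword_loglik: float, alt_loglik: float, n_frames: int) -> float:
    """Frame-normalized log-likelihood ratio of keyword vs. alternative hypothesis.

    Normalizing by the number of frames is an assumption of this sketch.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (keyword_loglik - alt_loglik) / n_frames


def accept_keyword(keyword_loglik: float, alt_loglik: float, n_frames: int,
                   threshold: float = 0.0) -> bool:
    """Accept the detected keyword only if the ratio reaches the threshold.

    The threshold is a hypothetical tuning parameter: raising it rejects more
    detections, trading missed keywords for fewer false alarms.
    """
    return log_likelihood_ratio(keyword_loglik, alt_loglik, n_frames) >= threshold
```

A detection whose keyword model scores higher than the alternative is kept; one that the alternative explains better is rejected.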


Automated Scoring of Argumentation Levels and Analysis of Argumentation Patterns Using Machine Learning (기계 학습을 활용한 논증 수준 자동 채점 및 논증 패턴 분석)

  • Lee, Manhyoung;Ryu, Suna
    • Journal of The Korean Association For Science Education / v.41 no.3 / pp.203-220 / 2021
  • We explored a method for improving the performance of automated scoring of scientific argumentation and analyzed argumentation patterns using automated scoring models. For this purpose, we assessed the level of argumentation in students' scientific discourse in classrooms. The dataset consists of four units of argumentation features and argumentation levels for episodes. We used argumentation clusters and n-grams to enhance automated scoring accuracy. We used three supervised learning algorithms, resulting in 33 automatic scoring models. Automated scoring achieved an accuracy of 77.59% on average and up to 85.37%. In this process, we found that argumentation cluster patterns could enhance automated scoring accuracy. We then analyzed argumentation patterns using decision tree and random forest models. Our results were consistent with previous research showing that justification in coordination with claim and evidence determines the quality of scientific argumentation. Our research method suggests a novel approach for analyzing the quality of scientific argumentation in classrooms.

Semi-Automatic Scoring for Short Korean Free-Text Responses Using Semi-Supervised Learning (준지도학습 방법을 이용한 한국어 서답형 문항 반자동 채점)

  • Cheon, Min-Ah;Seo, Hyeong-Won;Kim, Jae-Hoon;Noh, Eun-Hee;Sung, Kyung-Hee;Lim, EunYoung
    • Korean Journal of Cognitive Science / v.26 no.2 / pp.147-165 / 2015
  • Short-answer questions can reveal the depth of students' understanding and their higher-order thinking skills. However, scoring short-answer questions can take a long time, and the consistency of grading can be an issue. To alleviate this burden, automated scoring systems are widely used in Europe and America, but research on them in Korea is still at an early stage. In this paper, we propose a semi-automatic scoring system for short Korean free-text responses using semi-supervised learning. First, the proposed system grades students' answers based on the similarity score between the students' answers and the model answers, and scored answers with high reliability are added to the model answers after thorough checking. This process repeats until all answers are scored. The proposed system was used experimentally on Korean and social studies items in the Nationwide Scholastic Achievement Test. We confirmed that the processing time and the consistency of grades were promisingly improved. Before the system is applied in schools, various assessment methods need to be developed and comparative studies need to be performed.
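
The iterative procedure described in this abstract (score by similarity to model answers, promote reliable answers to the model set, repeat) can be sketched as follows. This is only an illustration under stated assumptions: the token-level Jaccard similarity, the `threshold` parameter, and the function names are hypothetical, and answers that never reach the threshold are simply returned for manual grading rather than handled as in the paper.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (a stand-in for the paper's similarity score)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def semi_automatic_scoring(answers, model_answers, threshold=0.6):
    """Iteratively score answers and grow the reference set.

    answers: list of student answer strings.
    model_answers: dict mapping model answer text to its score.
    Returns (scored, unresolved): a dict of answer -> score, and the list of
    answers that never reached the threshold (left for a human grader).
    """
    if not model_answers:
        raise ValueError("at least one model answer is required")
    references = dict(model_answers)
    scored = {}
    pending = list(answers)
    while pending:
        newly_scored, still_pending = {}, []
        for ans in pending:
            # Compare against the references available at the start of this round.
            best_ref = max(references, key=lambda ref: jaccard(ans, ref))
            if jaccard(ans, best_ref) >= threshold:
                newly_scored[ans] = references[best_ref]
            else:
                still_pending.append(ans)
        if not newly_scored:
            break  # no progress: remaining answers need manual scoring
        scored.update(newly_scored)
        references.update(newly_scored)  # reliable answers become references
        pending = still_pending
    return scored, pending
```

In this sketch an answer that is too far from the original model answer can still be scored in a later round, once a closer student answer has been promoted to the reference set.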

Hybrid Fungal Genome Annotation Pipeline Combining ab initio, Evidence-, and Homology-based gene model evaluation

  • Min, Byoungnam;Choi, In-Geol
    • 한국균학회소식:학술대회논문집 / 2018.05a / pp.22-22 / 2018
  • Fungal genome sequencing and assembly have become routine. Genome analysis relies on high-quality gene prediction and annotation. An automatic fungal genome annotation pipeline is essential for handling genomic sequence data, which are accumulating exponentially. However, building an automatic annotation procedure for fungal genomes is not an easy task. FunGAP (Fungal Genome Annotation Pipeline) was developed for precise and accurate prediction of gene models from any fungal genome assembly. To produce high-quality gene models, this pipeline employs multiple gene prediction programs encompassing ab initio, evidence-, and homology-based evaluation. FunGAP aims to evaluate all predicted genes by filtering gene models. To guide the filtering and remove false-positive genes, we used a scoring function that seeks a consensus by evaluating each gene model based on homology to known proteins or domains. FunGAP is freely available for non-commercial users at the GitHub site (https://github.com/CompSynBioLab-KoreaUniv/FunGAP).


A Novel, Deep Learning-Based, Automatic Photometric Analysis Software for Breast Aesthetic Scoring

  • Joseph Kyu-hyung Park;Seungchul Baek;Chan Yeong Heo;Jae Hoon Jeong;Yujin Myung
    • Archives of Plastic Surgery / v.51 no.1 / pp.30-35 / 2024
  • Background: Breast aesthetics evaluation often relies on subjective assessments, leading to the need for objective, automated tools. We developed the Seoul Breast Esthetic Scoring Tool (S-BEST), a photometric analysis software that utilizes a DenseNet-264 deep learning model to automatically evaluate breast landmarks and asymmetry indices. Methods: S-BEST was trained on a dataset of frontal breast photographs annotated with 30 specific landmarks, divided into an 80-20 training-validation split. The software requires the sternal notch-to-nipple or nipple-to-nipple distance as input and performs image preprocessing steps, including ratio correction and 8-bit normalization. Breast asymmetry indices and centimeter-based measurements are provided as the output. The accuracy of S-BEST was validated using a paired t-test and Bland-Altman plots, comparing its measurements to those obtained from physical examinations of 100 females diagnosed with breast cancer. Results: S-BEST demonstrated high accuracy in automatic landmark localization, with most distances showing no statistically significant difference compared with physical measurements. However, the nipple to inframammary fold distance showed a significant bias, with coefficients of determination of 0.3787 and 0.4234 for the left and right sides, respectively. Conclusion: S-BEST provides a fast, reliable, and automated approach for breast aesthetic evaluation based on 2D frontal photographs. While limited by its inability to capture volumetric attributes or multiple viewpoints, it serves as an accessible tool for both clinical and research applications.

A Study on the Appraisal of Site - on focus I-Evaluation model- (비즈니스 사이트 평가에 관한 연구 - I-Evaluation 모형 중심으로-)

  • 양승권
    • Journal of the Korea Safety Management & Science / v.3 no.3 / pp.151-164 / 2001
  • Currently, there are few industry-specific evaluation models and little systematic item-by-item analysis of business web sites, and approaches to improving a site depend entirely on the individual. This research addresses the development of I-Evaluation, an approach for providing analysis, evaluation, and guidelines for business web sites, from two points of view. The first concerns developing the working steps and the site evaluation model needed to improve site quality. The second concerns developing a framework for applying feedback to a site as rapidly as possible and for optimizing the site. The former is the methodological point of view of I-Evaluation, and the latter is the point of view of the I-Evaluation Framework. The methodology covers developing the site evaluation model and defining the working steps. The site evaluation model customizes the evaluation of each customer's web site, using a scoring model that can serve as an industry standard for analyzing similar business web sites. Defining the working steps means defining the input and output parameters for the component elements, working processes, and results analysis of an evaluation model. It also includes building a working environment that automates the steps above by making them explicit.


Development of English Speech Recognizer for Pronunciation Evaluation (발성 평가를 위한 영어 음성인식기의 개발)

  • Park Jeon Gue;Lee June-Jo;Kim Young-Chang;Hur Yongsoo;Rhee Seok-Chae;Lee Jong-Hyun
    • Proceedings of the KSPS conference / 2003.10a / pp.37-40 / 2003
  • This paper presents preliminary results on automatic pronunciation scoring for non-native English speakers and describes the development of an English speech recognizer for educational and evaluation purposes. The proposed speech recognizer, featuring two refined acoustic model sets, implements noise-robust data compensation, phonetic alignment, highly reliable rejection, keyword and phrase detection, an easy-to-use language modeling toolkit, etc. The developed speech recognizer achieves an average correlation of 0.725 between the human raters and the machine scores, using the speech database YOUTH for training and K-SEC for testing.


Automatic Inter-Phoneme Similarity Calculation Method Using PAM Matrix Model (PAM 행렬 모델을 이용한 음소 간 유사도 자동 계산 기법)

  • Kim, Sung-Hwan;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association / v.12 no.3 / pp.34-43 / 2012
  • Determining the similarity between two strings can be applied in various areas such as information retrieval, spell checking, and spam filtering. Similarity calculation between Korean strings based on dynamic programming first requires a definition of the similarity between phonemes. However, existing methods are limited in that they use manually set similarity scores. In this paper, we propose a method to automatically calculate inter-phoneme similarity from a given set of variant words using a PAM-like probabilistic model. Our proposed method first finds pairs of similar words in a given word set and extracts derivation rules from the text alignment results for those pairs. Then, similarity scores are calculated from the frequencies of variations between different phonemes. In our experiments, the method improves sensitivity by 10.1%~14.1% over the simple match-mismatch scoring scheme and by 8.1%~11.8% over the manually set inter-phoneme similarity scheme, with a specificity of 77.2%~80.4%.
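
As an illustration of how similarity scores can be derived from variation frequencies, the sketch below computes log-odds scores in the general style of PAM-type substitution matrices: the observed frequency of a phoneme pair divided by the frequency expected if the two phonemes co-occurred by chance. The exact formula used in the paper is not given in the abstract, so the log-odds form, the symmetric counting, the `pseudocount` smoothing, and the romanized phoneme labels are all assumptions of this sketch.

```python
import math
from collections import Counter


def log_odds_similarity(aligned_pairs, pseudocount=1.0):
    """Build a symmetric log-odds similarity table from aligned phoneme pairs.

    aligned_pairs: iterable of (phoneme_a, phoneme_b) tuples taken from
    alignments of similar word pairs (identical pairs count as matches).
    Returns a dict mapping (a, b) to log2(observed / expected).
    """
    pair_counts = Counter()
    for a, b in aligned_pairs:
        # Count both orders so that the resulting table is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    symbols = sorted({s for pair in pair_counts for s in pair})
    total = sum(pair_counts.values()) + pseudocount * len(symbols) ** 2
    # Smoothed joint probability of observing each ordered pair.
    joint = {(a, b): (pair_counts[(a, b)] + pseudocount) / total
             for a in symbols for b in symbols}
    # Background probability of each phoneme, obtained by marginalizing.
    marginal = {a: sum(joint[(a, b)] for b in symbols) for a in symbols}
    return {(a, b): math.log2(joint[(a, b)] / (marginal[a] * marginal[b]))
            for a in symbols for b in symbols}
```

Phoneme pairs that are substituted for each other more often than chance receive positive scores, and pairs that rarely vary into each other receive negative scores, which is the property a dynamic-programming string aligner needs from its scoring table.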

SWAT model calibration/validation using SWAT-CUP I: analysis for uncertainties of objective functions (SWAT-CUP을 이용한 SWAT 모형 검·보정 I: 목적함수에 따른 불확실성 분석)

  • Yu, Jisoo;Noh, Joonwoo;Cho, Younghyun
    • Journal of Korea Water Resources Association / v.53 no.1 / pp.45-56 / 2020
  • This study aims to quantify the uncertainty that can be induced by the objective function when calibrating SWAT parameters using SWAT-CUP. A SWAT model was constructed to estimate runoff in Naeseong-cheon, one of the mid-sized watersheds in the Nakdong River basin, and automatic calibration was then performed using eight objective functions (R2, bR2, NS, MNS, KGE, PBIAS, RSR, and SSQR). The optimum parameter sets obtained from each objective function showed different ranges, and thus the corresponding hydrologic characteristics of the simulated data also differed. This is because each objective function is sensitive to specific hydrologic signatures and evaluates model performance in a unique way. In other words, one objective function might be sensitive to the residuals of extreme values and thus reproduce peak values well while ignoring average or low-flow residuals. Therefore, the hydrological similarity between the simulated and measured values was evaluated in order to select the optimum objective function. The hydrologic signatures, which include not only the magnitude but also the ratio of the rising and falling times in the hydrograph, were defined to consider the timing of flow occurrence, the response of the watershed, and the increasing and decreasing trends. The evaluation results were quantified by a scoring method, and the optimal objective functions for SWAT parameter calibration were determined to be MNS (342.48) and SSQR (346.45), which had the highest total scores.
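
To make concrete why different objective functions can rank the same simulation differently, the sketch below implements three of the listed measures using their commonly cited textbook definitions: Nash-Sutcliffe efficiency (NS), percent bias (PBIAS), and Kling-Gupta efficiency (KGE). These are assumed definitions for illustration; the formulas and sign conventions inside SWAT-CUP may differ in detail, and the function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = _mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def pbias(obs, sim):
    """Percent bias; with this sign convention, negative means overestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    mean_obs, mean_sim = _mean(obs), _mean(sim)
    dev_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs))
    dev_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim))
    r = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / (dev_obs * dev_sim)
    alpha = dev_sim / dev_obs    # ratio of standard deviations (1/n factors cancel)
    beta = mean_sim / mean_obs   # ratio of means
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

For example, a simulation that reproduces the shape of the hydrograph but is offset by a constant amount keeps a perfect correlation, so KGE penalizes only the bias term, whereas NS accumulates the squared offset at every time step and can drop much further.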