• Title/Summary/Keyword: Generalizability

A study on validity and reliability of students' evaluation (강의평가의 타당성과 신뢰성에 관한 연구 전주대학교 강의평가 결과를 중심으로)

  • Lee, Ki-Hoon
    • Journal of the Korean Data and Information Science Society / v.21 no.1 / pp.87-98 / 2010
  • This research deals with methods for assessing the validity and reliability of students' evaluations of lectures. Most papers on student evaluations have focused on procedures for controlling external effects, but this paper tries to answer the question "How reliable is the student rating?" An empirical study shows that the evaluations at Jeonju University have fair validity and reliability. Generalizability theory is suggested as a way to obtain more comprehensive results than Cronbach's alpha, which is used to examine internal consistency.
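
For readers unfamiliar with the internal-consistency index named in this abstract, the following is a minimal sketch of Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the rating table are hypothetical illustrations, not data or code from the paper.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a respondents x items table of ratings."""
    k = len(ratings[0])                                    # number of items
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in ratings]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 students rating one lecture on 4 items (1-5 scale).
ratings = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 3, 4],
]
print(round(cronbach_alpha(ratings), 3))
```

Alpha summarizes a single error source (items). Generalizability theory, which the abstract recommends, instead separates several error sources at once.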

Investigation of Various Reliability Indices of Pre-service Mathematics Teachers' Teaching Aptitude and Personality Test based on Setting Cut Scores (예비수학교사의 교직 적성·인성 검사에서 분할점수 변화에 따른 다양한 신뢰도 탐색)

  • Kim, Sungyeun
    • The Mathematical Education / v.57 no.1 / pp.55-74 / 2018
  • The purpose of this study is to examine the relative influence of each error source and to investigate the optimal measurement conditions that ensure satisfactory levels of multiple reliability coefficients for the teaching aptitude and personality test for pre-service teachers. Participants were 33 students enrolled in mathematics education in a graduate school of education located in the Seoul metropolitan area from 2013 to 2017. The main results were as follows. First, the estimated variance due to the residual was highest, followed by nesting of items within domains, graduate students, interactions of graduate students with domains, and domains. Second, the optimal measurement conditions for reaching acceptable reliability levels were a total of 96 items (12 domains with 8 items each) with a cut score of 598, and the original 210 items (14 domains with 15 items each) with cut scores of 615 or 716, based on the joint consideration of dependability coefficients, cut score dependability coefficients, adjusted dependability coefficients, and standard errors of measurement. Third, larger deviations between the arithmetic mean and the cut score indicated higher reliability coefficients of the test results. Finally, this study suggests ways for practitioners to apply generalizability theory to criterion-referenced tests and directions for future research based on its limitations.
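
The third finding follows from the form of the cut-score dependability coefficient. As a sketch, the standard definition is Phi(lambda) = [var_p + (mean - cut)^2] / [var_p + (mean - cut)^2 + var_abs_error], where var_p is the universe-score variance and var_abs_error is the absolute error variance. The function name and all numbers below are hypothetical and are not taken from the study.

```python
def phi_lambda(var_p, var_abs_error, mean, cut):
    """Cut-score dependability coefficient Phi(lambda) (population form)."""
    # The squared distance between the mean and the cut score adds to the "signal".
    signal = var_p + (mean - cut) ** 2
    return signal / (signal + var_abs_error)

# Hypothetical values on a common score metric.
var_p, var_abs_error, mean = 4.0, 1.0, 10.0
for cut in (10.0, 11.0, 12.0):
    print(cut, round(phi_lambda(var_p, var_abs_error, mean, cut), 3))
```

When the cut score equals the mean, the coefficient reduces to the ordinary dependability coefficient; it rises as the cut score moves away from the mean.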

Analysis of error source in subjective evaluation results on Taekwondo Poomsae: Application of generalizability theory (태권도 품새 경기의 주관적 평가결과의 오차원 분석: 일반화가능도 이론 적용)

  • Cho, Eun Hyung
    • Journal of the Korean Data and Information Science Society / v.27 no.2 / pp.395-407 / 2016
  • This study applies G-theory to estimate the reliability of raters' evaluation scores across Taekwondo Poomsae rating categories. Taking the number of game days and raters as multiple error sources, we analyzed the error sources through the relative magnitudes of the error variances of the interactions between factors, and carried out a D-study based on the G-study results to determine the optimal measurement conditions. The results were as follows. For the accuracy category, the estimated variance components showed that raters were the largest source of error, followed by the interaction with subjects and then between-subject differences. For the expression category, the interaction was the largest source, followed by between-subject differences and raters. Finally, the generalizability coefficients estimated in the D-study indicated that the optimal measurement condition for the accuracy category was 8 raters, and that stable reliability for the expression category was obtained with 7 raters.
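
The G-study and D-study steps described in this abstract can be sketched for a simplified case. The study itself used two facets (game days and raters); the sketch below assumes a one-facet crossed design (subjects x raters, one score per cell) and uses the usual ANOVA estimators. The function names and the score table are hypothetical.

```python
def g_study(scores):
    """Estimate variance components for a subjects x raters crossed design."""
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]
    ms_p = n_r * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in r_means) / (n_r - 1)
    ss_res = sum((scores[i][j] - p_means[i] - r_means[j] + grand) ** 2
                 for i in range(n_p) for j in range(n_r))
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))
    # Negative estimates are truncated at zero, a common convention.
    return {"p": max((ms_p - ms_res) / n_r, 0.0),
            "r": max((ms_r - ms_res) / n_p, 0.0),
            "pr,e": ms_res}

def d_study(vc, n_raters):
    """Return (generalizability coefficient, dependability coefficient)."""
    rel_error = vc["pr,e"] / n_raters               # relative error variance
    abs_error = (vc["r"] + vc["pr,e"]) / n_raters   # absolute error variance
    return vc["p"] / (vc["p"] + rel_error), vc["p"] / (vc["p"] + abs_error)

# Hypothetical scores: 4 athletes (rows) judged by 3 raters (columns).
scores = [[7.5, 7.0, 7.8], [8.2, 7.9, 8.5], [6.9, 6.5, 7.4], [8.8, 8.1, 8.6]]
vc = g_study(scores)
for n in (3, 5, 7, 8):
    print(n, d_study(vc, n))
```

Looping over candidate rater counts in `d_study` mirrors how a D-study picks the smallest number of raters that reaches a target coefficient.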

Multigroup Generalizability Analysis of Creative Attitude Scale-Korea for Mathematically Gifted and General Students in Middle Schools (수학적 창의성 태도 검사에서 수학영재와 일반학생의 다집단 일반화가능도 분석)

  • Kim, Sungyeun
    • Communications of Mathematical Education / v.31 no.1 / pp.49-70 / 2017
  • The purpose of this study was to investigate the relative influence of multiple error sources and to find optimal measurement conditions that achieve a desired level of reliability for a creative attitude test in mathematical creativity. This study analyzed publicly available Creative Attitude Scale-Korea scores of 125 general students and 109 mathematically gifted students by performing a multivariate generalizability analysis. The main results were as follows. First, based on reliability, the Creative Attitude Scale-Korea was measured less precisely for mathematically gifted students. In contrast, based on the conditional standard error of measurement, it was measured less precisely for general students. However, the Creative Attitude Scale-Korea showed strong reliability in both groups. Second, to maximize reliability, the optimal weights for the three scoring components of divergent attitude, problem solving attitude, and convergent attitude should be adjusted to .3, .3, and .4 for mathematically gifted students and to .4, .4, and .2 for general students. Third, to reach a desirable level of reliability, the single component of divergent attitude suffices for general students, whereas all three components of divergent attitude, problem solving attitude, and convergent attitude are needed for mathematically gifted students. Finally, this study proposed application plans for the Creative Attitude Scale-Korea and future directions of research.

Exploring the Reliability of an Assessment based on Automatic Item Generation Using the Multivariate Generalizability Theory (다변량일반화가능도 이론을 적용한 자동문항생성 기반 평가에서의 신뢰도 탐색)

  • Jinmin Chung;Sungyeun Kim
    • Journal of Science Education / v.47 no.2 / pp.211-224 / 2023
  • The purpose of this study is to suggest how to investigate the reliability of an assessment consisting of items produced by automatic item generation, using empirical example data. To achieve this, we analyzed the illustrative assessment data by applying multivariate generalizability theory, which can reflect both a design in which each student responds to different items and multiple error sources in the assessment score. The result of the G-study showed that, in most designs, the student effect corresponding to the true score of classical test theory was relatively large after residual effects. In addition, in the design where the content domain was fixed, the ranking of students did not change depending on the item types or items. Similarly, in the design where the item format was fixed, the difficulty showed little variation depending on the content domains. The result of the D-study indicated that the original assessment data achieved a sufficient level of reliability. It was also found that higher reliability than that of the original assessment data could be obtained by reducing the number of items in the content domains of operation, geometry, and probability and statistics, or by assigning higher weights to the domains of letters and formulas, and function. The efficient measurement conditions presented in this study are limited to the illustrative assessment data. However, the method applied in this study can be used to determine reliability and to find efficient measurement conditions for various assessment situations that use automatic item generation based on measurement traits.

A Comparative Study of a New Approach to Keyword Analysis: Focusing on NBC (키워드 분석에 대한 최신 접근법 비교 연구: 성경 코퍼스를 중심으로)

  • Ha, Myoungho
    • Journal of Digital Convergence / v.19 no.7 / pp.33-39 / 2021
  • This paper aims to analyze the lexical properties of keyword lists extracted from the NLT Old Testament Corpus (NOTC), the NLT New Testament Corpus (NNTC), and the NLT Bible Corpus (NBC), and to show that text dispersion keyness is more effective than corpus frequency keyness. For this purpose, NOTC, comprising around 570,000 running words, and NNTC, comprising about 200,000, were compiled after downloading the files from the NLT website of Bible Hub. Scott's (2020) WordSmith 8.0 was used to extract keyword lists by comparing a target corpus with a reference corpus. The results demonstrated that text dispersion keyness captured the lexical properties of keyword lists better than corpus frequency keyness, and that it was the superior measure for generating optimal keyword lists that fully meet content generalizability and content distinctiveness.

Understanding COVID-19 Vaccine Acceptance Intention: An Emotion-focused and Problem-focused Coping Perspective (코로나-19 백신 수용의도에 관한 연구: 정서 중심적 대처와 문제 중심적 대처 관점을 중심으로)

  • Yoo, Joon Woo;Park, Heejun
    • Journal of Korean Society for Quality Management / v.51 no.4 / pp.643-662 / 2023
  • Purpose: The purpose of this study was to understand individuals' COVID-19 vaccine acceptance intention during the peak of the pandemic by using coping theory and technology threat avoidance theory (TTAT) as a framework. Specifically, we focused on understanding how inward and outward emotion-focused coping (EFC), such as psychological distancing and emotional support seeking, affects problem-focused behavior (PFC), which is vaccine acceptance. Furthermore, we investigated how individuals' cognitive appraisal of COVID-19, consisting of perceived threat and perceived avoidability, acts as an antecedent of EFC. Methods: A PLS-SEM analysis was conducted to find the causal relations between the variables. An online survey targeting vaccination recipients was conducted in April 2021. Participants were asked about their perception of the virus, their coping strategy, and their vaccine acceptance intention. A total of 186 valid samples were collected and used for the analysis. Furthermore, to analyze the out-of-sample predictive power of the research model and ensure the generalizability of the results, a PLSpredict analysis was conducted. Results: The results of the PLS-SEM analysis show that perceived threat of COVID-19 significantly affects individuals' EFC strategy. Furthermore, both types of inward EFC (psychological distancing and wishful thinking) negatively affected vaccine acceptance intention. On the other hand, emotional support seeking, which is a type of outward EFC, positively affected vaccine acceptance. The result of the PLSpredict analysis confirms the generalizability of the PLS-SEM result. Conclusion: The results of our study could be used to decrease vaccine hesitancy and prevent global pandemics by accelerating and increasing vaccination. Our study provides several meaningful implications for researchers and practitioners regarding vaccine acceptance and threat coping behavior.

Key Principles of Clinical Validation, Device Approval, and Insurance Coverage Decisions of Artificial Intelligence

  • Seong Ho Park;Jaesoon Choi;Jeong-Sik Byeon
    • Korean Journal of Radiology / v.22 no.3 / pp.442-453 / 2021
  • Artificial intelligence (AI) will likely affect various fields of medicine. This article aims to explain the fundamental principles of clinical validation, device approval, and insurance coverage decisions of AI algorithms for medical diagnosis and prediction. Discrimination accuracy of AI algorithms is often evaluated with the Dice similarity coefficient, sensitivity, specificity, and traditional or free-response receiver operating characteristic curves. Calibration accuracy should also be assessed, especially for algorithms that provide probabilities to users. As current AI algorithms have limited generalizability to real-world practice, clinical validation of AI should put it to proper external testing and assisting roles. External testing could adopt diagnostic case-control or diagnostic cohort designs. A diagnostic case-control study evaluates the technical validity/accuracy of AI, while a diagnostic cohort study tests the clinical validity/accuracy of AI in samples representing target patients in real-world clinical scenarios. Ultimate clinical validation of AI requires evaluations of its impact on patient outcomes, referred to as clinical utility, for which randomized clinical trials are ideal. Device approval of AI is typically granted with proof of technical validity/accuracy and thus does not intend to directly indicate if AI is beneficial for patient care or if it improves patient outcomes. Neither can it categorically address the issue of limited generalizability of AI. After achieving device approval, it is up to medical professionals to determine if the approved AI algorithms are beneficial for real-world patient care. Insurance coverage decisions generally require a demonstration of clinical utility, that is, that the use of AI has improved patient outcomes.
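
Three of the discrimination measures named in this abstract can be computed directly from a binary confusion matrix. The following is a minimal sketch with hypothetical labels; the function name `discrimination_metrics` is an illustration, not something defined in the article, and the code assumes both classes occur in the reference labels.

```python
def discrimination_metrics(y_true, y_pred):
    """Dice coefficient, sensitivity, and specificity for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
    # Assumes at least one positive and one negative in y_true,
    # otherwise the denominators below would be zero.
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical reference labels and algorithm outputs for 8 cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(discrimination_metrics(y_true, y_pred))
```

These figures describe discrimination on one sample only. As the abstract notes, calibration and external testing are separate questions.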

Multifaceted validity analysis of clinical skills test in the educational field setting (교육 현장에서 시행된 임상 술기 시험의 다면적 타당도 분석)

  • Han Chae;Min-jung Lee;Myung-Ho Kim;Kyuseok Kim;Eunbyul Cho
    • The Journal of Korean Medicine / v.45 no.1 / pp.1-16 / 2024
  • Introduction: The importance of clinical skills training in traditional Korean medicine education is increasingly emphasized. Since clinical skills tests are high-stakes tests that determine success in national licensing exams, it is essential to develop reliable multifaceted analysis methods for clinical skills tests in actual education settings. In this study, we applied multifaceted validity evaluation methods to the evaluation results of the cardiopulmonary resuscitation module to confirm the applicability and effectiveness of the methods. Methods: We used internal consistency, factor analysis, generalizability theory G-study and D-study, ANOVA, Kendall's tau, descriptive statistics, and other statistical methods to analyze the multidimensional validity of a cardiopulmonary resuscitation test in clinical education settings over the past three years. Results: The factor analysis and internal consistency analysis showed that the evaluation rubric had an unstable structure and low concordance. The G-study showed that the error of the clinical skills assessment was large due to the evaluator and unexpected errors. The D-study showed that the error variance attributable to the evaluator should be significantly reduced to validate the evaluation. The ANOVA and Kendall's tau confirmed that evaluator heterogeneity was a problem. Discussion and Conclusion: Clinical skills tests should be continuously evaluated and managed for validity in two steps: pre-production and actual implementation. This study has presented specific methods for analyzing the validity of clinical skills training and testing in actual education settings. It contributes to the foundation for competency-based, evidence-based education in practical clinical training.

Correlation analysis of linguistic factors in non-native Korean speech and proficiency evaluation (비원어민 한국어 말하기 숙련도 평가와 평가항목의 상관관계)

  • Yang, Seung Hee;Chung, Minhwa
    • Phonetics and Speech Sciences / v.9 no.3 / pp.49-56 / 2017
  • Much research attention has been directed to identifying how native speakers perceive non-native speakers' oral proficiency. To investigate the generalizability of previous findings, this study examined segmental, phonological, accentual, and temporal correlates of native speakers' evaluation of L2 Korean proficiency in speech produced by learners of various levels and nationalities. Our experimental results show that proficiency ratings by native speakers correlate significantly not only with rate of speech but also with segmental accuracy. Segmental errors have the highest correlation with the proficiency of L2 Korean speech. We further verified this finding for substitution, deletion, and insertion error rates. Although phonological accuracy was expected to be highly correlated with the proficiency score, it was the least influential measure. Another new finding of this study is that the role of pitch and accent has so far been underemphasized in studies of non-native Korean speech perception. This work will serve as the groundwork for the development of an automatic assessment module in a Korean CAPT system.