• Title/Summary/Keyword: generalizability analysis


An Analysis of the Reliability of Group Assessment of Logical Thinking (GALT) using Generalizability Theory (일반화가능도 이론을 이용한 집단논리적사고력검사(GALT)의 신뢰도 분석)

  • Ryu, Chun-Ryol;Lee, Yong-Geun
    • Journal of the Korean earth science society / v.31 no.1 / pp.95-105 / 2010
  • This study applies generalizability theory to the GALT, according to the purpose for which the test is used, to analyze the sources of error in a single-facet design that considers only items and persons and in a multi-facet design that considers items, persons, and domains. The study was conducted with 1,016 students from local elementary, middle, and high schools. Students answered the 21 items of the full version in 40 minutes, and the 12 items of the short version were then sampled to analyze reliability using generalizability theory. Both the full version and the short version were analyzed using Cronbach's alpha, and a G study and a D study were performed separately under the $p{\times}i$ design and the $p{\times}(i:h)$ design. The results are as follows. First, the D study under the $p{\times}I$ design showed that the full version had a generalizability coefficient of 0.87, exceeding the normal level of 0.80, and that the normal level was also reached with 13 items. For the short version, the generalizability coefficient with 12 items was 0.77, which did not reach the normal level; the normal level was reached with more than 15 items. Second, the D study under the $p{\times}(I:H)$ design on the short version showed that with 6 domains of 2 items each, the generalizability coefficient was 0.71, below the normal level of 0.80; the normal level was reached in cases with more than 5 items.
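The D-study projections in this abstract can be sketched for the single-facet $p{\times}I$ design, where the generalizability coefficient is $E\rho^2 = \sigma^2_p / (\sigma^2_p + \sigma^2_{pi,e}/n_i)$. The Python sketch below is illustrative only: the function names are hypothetical, and it works from the rounded coefficients reported in the abstract (0.87 with 21 items, 0.77 with 12 items) rather than from the study's variance components, so its results are approximate.

```python
import math


def g_coefficient(var_p, var_pi_e, n_items):
    """Generalizability coefficient for a p x I design with n_items items."""
    return var_p / (var_p + var_pi_e / n_items)


def signal_to_error_ratio(coef, n_items):
    """Recover r = var_p / var_pi_e from a coefficient observed with n_items.

    Inverts E(rho^2) = n * r / (n * r + 1).
    """
    return coef / ((1.0 - coef) * n_items)


def min_items_for(target, coef, n_items):
    """Smallest number of items whose projected coefficient reaches target."""
    r = signal_to_error_ratio(coef, n_items)
    return math.ceil(target / ((1.0 - target) * r))


# Inputs are the rounded values quoted in the abstract.
print(min_items_for(0.80, coef=0.87, n_items=21))  # full version  -> 13
print(min_items_for(0.80, coef=0.77, n_items=12))  # short version -> 15
```

Because only the ratio of person variance to residual variance matters in this design, a single reported coefficient is enough to project other test lengths; the nested $p{\times}(I:H)$ design would need the separate variance components, which the abstract does not report.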

Analysis of weights depending on scoring domains of the mathematical creativity test (수학적 창의성 검사의 채점 영역별 가중치 분석)

  • Kim, Sungyeun
    • The Mathematical Education / v.55 no.2 / pp.147-169 / 2016
  • This study analyzes a mathematical creativity test, scored in the domains of fluency, flexibility, and originality, as an illustrative example in order to suggest how to obtain maximum reliability of a composite score by choosing the weights of the scoring domains. A multivariate generalizability analysis was performed on the publicly available test scores of 30 mathematically gifted elementary school students, and error variances, generalizability coefficients, and effective weights were calculated. The main results were as follows. First, although the original test weighted fluency, flexibility, and originality equally, the optimal weights based on the maximum generalizability coefficient were .5, .4, and .1. Second, the test showed higher reliability when scored on all three domains than when scored on a single domain such as fluency. These results are limited to the mathematical creativity test used in this study. However, the methodology can help determine optimal scoring-domain weights whenever tests constructed by researchers or in educational settings comprise multiple scoring domains.
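The weight search described in this abstract can be illustrated with a small sketch. For a weight vector $w$, the composite generalizability coefficient is $w^\top \Sigma_p w / (w^\top \Sigma_p w + w^\top \Sigma_\delta w)$, where $\Sigma_p$ holds universe-score (co)variances and $\Sigma_\delta$ holds relative-error (co)variances. The matrices below are invented for illustration and are not the study's estimates; the function names and the 0.1 grid step are likewise assumptions.

```python
from itertools import product


def quad_form(w, m):
    """Compute w' M w for a weight tuple w and a square matrix m."""
    n = len(w)
    return sum(w[i] * m[i][j] * w[j] for i in range(n) for j in range(n))


def composite_g(w, cov_p, cov_delta):
    """Composite generalizability coefficient for weights w."""
    universe = quad_form(w, cov_p)
    error = quad_form(w, cov_delta)
    return universe / (universe + error)


def best_weights(cov_p, cov_delta, step=0.1):
    """Grid-search three non-negative weights that sum to 1."""
    n = round(1 / step)
    best = None
    for a, b in product(range(n + 1), repeat=2):
        c = n - a - b
        if c < 0:
            continue
        w = (a / n, b / n, c / n)
        g = composite_g(w, cov_p, cov_delta)
        if best is None or g > best[1]:
            best = (w, g)
    return best


# Hypothetical (co)variance matrices for fluency, flexibility, originality.
COV_P = [[4.0, 2.0, 0.5], [2.0, 3.0, 0.4], [0.5, 0.4, 1.0]]
COV_DELTA = [[1.0, 0.0, 0.0], [0.0, 1.2, 0.0], [0.0, 0.0, 2.5]]

weights, coef = best_weights(COV_P, COV_DELTA)
print("best weights:", weights, "coefficient:", round(coef, 3))
print("fluency only:", round(composite_g((1.0, 0.0, 0.0), COV_P, COV_DELTA), 3))
```

A grid search is used here only for transparency; with real estimates from a multivariate G study, the same comparison between the optimized composite and a single-domain score could be made.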

Exploring the Application of Generalizability Theory to Mathematics Teacher Evaluation for Professional Development in Korea Based on the Analysis of Instructional Quality Assessment of Mathematics Teachers in the U.S. (미국 수학교사의 교수 질 평가도구 분석을 통한 우리나라 수학 교원능력개발평가에서의 일반화가능도 이론 활용성 탐색)

  • Kim, Sungyeun
    • Communications of Mathematical Education / v.28 no.4 / pp.431-455 / 2014
  • The purpose of this study was to suggest methods for applying generalizability theory to classroom-observation-based mathematics teacher evaluation in Korea, using an analysis of U.S. mathematics teachers with the Instructional Quality Assessment instrument as an illustrative example. The subjects were 96 teachers participating in Year 3 and Year 4 of the Middle-school Mathematics and the Institutional Setting of Teaching (MIST) project, funded by the National Science Foundation since 2007. The MIST project investigates the following question: What does it take to support mathematics teachers' development of ambitious and equitable instructional practices on a large scale (MIST, 2007)? This study examined the data with both a univariate generalizability analysis using the GENOVA program and a multivariate generalizability analysis using the mGENOVA program. Specifically, it determined the relative effect of each error source and investigated the measurement conditions needed to obtain suitable generalizability coefficients. The methodology applied in this study can be used to find optimal measurement conditions for mathematics teacher evaluation for professional development in Korea. Finally, this study discussed the limitations of the results and suggested directions for future research.

Analysis of Error Source in Subjective Evaluation on Patient Dentist Interaction : Application of Generalizability Theory (환자-치과의사 관계(PDI Patient Dentist Interaction) 평가의 오차원 분석: 일반화가능도 이론 적용)

  • Kim, Jooah;Cho, Lee-Ra
    • The Journal of the Korean dental association / v.57 no.8 / pp.448-455 / 2019
  • This study applies generalizability theory (G-theory) to estimate the reliability of raters' evaluation scores on Patient Dentist Interaction (PDI). Treating the number of raters as one of multiple error sources, the study analyzed the error sources through the relative magnitudes of the error variances and of the interactions between factors, and then conducted a D-study based on the G-study results to determine optimal measurement conditions. The variance components estimated with G-theory for the PDI evaluation showed that the student effect was the largest source of variance. The item effect was the second largest, and the rater effect was relatively small. The generalizability coefficients for case 1 and case 2, estimated through the D-study, were relatively low.
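As a rough illustration of how a G-study compares error sources, the sketch below estimates variance components for a simplified crossed students-by-raters design with one score per cell, using the usual expected-mean-square estimators. This is a simplification: the study also included an item facet, and the ratings, function name, and dictionary keys here are hypothetical.

```python
def variance_components(scores):
    """Estimate variance components for a crossed p x r design.

    scores[i][j] is the score given to person i by rater j.
    Negative estimates are set to zero, a common convention.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    # Mean squares for persons, raters, and the residual (interaction + error).
    ms_p = n_r * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in r_means) / (n_r - 1)
    ss_res = sum(
        (scores[i][j] - p_means[i] - r_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_r)
    )
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    return {
        "p": max((ms_p - ms_res) / n_r, 0.0),
        "r": max((ms_r - ms_res) / n_p, 0.0),
        "pr,e": ms_res,
    }


# Hypothetical ratings: 4 students (rows) scored by 3 raters (columns).
ratings = [[7, 8, 7], [4, 5, 5], [9, 9, 8], [5, 6, 4]]
components = variance_components(ratings)
total = sum(components.values())
for name, value in components.items():
    print(f"{name}: {value:.3f} ({100 * value / total:.1f}% of total)")
```

Reporting each component as a share of the total is how statements such as "the rater effect was relatively small" are usually read off a G-study table.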


Analysis of Korea Earth Science Olympiad Items for the Enhancement of Item Quality (한국 지구과학 올림피아드 문항 분석을 통한 문항의 질 향상 방안)

  • Lee, Ki-Young;Kim, Chan-Jong
    • Journal of the Korean earth science society / v.26 no.6 / pp.511-523 / 2005
  • The purpose of this study is to analyze the items of the 1st and 2nd Korea Earth Science Olympiad (KESO) in order to find information for enhancing item quality. To do this, internal and external item classification frameworks were developed. Item difficulty (P), discrimination index (DI), correlation, and reliability were estimated using classical test theory. Generalizability was also estimated by applying generalizability theory. The results of item classification are as follows: (1) ‘Geology’, ‘astronomy’ and ‘data analysis and interpretation’ are dominant in the content and inquiry process domains, respectively. Nearly every item has a textbook context. (2) There is no difference between the preliminary and final tests in terms of their thinking skills sections. (3) As a whole, the ratio of items with pictures is high in item representation. However, multiple-choice and short-answer items are more common in the preliminary competition, and essay-type items are found more often in the final competition. The ratio of simple items is high in the middle school section and the preliminary competition, but composite items are dominant in the high school section and the final competition. The findings of item analysis are as follows: (1) In the middle school section, P is low and DI is moderate. In the high school section, however, there is a considerable difference between science high schools and general high schools. (2) In middle school, the highest correlation is between the meteorology domain score and the total score, whereas in high school it is between the astronomy domain score and the total score. (3) The general high school section shows the highest Cronbach $\alpha$ and generalizability. (4) The general high school section shows an acceptable generalizability coefficient (> 0.80), but the middle school and science high school sections would need more items to reach an acceptable generalizability level.

Analysis of Assessment Types, Scoring Methods and Reliability of Science Performance Assessment in Middle and High School (중등학교 과학 수행평가의 평가 유형과 채점 방식 및 신뢰도 분석)

  • Lee, Ki-Young;An, Hui-Soo
    • Journal of The Korean Association For Science Education / v.25 no.2 / pp.173-183 / 2005
  • This study asked what assessment types and scoring methods of science performance assessment (SPA) were being used in middle and high schools, and how reliable (generalizable) the SPA scores were. To answer these questions, SPA data obtained from seven schools were classified by assessment type and scoring method, and reliability was then analyzed by applying generalizability theory. The classification showed that the SPA of the seven schools fell into two types: a paper-and-pencil type and a task type. The paper-and-pencil type consisted solely of answer (content)-restricted essay-type tests. The task type had two parts: process assessment and outcome assessment. Two scoring methods were found among the seven schools: in one, a single teacher scored all essay-type items and performance tasks; in the other, two teachers scored assigned performance tasks. No cases were found of scoring assigned essay-type items or of cross-scoring by two or more teachers. The findings of the reliability analysis are as follows: (1) The effect of essay-type items on SPA scores was larger than that of performance tasks. (2) The person-by-rater interaction effect in scoring performance tasks differed markedly among the seven schools. (3) Most of the generalizability (reliability) coefficients of SPA for the seven schools were smaller than the acceptable generalizability coefficient (0.80). Therefore, the numbers of items, tasks, and raters should be increased to approach the acceptable generalizability level.

An Analysis of Measurement Equivalence in a Teaching Aptitude and Personality Test for Pre-service Mathematics Teachers between a Graduate School of Education and a College of Education (교육대학원과 사범대학 예비수학교사의 교직 적성·인성 검사에 대한 측정의 동등성 분석)

  • Kim, Sungyeun
    • The Mathematical Education / v.57 no.2 / pp.179-196 / 2018
  • The purpose of this study was to investigate measurement equivalence in teaching aptitude and personality test results for pre-service mathematics teachers between a graduate school of education and a college of education, and to suggest ways of applying the results. This study analyzed the teaching aptitude and personality test scores of 36 pre-service mathematics teachers enrolled in a graduate school of education and 111 pre-service mathematics teachers in a college of education by performing a multivariate generalizability analysis. The main results were as follows. First, based on total scores, the graduate school's pre-service mathematics teachers had a higher level of teaching aptitude and personality than the college's pre-service mathematics teachers. In addition, based on sub-domain scores, they had higher levels in every domain except the creativity application domain. Second, for the graduate school's pre-service mathematics teachers, cognitive domains were measured more precisely and affective domains less precisely than for the college's pre-service mathematics teachers. Third, regardless of school level, Cronbach's ${\alpha}$ values, which might be overestimated under classical test theory, were higher than the dependability coefficients. Fourth, the results were somewhat negative regarding measurement equivalence for the problem solving exploration domain. However, regardless of school level, the overall measurement was generally reliable on composite scores. Based on these results, it was confirmed that a multivariate generalizability approach can be useful for exploring measurement equivalence issues. Finally, this study suggests how to use the test results, how to apply multivariate generalizability analysis to detect measurement equivalence, and how to develop future research based on its limitations.
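For intuition about why Cronbach's ${\alpha}$ can exceed a dependability coefficient, consider the simplest single-facet crossed design $p{\times}i$ (an assumption made here for illustration; the study itself used a multivariate design). In that design ${\alpha}$ coincides with the estimated generalizability coefficient, which counts only the person-by-item residual as error, whereas the dependability coefficient $\Phi$ also counts the item main effect:

$$
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pi,e}/n_i},
\qquad
\Phi = \frac{\sigma^2_p}{\sigma^2_p + \left(\sigma^2_i + \sigma^2_{pi,e}\right)/n_i}
$$

Since $\sigma^2_i \geq 0$, it follows that $\Phi \leq E\rho^2$, with equality only when items do not differ in difficulty.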

An Analysis of Error Sources and Reliability Estimation in Emotional Intelligence Assessment of Young Children Using Generalizability Theory (일반화가능도 이론을 활용한 유아정서지능 평가도구의 오차요인 분석)

  • Kim, Kyung-Chul;Choi, Younchul
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.1 / pp.565-571 / 2017
  • The purpose of this study is to determine the effects of error sources in the emotional intelligence assessment of young children. The emotional intelligence of 198 five-year-old children was assessed using the Emotional Intelligence Scale developed by Lee[14]. The evaluation results were analyzed using a G study of generalizability theory. The G study results show that parents can be effective evaluators of emotional intelligence in young children. Strategies to reduce the effects of error are also discussed.

An analysis of error sources and reliability estimation in self-control assessment of young children using generalizability theory (일반화가능도 이론 적용 가능성 탐색을 위한 유아 자기통제력 평가도구의 신뢰도와 오차요인 분석)

  • Choi, Younchul;Kim, Kyung-Chul
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.11 / pp.507-512 / 2016
  • The purpose of this study is to determine the error sources, and the effect of each error source, in the self-control assessment of young children. The self-control of 259 five-year-old children was assessed using the Self-Control Rating Scale (SCRS) developed by Kendall and Wilcox[1]. The evaluation results were analyzed using a G study of generalizability theory. The G study results show that parents can be effective evaluators in the self-control assessment of young children. Strategies to reduce the effects of error are also discussed.

Reliability of Delphi survey for traditional knowledge on agricultural resources (생물자원 전통지식 추출을 위한 델파이조사의 신뢰성 연구)

  • Lee, Ki Hoon;Song, Mi-Jang;Kim, Hyun
    • Journal of the Korean Data and Information Science Society / v.26 no.4 / pp.947-956 / 2015
  • In the knowledge and information age, discovering and protecting intellectual property is very important because of its economic value as a major growth engine. This study evaluated the reliability of a Delphi survey in which experts assessed the value of agricultural resources knowledge obtained from literature reviews and field interviews. The Delphi method collects expert opinions repeatedly over several rounds; in each subsequent round, the experts have a chance to modify their opinions. Scores from the two rounds were highly correlated, and standard deviations declined in the second round, implying that the experts made some corrections to their evaluations. To check the reliability of the two-round Delphi survey, Cronbach's reliability coefficient and the generalizability coefficient were derived. Cronbach's alpha supported the reliability of the method, but the generalizability analysis revealed some unexpected results when the variance components of the sources of measurement error were examined. Despite the increased reliability coefficients, the deviations between raters increased, which means that additional rounds are required to reach consensus, the goal of Delphi research.
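To make the contrast between Cronbach's alpha and a rater-level view concrete, the sketch below computes alpha with experts treated as "items" for a single round. The scores and the function name are hypothetical and are not taken from the survey; the formula is the standard $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_j s^2_j}{s^2_{\text{total}}}\right)$.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for scores[i][j]: object i rated by expert j."""
    k = len(scores[0])
    # Sample variance of each expert's column and of the row totals.
    column_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(column_vars) / total_var)


# Hypothetical round of ratings: 5 knowledge items scored by 3 experts.
round_scores = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [3, 4, 2], [1, 2, 2]]
print(round(cronbach_alpha(round_scores), 3))

# A constant offset between experts leaves alpha unchanged.
shifted = [[row[0], row[1] + 2, row[2]] for row in round_scores]
print(round(cronbach_alpha(shifted), 3))
```

The second call shows one way the two analyses can tell different stories: alpha is insensitive to systematic differences in rater severity, whereas a variance-component analysis reports them as a separate rater effect.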