• Title/Summary/Keyword: AnswerTree

Prediction of Correct Answer Rate and Identification of Significant Factors for CSAT English Test Based on Data Mining Techniques (데이터마이닝 기법을 활용한 대학수학능력시험 영어영역 정답률 예측 및 주요 요인 분석)

  • Park, Hee Jin;Jang, Kyoung Ye;Lee, Youn Ho;Kim, Woo Je;Kang, Pil Sung
    • KIPS Transactions on Software and Data Engineering / v.4 no.11 / pp.509-520 / 2015
  • The College Scholastic Ability Test (CSAT) is the primary test for evaluating the academic achievement of high-school students and is used by most universities in South Korea for admission decisions. Because its level of difficulty is a significant issue for both students and universities, the government makes a considerable effort to keep the difficulty level consistent every year. However, the actual level of difficulty has fluctuated significantly, which causes many problems for university admission. In this paper, we build two types of data-driven prediction models to predict the correct answer rate and to identify significant factors for the CSAT English test from accumulated CSAT test data, unlike traditional methods that depend on experts' judgments. First, we derive candidate question-specific factors that can influence the correct answer rate, such as position, EBS-relation, and readability, from 10 years of annual CSAT practice tests and CSATs. In addition, we derive context-specific factors by employing topic modeling, which identifies the underlying topics in the text. Then, the correct answer rate is predicted by multiple linear regression, and the level of difficulty is predicted by a classification tree. The experimental results show that the difficulty-level (difficult/easy) classification model achieves 90% accuracy, whereas the error rate for the correct answer rate is below 16%. Points and problem category are found to be critical for predicting the correct answer rate. The correct answer rate is also influenced by some of the topics discovered by topic modeling. Based on our study, it will be possible to predict the range of the expected correct answer rate at both the question level and the whole-test level, which will help CSAT examiners control the level of difficulty.

A Study on Variable Selection Bias in Data Mining Software Packages (데이터마이닝 패키지에서 변수선택 편의에 관한 연구)

  • 송문섭;윤영주
    • The Korean Journal of Applied Statistics / v.14 no.2 / pp.475-486 / 2001
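  • We compared the variable selection methods of CART, CHAID, QUEST, and C4.5, which are among the classification tree algorithms implemented in data mining packages. It is well known that the exhaustive search method of CART is biased; here, we compared the bias and selection power of these algorithms in commercial packages through a simulation study. The commercial packages used were CART, Enterprise Miner, AnswerTree, and Clementine. According to the limited simulation results of this paper, both C4.5 and CART have serious bias in variable selection, whereas CHAID and QUEST show relatively stable results.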
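
The kind of simulation described above can be illustrated with a minimal sketch. It is not the authors' experimental design: the three predictors, the sample size, and the function names are hypothetical, and the five-level predictor is treated as ordered for simplicity. The response is generated independently of every predictor, so an unbiased selection rule would pick each predictor about equally often, while an exhaustive Gini search tends to favor the predictor that offers more candidate splits.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_gain(values, labels):
    """Largest Gini decrease over all threshold splits 'value <= c'."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    for c in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= c]
        right = [y for v, y in zip(values, labels) if v > c]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

def selection_counts(trials=200, n=50, seed=0):
    """Count how often each pure-noise predictor wins the exhaustive search."""
    rng = random.Random(seed)
    counts = {"binary": 0, "five_level": 0, "continuous": 0}
    for _ in range(trials):
        # The response is independent of all predictors (null case).
        y = [rng.randint(0, 1) for _ in range(n)]
        x = {
            "binary": [rng.randint(0, 1) for _ in range(n)],
            "five_level": [rng.randint(0, 4) for _ in range(n)],
            "continuous": [rng.random() for _ in range(n)],
        }
        gains = {name: best_gain(vals, y) for name, vals in x.items()}
        counts[max(gains, key=gains.get)] += 1
    return counts

if __name__ == "__main__":
    print(selection_counts())
```

Comparing the printed counts with an even one-third split gives a rough picture of the selection bias under these assumptions.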

Selection of an Optimal Algorithm for Prevention of Industrial Accidents (산업재해 예방을 위한 최적 알고리즘 선정)

  • Leem, Young-Moon;Hwang, Young-Seob
    • Proceedings of the Safety Management and Science Conference / 2005.11a / pp.328-331 / 2005
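  • The main purpose of statistical analysis of industrial accidents is to identify the major risk factors in each industry and, accordingly, to reduce or prevent industrial accidents by providing safety education or supplementing safety devices. However, quantitative risk assessment techniques have not yet been developed for general manufacturing, construction, and similar industries. Therefore, an efficient risk assessment technique needs to be developed. This study presents a method for selecting an optimal algorithm for the prevention of industrial accidents using data mining techniques.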

A Feature Analysis of Industrial Accidents Using CHAID Algorithm (CHAID 알고리즘을 이용한 산업재해 특성분석)

  • Leem Young-Moon;Hwang Young-Seob
    • Journal of the Korea Safety Management & Science / v.7 no.5 / pp.59-67 / 2005
  • The main objective of statistical analysis of industrial accidents is to identify the major risk factors in each industrial field so that accidents can be prevented or reduced by educating workers in that field about safety and safety tools. However, so far there is no technique for the quantitative evaluation of risk. Almost all previous research on industrial accidents has relied only on frequency analysis, such as analysis of the constituent ratio of accidents. As an application of data mining techniques, this paper analyzes the efficiency of the CHAID algorithm in classifying types of industrial accident data and thereby identifies potential weak points in accident risk grouping.
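
CHAID (Chi-squared Automatic Interaction Detection) chooses splits using chi-square tests of independence between a predictor and the target. The sketch below shows only that core statistic on hypothetical contingency tables (rows are predictor categories, columns are accident types). It omits the category merging and adjusted p-values of full CHAID, and ranking by the raw statistic is a simplification that is only comparable across tables of the same shape.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table.

    Assumes every row and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def best_predictor(tables):
    """Name of the predictor whose table has the largest statistic."""
    return max(tables, key=lambda name: chi_square(tables[name]))

if __name__ == "__main__":
    # Hypothetical counts, not data from the paper.
    tables = {
        "shift": [[10, 20], [20, 10]],
        "weekday": [[15, 15], [15, 15]],
    }
    print(best_predictor(tables))
```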

Improvement and Performance Analysis of Hybrid Anti-Collision Algorithm for Object Identification of Multi-Tags in RFID Systems (RFID 시스템에서 다중 태그 인식을 위한 하이브리드 충돌방지 알고리즘의 개선 및 성능 분석)

  • Choi, Tae-Jeong;Seo, Jae-Joon;Baek, Jang-Hyun
    • IE interfaces / v.22 no.3 / pp.278-286 / 2009
  • Anti-collision algorithms for identifying a large number of tags in real time in RFID systems are divided into those based on Framed Slotted ALOHA, which randomly select among multiple slots to identify the tags, and those based on tree-based algorithms, which repeat a query-and-answer process to identify the tags. In the hybrid algorithm, which combines the advantages of these algorithms, tags are distributed over the frames by each selecting one frame and are then identified frame by frame using the query tree. In this hybrid algorithm, however, the time needed to identify all tags may increase if many tags are concentrated in a few frames. In this study, to improve the performance of the hybrid algorithm, we propose an improved algorithm in which the tags select a specific group of frames based on the leading bits of the tag ID so that the tags are distributed evenly over the frames. Using simulation and mathematical analysis, we show that the proposed algorithm outperforms the traditional hybrid algorithm in terms of the number of queries per frame and the time needed to identify all tags.
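
The query-and-answer process of a tree-based scheme can be sketched as a basic query tree. This is only the generic building block, not the hybrid or improved algorithm of the paper: the reader broadcasts a bit prefix, every tag whose ID starts with that prefix answers, and a collision causes the prefix to be extended by 0 and by 1. The function name and the example IDs are hypothetical, and tag IDs are assumed to be distinct bit strings of equal length.

```python
from collections import deque

def query_tree(tag_ids):
    """Identify all tags; return (identified IDs, number of queries)."""
    identified = []
    queries = 0
    pending = deque([""])  # start with the empty prefix
    while pending:
        prefix = pending.popleft()
        queries += 1
        responders = [t for t in tag_ids if t.startswith(prefix)]
        if len(responders) == 1:
            # Exactly one answer: the tag is read successfully.
            identified.append(responders[0])
        elif len(responders) > 1:
            # Collision: split the prefix into two longer prefixes.
            pending.append(prefix + "0")
            pending.append(prefix + "1")
        # No responders: an idle query, nothing to do.
    return identified, queries

if __name__ == "__main__":
    ids, n_queries = query_tree(["000", "011", "101"])
    print(sorted(ids), n_queries)
```

In a hybrid scheme, a routine like this would be run once per frame on the tags that chose that frame, which is why concentrating many tags in a few frames increases the query count.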

Development of an Expert System for Prevention of Industrial Accidents in Manufacturing Industries (제조업에서의 산업재해 예방을 위한 전문가 시스템 개발)

  • Leem Young-Moon;Choi Yo-Han
    • Journal of the Korea Safety Management & Science / v.8 no.1 / pp.53-64 / 2006
  • Much research and analysis has focused on industrial accidents in order to predict and reduce them. As a similar endeavor, this paper aims to develop an expert system for the prevention of industrial accidents. Although various previous studies have been performed to prevent industrial accidents, they only provide managerial and educational policies using frequency analysis and comparative analysis based on data from past industrial accidents. As an initial step, this paper provides a comparative analysis of four algorithms: CHAID, CART, C4.5, and QUEST. The decision tree algorithm, a typical data mining technique, is used to predict results from objective and quantified data. Enterprise Miner of SAS and Answer Tree of SPSS are used to evaluate the validity of the results of the four algorithms. The sample for this work was chosen from 10,536 records related to manufacturing industries over three years (2002-2004) in Korea. The initial sample includes a range of different businesses, including the construction and manufacturing industries, which are typically vulnerable to industrial accidents.

A Convex Layer Tree for the Ray-Shooting Problem (광선 슈팅 문제를 위한 볼록 레이어 트리)

  • Kim, Soo-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.4 / pp.753-758 / 2017
  • The ray-shooting problem is to find the first point on the surface of given geometric objects that is hit by a ray moving along a straight line. Since rays are usually given in the form of queries, this problem is typically solved as follows. First, a data structure for the collection of objects is constructed as preprocessing. Then, the answer for each query ray is quickly computed using the data structure. In this paper, we consider the ray-shooting problem for a set of vertical line segments on the x-axis. We present a new data structure, called a convex layer tree, for n vertical line segments given as input. This is a tree structure consisting of layers of convex hulls of vertical line segments. It can be constructed in O(n log n) time and O(n) space and is easy to implement. We also present an algorithm that solves each query in O(log n) time using this data structure.
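
For reference, the query itself can be stated as a brute-force O(n) scan. This is a baseline sketch, not the convex layer tree of the paper. The representation of a segment as a tuple (x, y_low, y_high) and the function name are assumptions made for illustration, and rays parallel to the segments are simply ignored.

```python
import math

def first_hit(origin, direction, segments):
    """Return the first point where the ray hits a vertical segment, or None.

    Each segment is (x, y_low, y_high); the ray is origin + t * direction
    for t >= 0.
    """
    ox, oy = origin
    dx, dy = direction
    best_t, best_point = math.inf, None
    for sx, y_low, y_high in segments:
        if dx == 0:
            continue  # ray is vertical (parallel to segments): not handled here
        t = (sx - ox) / dx
        if t < 0:
            continue  # the segment's line lies behind the ray origin
        y = oy + t * dy
        if y_low <= y <= y_high and t < best_t:
            best_t, best_point = t, (sx, y)
    return best_point

if __name__ == "__main__":
    # Hypothetical segments for illustration.
    segs = [(2, 0, 4), (5, -1, 1), (3, 5, 6)]
    print(first_hit((0, 0), (1, 0), segs))
```

A preprocessing-based structure such as the one described in the abstract answers the same query without scanning every segment.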

Market Segmentation of Patient-Utilization in Oriental Medical Care and Western Medical Care (양.한방 의료서비스 이용환자의 시장 세분화에 관한 연구)

  • 이선희;조희숙;최은영;최귀선;채유미
    • Health Policy and Management / v.12 no.1 / pp.125-143 / 2002
  • The objectives of this study were to analyze patients' characteristics and market segmentation in oriental medical care and western medical care. This study focused on medical utilization using Anderson's health utilization model. The data source was the 1998 National Health and Nutrition Survey conducted by the Korea Institute for Health and Social Affairs. A stratified multistage probability sampling design was used in this survey. The analysis was conducted using the statistical software package SPSS version 10.0 and Answer Tree 2.1, a data mining tool. The results were as follows: 1) 44.9% of respondents reported visiting an oriental medical center within the previous two weeks. 3.4% of them used oriental medical care. Age group, kind of disease, and medical expenditure were associated with the difference between western and oriental medical utilization rates. 2) According to the decision tree, several factors were related to the utilization of oriental medical care. In particular, the important factors in a patient's choice of medical center were kind of disease, kind of common medical use, and expenditure. 3) In the CART analysis, the market for oriental medical care was classified into seven categories. The major groups with a preference for oriental medicine were patients with musculoskeletal disease, cerebrovascular disease, or chronic headache, and they had a preference for oriental medical care in common use. These results show that the oriental and western medical markets were divided into various areas by market segmentation.

Factor Analysis on Injured People Using Data Mining Technique (데이터 마이닝 기법을 활용한 산업재해자들에 대한 요인분석)

  • Leem Young-Moon;Hwang Young-Seob;Choi Yo-Han
    • Journal of the Korea Safety Management & Science / v.7 no.4 / pp.61-71 / 2005
  • Much research has focused on the analysis of industrial disasters in order to reduce them. As a similar endeavor, this paper provides a propensity analysis of injured people from various industries using classification and regression tree (CART), a data mining algorithm. The sample for this work was chosen from 25,157 records related to various industries during one year (February 2003 to January 2004) in Kangwon-Do, Korea. For the purpose of this paper, eight independent variables (injured date, injured time, injured month, type of injured person, continuous service period, sex, company size, age) are taken from the injured person group. According to the analysis, five of the eight factors predicted to be significant have salient effects. Factors such as the season, time of day, day of the week, or month in which disasters happened do not show any significant effect. This paper provides common features of injured people. The results will be helpful as a starting point for root cause analysis and reduction of industrial disasters and for the development of safety management guidelines.

Identification of Subgroups with Lower Level of Stroke Knowledge Using Decision-tree Analysis (의사결정나무 분석기법을 이용한 뇌졸중 지식 취약군 규명)

  • Kim, Hyun Kyung;Jeong, Seok Hee;Kang, Hyun Cheol
    • Journal of Korean Academy of Nursing / v.44 no.1 / pp.97-107 / 2014
  • Purpose: This study was performed to explore levels of stroke knowledge and identify subgroups with lower levels of stroke knowledge among adults in Korea. Methods: A cross-sectional survey was used and data were collected in 2012. A national sample of 990 Koreans aged 20 to 74 years participated in this study. Knowledge of risk factors, warning signs, and first action for stroke were surveyed using face-to-face interviews. Descriptive statistics and decision tree analysis were performed using SPSS WIN 20.0 and Answer Tree 3.1. Results: Mean score for stroke risk factor knowledge was 7.7 out of 10. The least recognized risk factor was diabetes and four subgroups with lower levels of knowledge were identified. Score for knowledge of stroke warning signs was 3.6 out of 6. The least recognized warning sign was sudden severe headache and six subgroups with lower levels of knowledge were identified. The first action for stroke was recognized by 65.7 percent of participants and four subgroups with lower levels of knowledge were identified. Conclusion: Multi-faceted education should be designed to improve stroke knowledge among Korean adults, particularly focusing on subgroups with lower levels of knowledge and less recognition of items in this study.