• Title/Abstract/Keywords: CHAID analysis

A Study on Approximation Model for Optimal Predicting Model of Industrial Accidents

  • 임영문;유창현
    • 대한안전경영과학회지, Vol. 8, No. 3, pp. 1-9, 2006
  • Recently, data mining techniques have been used to analyze and classify data related to industrial accidents. The main objective of this study is to compare algorithms for analyzing industrial accident data. The paper identifies an optimal predictive model among five algorithms, CHAID, CART, C4.5, LR (logistic regression), and NN (neural network), using ROC charts, lift charts, and response thresholds. It also provides an approximation model of the NN-based optimal predictive model, which makes the results of an NN analysis easier to interpret. Ten selected independent variables are used to group injured people according to a dependent variable in a way that reduces variation. To find the optimal predictive model among the five algorithms, a retrospective analysis was performed on 67,278 subjects, sampled from industrial accident data for the three years from 2002 to 2004 in Korea. According to the results, NN performs best for analyzing and classifying industrial accident data.
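The lift charts used above to compare the five algorithms plot, for each top slice of cases ranked by predicted score, the response rate in that slice divided by the overall response rate. The sketch below computes one point of such a chart; the function name `lift_at` and the toy scores are hypothetical and are not taken from the study.

```python
def lift_at(scores, labels, fraction):
    """Lift of the top `fraction` of cases ranked by predicted score.

    scores: predicted probabilities (higher = more likely positive)
    labels: observed outcomes, 1 = positive, 0 = negative
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equally long")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("no positive cases; lift is undefined")
    # Rank cases from highest to lowest score and keep the top slice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / base_rate


# Toy data (hypothetical): 3 positives among 10 cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for pct in (0.1, 0.2, 0.5, 1.0):
    print(f"top {pct:.0%}: lift = {lift_at(scores, labels, pct):.2f}")
```

Evaluating `lift_at` over a grid of fractions for each model and plotting the results side by side gives the kind of lift-chart comparison the abstract describes; the lift always falls to 1.0 once the whole sample is included.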

Study on the Application of Decision Trees for Personalization based on e-CRM

  • 양정희;한서정
    • 대한안전경영과학회지, Vol. 5, No. 3, pp. 107-119, 2003
  • Expectations of and interest in e-CRM are rising as a way to manage customers more efficiently online, including in electronic commerce. Decision trees are a useful data mining technique for e-CRM. In this paper, the representative decision tree techniques CART, C4.5, and CHAID are compared experimentally on actual customer data from a personalization point of view. Based on this analysis, a new decision tree system with significant advantages for personalization is proposed. The new system offers the following advantages. First, adding individual weight values yields a qualitatively better model for personalization. Second, it can supply more personalized information to customers. Third, it can achieve higher customer loyalty than other sites in similar lines of business. Fourth, it can reduce marketing and decision-making costs. Fifth, smooth communication with customers who use the personalized service makes it possible to learn what they want and to make goods and services, including their quality, more valuable.

A Study on Analyzing Children's Crossing Behaviors on Non-signalized Crosswalk

  • 이덕환;이윤석;김원호;이백진
    • 대한교통학회지, Vol. 31, No. 3, pp. 19-32, 2013
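  • This study aims to provide baseline data for improving future child traffic safety policy by analyzing the crossing behavior of child pedestrians, which traffic safety policy has so far overlooked. The analysis focused on comparing differences in crossing behavior and patterns across school zones (children's protection zones) that differ in physical layout and accident frequency. Data were collected through field observation and video recording at non-signalized crosswalks near seven elementary schools in Gyeonggi Province, and statistical analysis, CHAID algorithm analysis, and a comparison of travel patterns were performed. The crossing characteristics significantly related to accident frequency were, in order, whether the child waited before crossing, whether the child paid attention, and whether the child showed unusual behavior. Specifically, at locations with low accident frequency, 69.1% of children waited before crossing, whereas at accident-prone locations, 83.6% crossed without waiting. Waiting and attention before crossing were higher when the sidewalk at the start of the crosswalk was wide and the distance from the school exit to the crosswalk exceeded a certain length. On the other hand, no clear relationship was found between crossing patterns and accident frequency. To improve safety in school zones, a differentiated, tailored approach based on the crossing characteristics of children in each zone is needed so that children wait before they cross.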

Decision Tree Analysis for Prediction Model of Poverty of The Older Population in South Korea

  • Lee, Soochang;Kim, Daechan
    • International Journal of Advanced Culture Technology, Vol. 10, No. 2, pp. 28-33, 2022
  • This study aims to investigate factors that affect elderly poverty from a comprehensive and universal perspective and to suggest some alternatives for reducing the poverty rate of the elderly. The comprehensive and universal approach taken here can give a better understanding of elderly poverty than the existing literature: the research model includes individual, family, labor, and income factors as causes of old-age poverty. In addition, the study enters a variety of variables into the model of the causes of elderly poverty, using panel data from the 8th Korean Retirement and Income Study. Decision tree analysis with CHAID is used to determine the causes of elderly poverty. The analysis shows that the most important variable affecting elderly poverty is earned income. Among elderly people without earned income, public pensions, educational background, and residential area influence poverty, whereas among those with earned income, wage-earner status and gender are the variables that affect poverty. The study suggests some alternatives for improving the poverty rate of the aged. The government should create a better working environment, such as senior re-employment, so that older people can participate in economic activities; improve public pensions or social security for workers whose conditions leave them poorly protected in old age; and give diverse incentives to companies that employ older people.
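CHAID chooses each split by testing every candidate predictor against the target with a chi-square test and picking the most significant one; that is how a variable such as earned income ends up at the root of the tree. The sketch below shows only that selection step under stated simplifications: real CHAID also merges predictor categories that do not differ significantly and uses Bonferroni-adjusted p-values, both of which are omitted here. All function names and the toy records are hypothetical, not the study's data.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1 (closed forms)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus half-integer Poisson-type terms.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def pearson_chi2(xs, ys):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    n = len(xs)
    rows, cols, cells = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


def best_split(records, target, predictors):
    """Return (predictor, chi2, p) for the predictor most associated with target.

    Simplified CHAID step: no category merging, no Bonferroni adjustment.
    """
    ys = [rec[target] for rec in records]
    best = None
    for name in predictors:
        xs = [rec[name] for rec in records]
        stat, df = pearson_chi2(xs, ys)
        if df == 0:  # single-category predictor (or constant target): cannot split
            continue
        p = chi2_sf(stat, df)
        if best is None or p < best[2]:
            best = (name, stat, p)
    return best


# Hypothetical records: income fully separates the outcome, gender does not.
records = [
    {"earned_income": inc, "gender": g, "poor": poor}
    for inc, g, poor in [
        ("yes", "M", 0), ("yes", "F", 0), ("yes", "M", 0), ("yes", "F", 0),
        ("no", "M", 1), ("no", "F", 1), ("no", "M", 1), ("no", "F", 1),
    ]
]
print(best_split(records, "poor", ["earned_income", "gender"]))
```

A full CHAID tree applies this step recursively inside each child node until no predictor reaches the chosen significance level; in practice a library implementation would be preferable to this hand-rolled version.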

A Study on Decision Rules for Qi·Blood·Yin·Yang Deficiency Pathogenic Factor Based on Clinical Data of Diagnosis System of Oriental Medicine

  • 전수형;이인선;지규용;김종원;강창완;이용태
    • 동의생리병리학회지, Vol. 37, No. 6, pp. 172-177, 2023
  • To derive the diagnostic logic for the pathogenic factors (PF) underlying pattern identification in Korean medicine, 2,072 cases of DSOM (Diagnosis System of Oriental Medicine) data collected from May 2005 to April 2022 were analyzed with a decision tree model (DTM). The data were divided into training and validation sets at a ratio of 7:3. The CHAID algorithm was used to build the DTM, and its validity was then tested on the validation data. The decision rules, items, and pathways derived from the DSOM diagnosis data for Qi, Blood, Yin, and Yang Deficiency PFs were as follows. Qi Deficiency PF had 7 decision rules using 5 questions (Q124, Q116a, Q119, Q119a, Q55); its primary indicators (PI) were 'lack of energy' and 'weary of talking'. Blood Deficiency PF had 7 decision rules using 6 questions (Q113, Q84, Q85, Q114, Q129, Q130); its PI were 'numbness in the limbs', 'dizziness when standing up', and 'frequent cramps'. Yin Deficiency PF had 3 decision rules using 2 questions (Q144, Q56); its PI were 'subjective heat sensation from the afternoon to night' and 'heat sensation in the limbs'. Yang Deficiency PF had 3 decision rules using 3 questions (Q55, Q10, Q102); its PI were 'sweating even with small movements' and 'lack of energy'. In conclusion, these rules and the associated symptom information for deciding Qi, Blood, Yin, and Yang Deficiency PFs should be helpful for Korean medicine diagnostics.
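The 7:3 division into training and validation data described above can be sketched as a seeded random split. The abstract does not say whether the split was random, stratified, or chronological, so the plain shuffle, the seed, and the function name `train_validation_split` are assumptions for illustration.

```python
import random


def train_validation_split(records, train_ratio=0.7, seed=0):
    """Randomly split records into training and validation lists.

    train_ratio=0.7 mirrors the 7:3 ratio in the abstract; the seed and the
    use of a plain (unstratified) random shuffle are assumptions.
    """
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


case_ids = list(range(2072))  # stand-ins for the 2,072 DSOM cases
train, validation = train_validation_split(case_ids)
print(len(train), len(validation))
```

With rare diagnostic classes, a stratified split (shuffling within each class) would keep class proportions similar in both sets; the simple version above does not guarantee that.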

Screening Vital Few Variables and Development of Logistic Regression Model on a Large Data Set

  • 임용빈;조재연;엄경아;이선아
    • 품질경영학회지, Vol. 34, No. 2, pp. 129-135, 2006
  • With advances in computer technology, it is now possible to store all the information related to equipment monitoring and control, as well as huge amounts of real-time manufacturing data, in a database. Statistical analysis is therefore needed for large data sets with hundreds of thousands of observations and hundreds of independent variables, with values missing for many observations, even though this is a formidable computational task. A tree-structured approach to classification can screen important independent variables and their interactions. In a Six Sigma project handling large amounts of manufacturing data, one goal is to screen the vital few variables from the trivial many. In this paper, we review and summarize the CART, C4.5, and CHAID algorithms and propose a simple method of screening the vital few variables by selecting the variables screened by all three algorithms. We also discuss how to develop a logistic regression model on a large data set, illustrated with a large financial data set collected by a credit bureau for the purpose of predicting corporate bankruptcy.
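The screening rule proposed above, keeping only the variables selected by all three algorithms, is a set intersection. A minimal sketch, assuming each algorithm's screened variables have already been extracted into a set; the function name and the variable names are hypothetical, not the credit-bureau fields used in the paper.

```python
def vital_few(*screened_sets):
    """Variables screened by every algorithm: the intersection of the input sets."""
    if not screened_sets:
        return set()
    common = set(screened_sets[0])
    for variables in screened_sets[1:]:
        common &= set(variables)  # keep only variables also chosen by this algorithm
    return common


# Hypothetical variable names, not the study's credit-bureau fields.
cart = {"debt_ratio", "cash_flow", "firm_age", "sales_growth"}
c45 = {"debt_ratio", "cash_flow", "industry"}
chaid = {"cash_flow", "debt_ratio", "firm_age"}
print(sorted(vital_few(cart, c45, chaid)))  # ['cash_flow', 'debt_ratio']
```

The intersection is deliberately conservative: a variable that only one tree picks up (here `industry` or `sales_growth`) is dropped before the logistic regression stage.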

Evaluation on Performance for Classification of Students Leaving Their Majors Using Data Mining Technique

  • 임영문;유창현
    • 대한안전경영과학회 Conference Proceedings (대한안전경영과학회 2006 Fall Joint Conference), pp. 293-297, 2006
  • Recently, most universities have been suffering from students leaving their majors, and many are trying to find countermeasures that reduce the major-separation rate. As part of this effort, this paper uses decision tree algorithms, a data mining technique that divides a population of interest into subgroups for grouping or prediction, to characterize students who leave their majors. The dataset consists of 5,115 cases selected from a total of 13,346 collected at a university in Kangwon-Do over seven years (March 1, 2000 to June 30, 2006). The main objective of this study is to evaluate the performance of CHAID, CART, and C4.5 for classifying students leaving their majors, using ROC charts, lift charts, and gains charts. The study also reports accuracy, sensitivity, and specificity derived from the classification table. According to the results, CART performed best for classifying students leaving their majors.
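Accuracy, sensitivity, and specificity all come from the four cells of the 2x2 classification table: accuracy is the share of all cases classified correctly, sensitivity the share of actual leavers caught, and specificity the share of non-leavers correctly cleared. A minimal sketch, assuming outcomes are coded 1 = left the major and 0 = stayed; the function name and toy labels are hypothetical.

```python
def classification_metrics(actual, predicted, positive=1):
    """Accuracy, sensitivity and specificity from a 2x2 classification table."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # leaver correctly flagged
            else:
                fn += 1  # leaver missed
        elif p == positive:
            fp += 1      # stayer wrongly flagged
        else:
            tn += 1      # stayer correctly cleared
    n = tp + fp + tn + fn
    if n == 0:
        raise ValueError("no cases to evaluate")
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical outcomes: 1 = left the major, 0 = stayed.
actual    = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
print(classification_metrics(actual, predicted))
```

Because leavers are usually a minority, accuracy alone can look good for a model that rarely flags anyone, which is why reporting sensitivity and specificity alongside it is useful.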

Predicting Model of Students Leaving Their Majors Using Data Mining Technique

  • 임영문;유창현
    • 대한안전경영과학회지, Vol. 8, No. 5, pp. 17-25, 2006
  • Many colleges now face a serious problem because many students leave their majors. To reduce the major-separation rate, many universities are trying to find a proper solution. As part of this effort, this paper aims to build a model that predicts which students will leave their majors. The sample was drawn from a university in Kangwon-Do over seven years (March 1, 2000 to June 30, 2006). For validation, the partitioned data were split 50%:50% into training and testing samples. The study reports accuracy, sensitivity, and specificity for three algorithms, CHAID, CART, and C4.5, and uses ROC charts and gains charts to evaluate their classification of students leaving their majors. The results are informative because they identify the most important factors affecting whether students leave their majors, such as the semester in which a course is taken, grades in liberal-arts subjects, scholarships, grades in major subjects, and total course completion.

Evaluation of Patients' Queue Environment on Medical Service Using Queueing Theory

  • 여현진;박원숙;유명철;박상찬;이상철
    • 품질경영학회지, Vol. 42, No. 1, pp. 71-79, 2014
  • Purpose: The purpose of this study is to develop methods for evaluating patients' queue environment using a decision tree and queueing theory. Methods: This study uses a CHAID decision tree and M/G/1 queueing theory to identify pain points and estimate patients' waiting times for medical services. The hospital's physical process is translated into a logical process so that queueing theory can be applied. Results: Three nodes of the system have predictable problems with patient waiting time, which can be improved by relocating patients to other nodes. Conclusion: This study identifies three pain points in the hospital through decision tree analysis and substitute nodes through queueing theory. While it reveals the patients' queue environment in the hospital, the study has several limitations, such as a lack of varied cases and factors.
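For a single M/G/1 node (Poisson arrivals, a general service-time distribution, one server), the mean waiting time follows from the Pollaczek-Khinchine formula W_q = λE[S²] / (2(1 - ρ)), with utilization ρ = λE[S] < 1. The sketch below evaluates it for one node; the abstract does not report the hospital's arrival or service parameters, so the rates shown and the function name `mg1_metrics` are hypothetical.

```python
def mg1_metrics(arrival_rate, mean_service, service_variance):
    """Steady-state M/G/1 averages via the Pollaczek-Khinchine formula.

    arrival_rate: Poisson arrival rate (patients per unit time)
    mean_service, service_variance: mean and variance of the service time
    """
    rho = arrival_rate * mean_service  # server utilization
    if not 0 <= rho < 1:
        raise ValueError("queue is unstable unless utilization is below 1")
    second_moment = service_variance + mean_service ** 2  # E[S^2]
    wq = arrival_rate * second_moment / (2 * (1 - rho))   # mean wait in queue
    return {
        "utilization": rho,
        "mean_wait_in_queue": wq,
        "mean_time_in_system": wq + mean_service,
        "mean_queue_length": arrival_rate * wq,            # Little's law
    }


# Hypothetical node: 0.5 patients/min, 1-min mean service.
print(mg1_metrics(0.5, 1.0, 1.0))  # exponential service (M/M/1 special case)
print(mg1_metrics(0.5, 1.0, 0.0))  # deterministic service (M/D/1)
```

With the same arrival rate and mean service time, removing service-time variability halves the queueing wait (1.0 versus 0.5 minutes here), which is why an M/G/1 model, rather than one that assumes exponential service, matters when evaluating nodes and candidate relocations.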

Development of Sentiment Analysis Model for the hot topic detection of online stock forums

  • 홍태호;이태원;리징징
    • 지능정보연구, Vol. 22, No. 1, pp. 187-204, 2016
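  • Social media users now interact and share information through opinions and reviews they write themselves. This has prompted research in various fields that use customer reviews, such as opinion mining, web mining, and sentiment analysis. In particular, sentiment analysis aims to identify the attitudes, positions, and sentiments of people who write directly about a given topic. Because information or data containing customer opinions is the core input for sentiment analysis, it is effective for analyzing customers' opinions on a topic, and it increasingly supports both companies' marketing tailored to consumer needs and investors' decisions based on market trends. In this study, we use posts written by users of the Chinese online Sina stock forum to select and detect hot topics among previously identified topics. Using an existing sentiment dictionary, we classify the sentiment value and polarity of each topic and select hot topics through cluster analysis. The k-means algorithm was used to select hot topics, and a procedure for selecting hot topics with SOM, an artificial intelligence technique, is also presented. In addition, we developed sentiment analysis models that detect hot topics in advance using data mining techniques such as logit, decision trees, and SVM, and compared the hot topics selected by an attention index with the detected hot topics. By providing information on hot topics, this study is expected to reveal the flow of recent trends; hot topics in stock forums can not only provide useful information to investors in the stock market but also help meet consumer needs.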
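The dictionary-based step described above, assigning each topic a sentiment value and a polarity, can be sketched as summing lexicon weights over each post's tokens and averaging per topic. This is only an illustration under stated assumptions: posts are assumed to be already word-segmented (Chinese text needs a segmenter first), the lexicon and its weights are hypothetical English placeholders rather than the dictionary used in the study, and the later k-means/SOM clustering step is not shown.

```python
def topic_sentiment(posts, lexicon):
    """Average lexicon score and polarity per topic.

    posts: list of (topic, tokens) pairs; tokens are assumed already segmented.
    lexicon: dict mapping word -> sentiment weight (hypothetical values).
    """
    totals, counts = {}, {}
    for topic, tokens in posts:
        score = sum(lexicon.get(token, 0.0) for token in tokens)  # unknown words score 0
        totals[topic] = totals.get(topic, 0.0) + score
        counts[topic] = counts.get(topic, 0) + 1
    result = {}
    for topic, total in totals.items():
        avg = total / counts[topic]
        polarity = "positive" if avg > 0 else "negative" if avg < 0 else "neutral"
        result[topic] = (avg, polarity)
    return result


# Hypothetical lexicon and pre-tokenized posts.
lexicon = {"rise": 1.0, "gain": 1.0, "fall": -1.0, "loss": -1.0}
posts = [
    ("bank", ["rise", "gain"]),
    ("bank", ["fall"]),
    ("oil", ["loss", "fall", "rise"]),
]
print(topic_sentiment(posts, lexicon))
```

Per-topic values like these, possibly combined with posting volume, could then serve as input features for the clustering used to pick hot topics.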