• Title/Summary/Keyword: data mining(CART)


Identification of major risk factors association with respiratory diseases by data mining (데이터마이닝 모형을 활용한 호흡기질환의 주요인 선별)

  • Lee, Jea-Young;Kim, Hyun-Ji
    • Journal of the Korean Data and Information Science Society, v.25 no.2, pp.373-384, 2014
  • Data mining identifies patterns and correlations in large, complexly structured data and predicts diverse outcomes. The technique is used in finance, telecommunications, distribution, medicine, and other fields. In this paper, we selected risk factors for respiratory diseases. The data, drawn from the Gyeongsangbuk-do portion of the Community Health Survey conducted in 2012, were divided into a respiratory disease group and a healthy group. To select the major risk factors, we applied data mining techniques including neural networks, logistic regression, Bayesian networks, C5.0, and CART. We divided the data into training and testing sets and applied the models built on the training data to the testing data. Comparing prediction accuracy, CART was identified as the best model. Depression, smoking, and stress were identified as the major risk factors for respiratory disease.
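
The evaluation procedure in this abstract (split the data, fit on the training part, compare prediction accuracy on the testing part) can be sketched as follows. This is a minimal illustration only: the function names, the 30% test fraction, and the representation of a fitted model as a plain `predict` callable are assumptions, not details taken from the paper.

```python
import random

def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows and split them into (train, test) lists.

    test_fraction and seed are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, rows):
    """Fraction of (features, label) rows whose label `predict` gets right."""
    correct = sum(1 for x, y in rows if predict(x) == y)
    return correct / len(rows)

def best_model(models, test_rows):
    """Name of the model (hypothetical dict of name -> predict callable)
    with the highest accuracy on the held-out rows."""
    return max(models, key=lambda name: accuracy(models[name], test_rows))
```

Each candidate model would be fitted on the training rows only and then passed to `best_model` together with the testing rows, so that the comparison reflects performance on data the models have not seen.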

A Study of The Determinants of Turnover Intention and Organizational Commitment by Data Mining (데이터마이닝을 활용한 이직의도와 조직몰입의 결정요인에 대한 연구)

  • Choi, Young Joon;Shim, Won Shul;Baek, Seung Hyun
    • Journal of the Korea Society for Simulation, v.23 no.1, pp.21-31, 2014
  • In this article, data mining simulation is applied to find a suitable analysis approach and results for studying organization-related variables. Turnover intention and organizational commitment are used as the target (dependent) variables. Classification and regression tree (CART) with ensemble methods is used for the simulation. The data are the human capital corporate panel of the Korea Research Institute for Vocational Education & Training (KRIVET), collected in 2005, 2007, and 2009. Organizational commitment variables are analyzed with combined measures created after checking the reliability and unidimensionality of the multiple-item measurements. The results of this study are as follows. First, the major determinants of turnover intention are trust, communication, and a talent management-oriented trend. Second, the main determinants of organizational commitment are trust, the number of years worked, innovation, and communication. Two ensemble CART methods are used: CART with Bagging and CART with Arcing. Of the two, CART with Arcing (Arc-x4) extracted scenarios with very high coefficients of determination. Using CART with ensemble methods, a scenario with the maximum coefficient of determination and minimum error is obtained, and practical implications are presented. The limitations of the study and future research are also discussed.
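
For readers unfamiliar with Arcing, the sketch below shows the resampling rule usually attributed to Breiman's Arc-x4: after each round, training case i is drawn for the next round with probability proportional to 1 + m_i^4, where m_i counts how many of the classifiers built so far misclassified that case. Bagging, by contrast, keeps the probabilities uniform in every round. The function name is hypothetical, and whether the study used exactly this form is an assumption, since the abstract does not give the formula.

```python
def arc_x4_probabilities(miscount):
    """Resampling probabilities for the next Arc-x4 round.

    miscount[i] is the number of classifiers built so far that
    misclassified training case i (assumed standard Arc-x4 rule).
    """
    weights = [1 + m ** 4 for m in miscount]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, miscounts of 0, 1, and 2 give weights 1, 2, and 17, so the case that has been misclassified twice receives 17/20 of the sampling mass in the next round.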

Cloud Computing Adoption Decision-Making Modeling Using CART (CART 방법론을 사용한 클라우드 컴퓨팅 도입 의사 결정 모델링)

  • Baek, Seung Hyun;Chang, Byeong-Yun
    • Journal of the Korea Society for Simulation, v.23 no.4, pp.189-195, 2014
  • In this paper, we study a decision-making model for adopting cloud computing (CC), which is free of place and time constraints. Panel survey data collected from 65 people and CART (classification and regression tree), a data mining approach, are used to construct the decision-making model. The modeling has two steps: first, significant questions (variables) are selected; then the CART decision-making model is constructed using the selected variables. In the variable selection stage, the 25 questions are reduced to 5. The benefits of reducing the questions are quicker responses from respondents and shorter model-construction time.

The Study of Chronic Kidney Disease Classification using KHANES data (국민건강영양조사 자료를 이용한 만성신장질환 분류기법 연구)

  • Lee, Hong-Ki;Myoung, Sungmin
    • Proceedings of the Korean Society of Computer Information Conference, 2020.01a, pp.271-272, 2020
  • Data mining is known to be useful in medicine when no evidence favoring a particular treatment option is available. The healthcare field collects huge volumes of structured and unstructured data in order to find unknown information or knowledge for effective diagnosis and clinical decision making. The 5,179 records analyzed were collected from the Korean National Health and Nutrition Examination Survey (KHANES) over two years. The data were split into training and test sets to fit and evaluate the models. We predicted chronic kidney disease (CKD) using data mining methods such as naive Bayes, logistic regression, CART, and artificial neural networks (ANN). The results identify significant features and data mining techniques for the lifestyle factors related to CKD.


A Study on Variable Selection Bias in Data Mining Software Packages (데이터마이닝 패키지에서 변수선택 편의에 관한 연구)

  • 송문섭;윤영주
    • The Korean Journal of Applied Statistics, v.14 no.2, pp.475-486, 2001
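  • We compared the variable selection methods of CART, CHAID, QUEST, and C4.5, among the classification tree algorithms implemented in data mining packages. It is well known that the exhaustive search used by CART is biased; here we compared the bias and selection power of these algorithms as implemented in commercial packages through a simulation study. The commercial packages used were CART, Enterprise Miner, AnswerTree, and Clementine. According to the limited simulation results of this paper, both C4.5 and CART have serious bias in variable selection, whereas CHAID and QUEST show relatively stable results.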
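
The selection bias discussed in this abstract comes from how an exhaustive search picks a split: every midpoint between adjacent distinct values of a variable is a candidate, so a variable with many distinct values gets many more chances to show a large impurity decrease purely by chance than a binary variable does. The sketch below is a minimal illustration of such a search for one numeric variable using Gini impurity; the function names are hypothetical and it is not the implementation of any of the packages compared in the paper.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Exhaustively search thresholds on one numeric variable.

    Returns (best_threshold, impurity_decrease, n_candidates).
    """
    parent = gini(ys)
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t, best_gain = None, 0.0
    for t in candidates:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Weighted average impurity of the two child nodes.
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        gain = parent - child
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain, len(candidates)
```

The third return value makes the source of the bias visible: a variable with k distinct values contributes k - 1 candidate splits, while a binary variable contributes exactly one.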


Analysis of employee's satisfaction factor in working environment using data mining algorithm (데이터 마이닝 기법을 이용한 피고용자의 근로환경 만족도 요인 분석)

  • Lee, Dong Ryeol;Kim, Tae Ho;Lee, HongChul
    • Journal of the Korea Safety Management & Science, v.16 no.4, pp.275-284, 2014
  • A decision tree is an analysis technique that divides a population of interest into several subgroups and makes predictions for them. Because its results are easy to read, researchers can understand and explain the process more easily than with other techniques. This paper uses the CART algorithm, one of the data mining techniques. The source data are 273 variables and 70,094 records (2010-2011) from the working environment survey conducted by the Korea Occupational Safety and Health Agency (KOSHA); after refinement, 12 variables and 35,447 records were used. To find satisfaction factors in the working environment, employees were grouped into three age bands (under 30, 30 to 49, and over 50) and the factors were analyzed for each. Using the CART algorithm, the best grouping variables were found in 155 records. 'Comfortable in organization' and 'proper reward' appeared to be the best grouping factors.

A Data Mining Approach to Intelligent College Road Map Advice Service (데이터 마이닝을 이용한 지능형 전공지도시스템 연구)

  • Choe, Deok-Won;Jo, Gyeong-Pil;Sin, Jin-Gyu
    • Proceedings of the Korea Intelligent Information System Society Conference, 2005.05a, pp.266-273, 2005
  • Data mining techniques enable us to generate useful decision-support information from data that are produced and accumulated in the course of routine organizational management. A college administration system is a typical example: it builds up a warehouse of student records as each student enters the college and undertakes curricular and extracurricular activities. So far, these data have been used only for very limited student service purposes, such as issuing transcripts, graduation evaluation, and GPA calculation. In this paper, we use Holland career search test results, TOEIC scores, course work lists, and GPA scores as the input for data mining and for generating student advisory information. Factor analysis, AHP (Analytic Hierarchy Process), artificial neural networks, and CART (Classification And Regression Tree) techniques are deployed in the data mining process. Since these techniques are very powerful at processing large-scale student databases and discovering useful knowledge in them, we can expect highly sophisticated student advisory knowledge and services that may not be obtainable from human student advice experts.


An Empirical Comparison of Bagging, Boosting and Support Vector Machine Classifiers in Data Mining (데이터 마이닝에서 배깅, 부스팅, SVM 분류 알고리즘 비교 분석)

  • Lee Yung-Seop;Oh Hyun-Joung;Kim Mee-Kyung
    • The Korean Journal of Applied Statistics, v.18 no.2, pp.343-354, 2005
  • The goal of this paper is to compare classification performance and to find the better classifier for given data characteristics. The methods compared are CART with two ensemble algorithms (bagging and boosting) and SVM. In an empirical study of twenty-eight data sets, we found that SVM has a smaller error rate than the other methods on most of the data sets. Comparing bagging, boosting, and SVM by data characteristics, the SVM algorithm is suited to data with a small number of observations and no missing values. On the other hand, the boosting algorithm is suited to data with a large number of observations, and the bagging algorithm is suited to data with missing values.

Pre-Adjustment of Incomplete Group Variable via K-Means Clustering

  • Hwang, S.Y.;Hahn, H.E.
    • Journal of the Korean Data and Information Science Society, v.15 no.3, pp.555-563, 2004
  • In classification and discrimination, we often face an incomplete group variable, typically arising from many missing values and/or implausible cases. This paper suggests using K-means clustering to pre-adjust for the incompleteness and then performing classification based on generalized statistical distance. To illustrate the proposed procedure, a simulation study compares it with CART in data mining and with traditional techniques that ignore the incompleteness of the group variable. The simulation study shows that the proposed methodology outperforms these alternatives.


A Study on Development of A Web-Based Forecasting System of Industrial Accidents (웹 기반의 산업재해 예측시스템 개발에 관한 연구)

  • Leem, Young-Moon;Hwang, Young-Seob;Choi, Yo-Han
    • Proceedings of the Safety Management and Science Conference, 2007.11a, pp.269-274, 2007
  • The ultimate goal of this research is to develop a web-based forecasting system for industrial accidents. As an initial step, this paper provides a comparative analysis of four algorithms: CHAID, CART, C4.5, and QUEST. In addition, it presents the logical process for developing a forecasting system. Decision tree algorithms, a typical data mining technique, are used to predict results from objective, quantified data. The sample for this work was chosen from 10,536 records related to manufacturing industries in Korea over three years (2002-2004).
