• Title/Summary/Keyword: data mining(CART)

An Exploratory Study of Fatigue Related Factors among School Personnel in Seoul by Data mining (데이터 마이닝을 이용한 서울시교직원의 피로요인 탐색연구)

  • Lee, Hui-U;Sin, Seon-Mi
    • Journal of the Korean Society of School Health / v.19 no.1 / pp.79-88 / 2006
  • Purpose: To identify the general characteristics of school personnel with recent fatigue, which was the most frequent of the subjective symptoms, and to explore fatigue-related factors by evaluating physical and perceived health status, lifestyle, and symptoms with data mining techniques. Methods: We collected data from 1,147 elementary, middle, and high school personnel (545 male, 602 female) who answered a questionnaire and received a physical examination at the Seoul School Health Center from September to November 2000. We compared the fatigue and non-fatigue groups on demographic characteristics, physical health status, perceived health status, symptoms, and laboratory values using frequencies, chi-square tests, t-tests, or simple logistic regression in SAS 8.1, and then used the significant variables as input variables for a CART decision tree built with SAS Enterprise Miner (E-Miner). Results: Fatigue was reported by 41.1% of the 1,147 school personnel (35.2% of males, 46.4% of females). In classical statistics, the factors related to fatigue were female gender, lower mean systolic and diastolic blood pressure, younger age, working in a middle school, irregular eating habits, no exercise in a week or less than 30 minutes of exercise a day, perceived unhealthy status, and subjective symptoms including shortness of breath during exercise. In simple logistic regression with fatigue as the dependent variable, the odds ratio was 1.58 for gender (female vs. male), 20.67 for age (20s vs. 60s), and 1.86 for middle vs. high school personnel. Beyond single variables, we used SAS E-Miner to mine combinations of several characteristics. In the CART model, if health was perceived as healthy and age was >= 37.5 years, the proportion with fatigue was only 19.3%, but if health was perceived as unhealthy, shortness of breath during exercise was severe, age was < 53.5 years, and BMI was >= 22.69, the proportion with fatigue reached 84.8%. Conclusions: Fatigue was reported by 41.1% of school personnel (35.2% of males, 46.4% of females). In classical statistics, fatigue-related factors among school personnel were younger age, female gender, perceived unhealthy status, subjective physical symptoms, poor lifestyle, and lower blood pressure, rather than physical health status alone. In data mining, the proportion with fatigue ranged from only 19.3% (perceived healthy, age >= 37.5 years) to 84.8% (perceived unhealthy, severe shortness of breath during exercise, age < 53.5 years, BMI >= 22.69).
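
The two CART leaves quoted in this abstract can be read as explicit if-then rules. The sketch below encodes only those two reported paths; the function and parameter names are hypothetical, and because the rest of the tree is not given in the abstract, every other combination returns None rather than a guessed rate.

```python
from typing import Optional


def reported_fatigue_rate(perceived_healthy: bool,
                          severe_breathlessness: bool,
                          age: float,
                          bmi: float) -> Optional[float]:
    """Fatigue proportion of the matching reported CART leaf, else None.

    Hypothetical helper: only the two leaves quoted in the abstract are encoded.
    """
    # Reported leaf 1: perceived healthy and age >= 37.5 years -> 19.3% fatigue
    if perceived_healthy and age >= 37.5:
        return 0.193
    # Reported leaf 2: perceived unhealthy, severe shortness of breath during
    # exercise, age < 53.5 years and BMI >= 22.69 -> 84.8% fatigue
    if (not perceived_healthy and severe_breathlessness
            and age < 53.5 and bmi >= 22.69):
        return 0.848
    # Any other path ends in a leaf the abstract does not report
    return None
```

For example, a 30-year-old who perceives their health as unhealthy, reports severe breathlessness, and has a BMI of 25.0 falls in the 84.8% leaf. The two leaves cannot both match one record, since one requires a healthy perception and the other an unhealthy one.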

A Study on the Combined Decision Tree(C4.5) and Neural Network Algorithm for Classification of Mobile Telecommunication Customer (이동통신고객 분류를 위한 의사결정나무(C4.5)와 신경망 결합 알고리즘에 관한 연구)

  • 이극노;이홍철
    • Journal of Intelligence and Information Systems / v.9 no.1 / pp.139-155 / 2003
  • This paper presents a new methodology that combines a decision tree and a neural network to analyze and classify customer patterns in the mobile telecommunication market and to improve the prediction of credit information. Using the decision tree's variable selection process, we developed a systematic process for defining the values of the input vector and for generating rules. From a customer-management perspective, this research analyzes current customers and derives their patterns so that the company can maintain good customer relationships and proactively manage customers with a high potential of terminating their contracts. An implementation of the proposed method shows higher predictive accuracy than existing methods such as decision trees (CART, C4.5), regression, neural networks, and a combined model (CART and NN).
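
One way to picture the tree-driven variable selection step is to rank candidate attributes by the C4.5 gain ratio and pass only the top-ranked ones to the neural network. The sketch below is a simplification under that assumption: it scores each attribute once at the root rather than reading the variables off a fully grown C4.5 tree, and the attribute names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[str], labels: Sequence[str]) -> float:
    """C4.5 gain ratio of a categorical attribute with respect to the class labels."""
    n = len(labels)
    groups: Dict[str, List[str]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain = H(Y) - sum over values v of |S_v|/|S| * H(Y in S_v)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


def select_inputs(table: Dict[str, Sequence[str]],
                  labels: Sequence[str], k: int) -> List[str]:
    """Keep the k attributes with the highest gain ratio as network inputs."""
    ranked = sorted(table, key=lambda name: gain_ratio(table[name], labels),
                    reverse=True)
    return ranked[:k]
```

With labels `["stay", "stay", "churn", "churn"]`, a hypothetical attribute `plan = ["A", "A", "B", "B"]` separates the classes perfectly (gain ratio 1.0), while `region = ["N", "S", "N", "S"]` carries no information (0.0), so `select_inputs` with `k=1` keeps only `plan`.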

Customer Segmentation of a Home Study Company using a Hybrid Decision Tree and Artificial Neural Network Model (하이브리드 의사결정나무와 인공신경망 모델을 이용한 방문학습지사의 고객세분화)

  • Seo Kwang-Kyu;Ahn Beum-Jun
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.3 / pp.518-523 / 2006
  • Because of keen competition, companies segment their customers and try to offer specially targeted services to each segment by distinct methods. Accordingly, data mining techniques have attracted attention as an effective way of extracting useful information. This paper explores customer segmentation for a home study company using a hybrid decision tree and artificial neural network model. Using the decision tree's variable selection process, we developed a systematic process for defining the values of the input vector and for generating rules. From a customer-management perspective, this research analyzes current customers and derives their patterns so that the company can maintain good customer relationships. The case study shows that the predictive accuracy of the proposed model is higher than that of regression, a decision tree (CART), and artificial neural networks.

Predicting Model of Students Leaving Their Majors Using Data Mining Technique (데이터마이닝 기법을 이용한 전공이탈자 예측모형)

  • Leem, Young-Moon;Ryu, Chang-Hyun
    • Journal of the Korea Safety Management & Science / v.8 no.5 / pp.17-25 / 2006
  • Many colleges now face a serious problem because many students leave their majors, and many universities are trying to find a way to reduce this major-separation rate. As a similar endeavor, the objective of this paper is to find a model that predicts which students will leave their majors. The sample was drawn from a university in Kangwon-Do over seven years (March 1, 2000 to June 30, 2006). For validation, the partitioned data were divided into training and test samples at a 50%:50% ratio. This study reports the accuracy, sensitivity, and specificity of three algorithms: CHAID, CART, and C4.5. In addition, ROC and gains charts were used to evaluate the classification of students leaving their majors. The results are informative because they identify the most important factors affecting whether students leave their majors, such as the semester in which courses were taken, grades in cultural (general-education) subjects, scholarships, grades in major subjects, and total course completion.
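
The evaluation steps named in this abstract (a 50:50 train/test split, accuracy, sensitivity, specificity, and a gains chart) can be sketched in plain Python. This is an illustrative sketch, not the authors' code: it assumes a binary coding where 1 means the student left the major, and the model scores fed to the gains curve are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_half(rows: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly partition rows into 50% training and 50% test samples."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def confusion_metrics(actual: Sequence[int],
                      predicted: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity, with 1 = student left the major."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of leavers correctly flagged
        "specificity": tn / (tn + fp),  # share of stayers correctly cleared
    }


def cumulative_gains(actual: Sequence[int],
                     scores: Sequence[float]) -> List[float]:
    """Share of all leavers captured as students are taken in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_positive = sum(actual)
    captured = 0
    curve = []
    for i in order:
        captured += actual[i]
        curve.append(captured / total_positive)
    return curve
```

Sensitivity and specificity pull in opposite directions as the classification cutoff moves, which is why the abstract pairs them with an ROC chart; the gains curve answers the more operational question of how many leavers are found by contacting the top-scored students first.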

Tolerance Computation for Process Parameter Considering Loss Cost : In Case of the Larger is better Characteristics (손실 비용을 고려한 공정 파라미터 허용차 산출 : 망대 특성치의 경우)

  • Kim, Yong-Jun;Kim, Geun-Sik;Park, Hyung-Geun
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.2 / pp.129-136 / 2017
  • With the recent rapid development of information technology and automation in manufacturing, tens of thousands of quality variables are measured and classified in databases every day. Existing statistical methods, as well as variable selection and interpretation by experts, are limited in supporting proper judgment on such data. Accordingly, various data mining methods, including decision tree analysis, have been developed in recent years. CART and C5.0 are representative decision tree algorithms, but they are limited in defining the tolerance of continuous explanatory variables. In addition, their target variables are restricted to information that indicates only product quality, such as the defect rate. It is therefore essential to develop an algorithm that improves upon CART and C5.0 and can use new quality information such as loss cost. In this study, a new algorithm was developed both to find the major variables that minimize the target variable, loss cost, and to overcome the limits of CART and C5.0. The new algorithm defines the tolerance of variables systematically by dividing each continuous explanatory variable into three categories. To compare its performance with existing algorithms, a larger-the-better characteristic was assumed in the R programming environment, and 10 simulations were performed with 1,000 data sets for each variable. The performance of the new algorithm was verified through a mean test of loss cost. The verification showed that, for the larger-the-better characteristic, the tolerances of continuous explanatory variables found by the new algorithm lowered loss cost more than those found by existing algorithms. In conclusion, the new algorithm can be used to find tolerances of continuous explanatory variables that minimize loss in a process, taking into account the loss cost of the products.
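
For a larger-the-better characteristic, the standard Taguchi quality loss is L(y) = k / y², where k = A0·Δ0² ties the loss A0 to the functional limit Δ0. The abstract does not give the paper's exact loss function or cut points, so the sketch below assumes the Taguchi form and illustrates the general idea of splitting one continuous explanatory variable x into three categories (with hypothetical cut points) and comparing the mean loss of the response y within each category.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def larger_the_better_loss(y: float, k: float) -> float:
    """Taguchi larger-the-better quality loss L(y) = k / y**2 (y must be positive)."""
    return k / (y * y)


def mean_loss_by_category(x: Sequence[float], y: Sequence[float],
                          cuts: Tuple[float, float],
                          k: float) -> Dict[str, float]:
    """Mean loss of y within three categories of the explanatory variable x.

    The two cut points are hypothetical inputs; choosing them well is the
    part the paper's algorithm addresses.
    """
    low_cut, high_cut = cuts
    buckets: Dict[str, List[float]] = {"low": [], "mid": [], "high": []}
    for xi, yi in zip(x, y):
        key = "low" if xi < low_cut else ("mid" if xi < high_cut else "high")
        buckets[key].append(larger_the_better_loss(yi, k))
    # Skip empty categories so mean() is never called on an empty list
    return {name: mean(losses) for name, losses in buckets.items() if losses}
```

With hypothetical data `x = [1, 2, 5, 6, 9, 10]`, `y = [2, 2, 4, 4, 1, 1]`, cuts `(3, 8)` and `k = 16.0`, the mean losses are 4.0 (low), 1.0 (mid), and 16.0 (high), so the middle band of x would be the candidate tolerance interval. Because the loss grows as 1/y², a few small responses dominate the mean, which is why a loss-based target can select different tolerances than a defect rate would.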

A study on the Analysis and Forecast of Effect Factors in e-Learning Reuse Intention Using Rule Induction Techniques (규칙유도기법을 이용한 이러닝 시스템의 재이용의도 영향요인 분석 및 예측에 관한 연구)

  • Bae, Jae-Kwon;Kim, Jin-Hwa;Jeong, Hwa-Min
    • Journal of Information Technology Applications and Management / v.17 no.2 / pp.71-90 / 2010
  • Electronic learning (e-learning) has created hype for companies, universities, and other educational institutions, and has led to phenomenal growth in web-based learning and in experimentation with multimedia, video conferencing, and internet-based technologies. Many researchers are interested in the factors that affect the performance of e-learning and e-learning services. This study therefore proposes e-learning system reuse prediction models in which factors influencing e-learners' intention to reuse (system accessibility, system stability, information clarity, information validity, self-regulated efficacy, computer self-efficacy, perceived usefulness, perceived ease of use, flow, and parental expectation) positively affect that intention. A web survey was conducted among the full members of e-learning education institute A in Seoul, Republic of Korea, a specialized e-learning company that provides real-time video lectures via a desktop conferencing system. The survey ran for 20 days from November 5, 2009, through company A's e-learning website. Three data mining techniques were used: multivariate discriminant analysis, CART, and the C5.0 algorithm. Based on the data mining results, this study provides e-learning service providers, e-learning operators, and content developers with marketing and management strategies for improving e-learning service companies.

Churn Analysis for the First Successful Candidates in the Entrance Examination for K University

  • Kim, Kyu-Il;Kim, Seung-Han;Kim, Eun-Young;Kim, Hyun;Yang, Jae-Wan;Cho, Jang-Sik
    • Journal of the Korean Data and Information Science Society / v.18 no.1 / pp.1-10 / 2007
  • In this paper, we focus on churn analysis of the first-round successful candidates in the 2006 entrance examination, using the data mining tool Clementine. The goal of this study is to apply decision trees (the C5.0 and CART algorithms), neural networks, and logistic regression to predict churn among successful candidates. Using K University's 2006 entrance examination data, we also analyze churning and non-churning successful candidates, why successful candidates churn, and which successful candidates are most likely to churn in the future.
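
At the core of the CART algorithm mentioned here is a search over binary splits: for a numeric predictor, candidate thresholds are the midpoints between consecutive sorted values, and the split with the lowest weighted Gini impurity wins. The sketch below shows that single step on hypothetical data; a full CART implementation also recurses on each side, handles categorical predictors, and prunes the tree, none of which is shown.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum(p_c^2) of a set of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(x: Sequence[float],
                   y: Sequence[str]) -> Optional[Tuple[float, float]]:
    """Best split x < t vs x >= t by weighted Gini impurity, as (t, impurity)."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best: Optional[Tuple[float, float]] = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # candidate midpoint
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = (t, impurity)
    return best  # None when x has fewer than two distinct values
```

On hypothetical scores `[20, 25, 30, 40, 45, 50]` labelled three times `"churn"` then three times `"stay"`, the search returns threshold 35.0 with impurity 0.0. Midpoint thresholds of this kind explain why fitted trees report split values such as 37.5 or 53.5 that need not occur in the data.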

Utilizing the Effect of Market Basket Size for Improving the Practicality of Association Rule Measures (연관규칙 흥미성 척도의 실용성 향상을 위한 장바구니 크기 효과 반영 방안)

  • Kim, Won-Seo;Jeong, Seung-Ryul;Kim, Nam-Gyu
    • The KIPS Transactions:PartD / v.17D no.1 / pp.1-8 / 2010
  • Association rule mining techniques enable us to acquire knowledge about sales patterns among individual items from voluminous transactional data. One of the major purposes of association rule mining is to use the acquired knowledge for marketing strategies such as catalogue design, cross-selling, and shop allocation. However, extracting only the actionable and profitable knowledge from the enormous number of discovered patterns takes too much time and cost. In the current literature, a number of interest measures have been devised to accelerate and systematize pattern evaluation. Unfortunately, most of these measures, including support and confidence, are prone to yielding impractical results because they are calculated only from the sales frequencies of items. For instance, traditional measures cannot differentiate between purchases in a small basket and those in a large shopping cart. Measures should therefore be adjusted for market basket size, because mutually irrelevant items are likely to appear together in a large shopping cart. Unlike previous approaches, we take market basket size into account when calculating interest measures. Because the devised measure assigns different weights to individual purchases according to their basket sizes, we expect it to minimize the distortion of results caused by accidental patterns. Additionally, we performed intensive computer simulations under various settings, as well as real case analyses, to evaluate the correctness and consistency of the devised measure.
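
The contrast drawn here between frequency-only measures and basket-size-aware measures can be made concrete. The sketch computes ordinary support and confidence, then a size-weighted support in which each transaction's contribution shrinks as the basket grows. The abstract does not state the paper's weighting formula, so the weight 1/(n - 1) for a basket of n items is a hypothetical choice made only for illustration, as are the example baskets.

```python
from typing import Sequence, Set


def support(baskets: Sequence[Set[str]], itemset: Set[str]) -> float:
    """Fraction of baskets containing every item of itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(baskets: Sequence[Set[str]],
               antecedent: Set[str], consequent: Set[str]) -> float:
    """Confidence of the rule antecedent -> consequent."""
    return support(baskets, antecedent | consequent) / support(baskets, antecedent)


def basket_weight(basket: Set[str]) -> float:
    """Hypothetical weight: a basket with n > 1 items counts 1/(n - 1)."""
    return 1.0 / (len(basket) - 1) if len(basket) > 1 else 1.0


def size_weighted_support(baskets: Sequence[Set[str]], itemset: Set[str]) -> float:
    """Support in which co-occurrence in a large cart counts for less."""
    total = sum(basket_weight(b) for b in baskets)
    hits = sum(basket_weight(b) for b in baskets if itemset <= b)
    return hits / total
```

With the hypothetical baskets `{milk, bread}`, `{milk, bread, eggs, beer, chips}`, `{eggs}`, and `{milk}`, the pair `{eggs, beer}` appears only in the five-item cart: its plain support is 0.25, but its weighted support drops to 0.25/3.25 (about 0.077), while `{milk, bread}`, which also appears in a two-item basket, keeps a weighted support of 1.25/3.25 (about 0.385).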

Industrial Safety Risk Analysis Using Spatial Analytics and Data Mining (공간분석·데이터마이닝 융합방법론을 통한 산업안전 취약지 등급화 방안)

  • Ko, Kyeongseok;Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.4 / pp.147-153 / 2017
  • In 2015, the industrial-accident mortality rate in South Korea was 11 per 100,000 workers, five times the OECD average. Economic losses due to industrial accidents continue to grow, reaching 19 trillion won, far more than the 1.1 trillion won lost to natural disasters. This calls for fundamental changes in industrial safety management. In this study, we classified the accident risk in the industrial complex of Ulju-gun using spatial analytics and data mining. We collected 119 accident data, factory characteristics data, and company information such as sales amount, capital stock, building information, weather information, and official land price. The analysis dataset was constructed through pre-processing and data convergence. We then conducted a geographically weighted regression on the spatial factors affecting fire incidents and calculated the risk of fire accidents with an analytical model combining boosting and CART (Classification and Regression Tree). The main factors identified as affecting fire accidents were building deterioration, capital stock, number of employees, officially assessed land price, and building height. Finally, the predicted accident rates were divided into four classes (risk categories: alert, hazard, caution, and attention) with Jenks natural breaks classification, which seeks to minimize each class's average deviation from its class mean while maximizing each class's deviation from the means of the other classes. Because the analysis results were also visualized on maps, danger zones can be checked intuitively. The results are expected to support differentiated policy decisions for each risk rating.
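
The Jenks natural breaks criterion described above is commonly computed as Fisher's exact optimization: sort the values and choose class boundaries that minimize the total within-class sum of squared deviations, which, for a fixed total variance, is equivalent to maximizing between-class separation. The dynamic-programming sketch below follows that formulation; the input rates are hypothetical, and the abstract does not say which software the authors used or how the four classes map to alert, hazard, caution, and attention.

```python
from typing import List, Sequence


def jenks_breaks(values: Sequence[float], n_classes: int) -> List[float]:
    """Fisher-Jenks breaks [min, upper bound of class 1, ..., max]."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")
    # Prefix sums give the squared deviation of any run data[i:j] in O(1)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: smallest total SSE when data[:j] is split into k classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last class is data[i:j]
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    start[k][j] = i
    # Walk back through the recorded class starts to recover upper bounds
    uppers = []
    j = n
    for k in range(n_classes, 0, -1):
        uppers.append(data[j - 1])
        j = start[k][j]
    return [data[0]] + uppers[::-1]
```

For hypothetical predicted rates `[1, 2, 3, 10, 11, 12, 20, 21, 22, 40, 41]` split into four classes, the breaks are `[1, 3, 12, 22, 41]`, matching the visible clusters. The triple loop is O(k·n²), which is fine for a few thousand map units; larger inputs usually call for an optimized library implementation.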

Video Ranking Model: a Data-Mining Solution with the Understood User Engagement

  • Chen, Yongyu;Chen, Jianxin;Zhou, Liang;Yan, Ying;Huang, Ruochen;Zhang, Wei
    • Journal of Multimedia Information System / v.1 no.1 / pp.67-75 / 2014
  • As video services grow rapidly, it is important for service providers to offer customized services, and video ranking plays a key role in attracting subscribers. In this paper, we propose a weekly video ranking mechanism based on quantified user engagement. The traditional QoE ranking mechanism is relatively subjective and is usually accomplished by grading, while QoS is relatively objective and is accomplished by analyzing quality metrics. The goal of this paper is to establish a ranking mechanism that combines the advantages of both QoS and QoE, based on a third-party data collection platform. We use data mining methods to classify and analyze the collected data. To apply the mechanism in practice, we first group the videos and then use a regression tree and a decision tree (CART) to narrow them down to a reasonable number. We then introduce the analytic hierarchy process (AHP) model and use the Elo rating system to improve the fairness of the ranking. Questionnaire results verify that the proposed solution not only simplifies the computation but also increases the credibility of the system.
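
The Elo rating system mentioned here updates two ratings after each pairwise comparison: the expected score of A against B is E_A = 1 / (1 + 10^((R_B - R_A) / 400)), and A's rating moves by K(S_A - E_A). The sketch below uses the conventional chess defaults K = 32 and a 400-point scale; the abstract does not say how videos are paired or what counts as a win, so treating, for example, higher weekly engagement as a win is a hypothetical assumption.

```python
from typing import Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo model (scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float,
               k: float = 32.0) -> Tuple[float, float]:
    """New ratings after one comparison.

    score_a is 1.0 if A wins, 0.5 for a tie, 0.0 if A loses; k = 32 is a
    conventional default, not a value taken from the paper.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    # Zero-sum update: whatever A gains, B loses
    return r_a + delta, r_b - delta
```

Two videos that both start at 1500 and meet once, with A winning, move to 1516.0 and 1484.0. An upset costs the favourite more than an expected result would, which is the property that makes Elo attractive for keeping a ranking fair when videos are compared a different number of times.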
