• Title/Summary/Keyword: 의사결정나무 분석 (decision tree analysis)

Selection of the Optimal Decision Tree Model Using Grid Search Method : Focusing on the Analysis of the Factors Affecting Job Satisfaction of Workplace Reserve Force Commanders (격자탐색법을 이용한 의사결정나무 분석 최적 모형 선택 : 직장예비군 지휘관의 직장만족도에 대한 영향 요인 분석을 중심으로)

  • Jeong, Chulwoo;Jeong, Won Young;Shin, David
    • Journal of the Korean Operations Research and Management Science Society / v.40 no.2 / pp.19-29 / 2015
  • The purpose of this study is to propose the grid search method for selecting an optimal decision tree model. The method chooses optimal values for the maximum depth of the tree and for the minimum number of observations that must exist in a node for a split to be attempted. By tuning these two parameters, the grid search method ensures that the resulting decision tree model shows more precise and stable classification performance. Through an empirical analysis of data on the job satisfaction of workplace reserve force commanders, we show that the grid search method helps generate an optimal decision tree model, which in turn suggests directions for improving the labor conditions of Korean workplace reserve force commanders.
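The selection procedure described in this abstract can be illustrated with a small exhaustive search: every pair of candidate values for the maximum depth and the minimum node size for a split is scored by k-fold cross-validated accuracy, and the best pair is kept. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the CART-style tree (Gini impurity, numeric features, binary `<=` splits), the interleaved fold assignment, the function names (`build_tree`, `cv_accuracy`, `grid_search`), the candidate grids, and the toy data are all hypothetical. The parameter wording matches the `maxdepth` and `minsplit` controls of R's rpart package, but the tool used in the study is not stated here.

```python
from collections import Counter
from itertools import product


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) pair with the lowest weighted Gini impurity,
    or None if no split improves on the parent node."""
    n = len(y)
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, max_depth, min_split, depth=0):
    """Grow a tree (root is depth 0). A node becomes a leaf holding the majority label
    when it is at max_depth, has fewer than min_split rows, is pure, or has no useful split."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_split or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], max_depth, min_split, depth + 1),
            build_tree([X[i] for i in right], [y[i] for i in right], max_depth, min_split, depth + 1))


def predict(node, row):
    """Internal nodes are (feature, threshold, left, right) tuples; leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def cv_accuracy(X, y, max_depth, min_split, k=5):
    """Mean accuracy over k folds; row i is assigned to fold i % k."""
    scores = []
    for fold in range(k):
        test = [i for i in range(len(y)) if i % k == fold]
        train = [i for i in range(len(y)) if i % k != fold]
        tree = build_tree([X[i] for i in train], [y[i] for i in train], max_depth, min_split)
        scores.append(sum(predict(tree, X[i]) == y[i] for i in test) / len(test))
    return sum(scores) / k


def grid_search(X, y, depths, min_splits, k=5):
    """Score every (max_depth, min_split) pair; keep the first pair with the best CV accuracy."""
    best = None
    for d, m in product(depths, min_splits):
        acc = cv_accuracy(X, y, d, m, k)
        if best is None or acc > best[2]:
            best = (d, m, acc)
    return best


# Hypothetical toy data: class 1 only when both features are >= 5.
X = [(a, b) for a in range(10) for b in range(10)]
y = [int(a >= 5 and b >= 5) for a, b in X]
best_depth, best_min_split, best_acc = grid_search(X, y, depths=[1, 2, 3], min_splits=[2, 50])
print(best_depth, best_min_split, round(best_acc, 3))
```

On this toy data a depth-1 tree cannot express the two-feature rule, and a `min_split` of 50 blocks the second split, so the search settles on `max_depth=2, min_split=2`; the depth-3 tree only ties, and ties keep the first pair found.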

Analysis of Factors for Seasonal Meat Color Characteristics in Hanwoo(Korean Cattle) Beef using Decision Tree Method (의사결정나무분석기법을 이용한 계절별 한우육의 육색 특성에 미치는 요인분석)

  • Kim, Seok-Jung;Kim, Yong-Sun;Song, Young-Han;Lee, Sung-Ki
    • Journal of Animal Science and Technology / v.44 no.5 / pp.607-616 / 2002
  • This study analyzed, by season, the effects of pH, sex, backfat thickness, ribeye area, cold carcass weight, shipping month, muscle internal temperature, average daily temperature, and average relative humidity on the meat color of slaughtered Hanwoo. The analyses focused on both the individual effects of these factors on meat color and their interactions. Multiple linear regression showed that all meat color values decreased as pH increased and increased as backfat thickness increased. In the decision tree analyses, lightness (L*) was highest for cows and steers slaughtered in spring and autumn. For redness (a*), the tree singled out Hanwoo slaughtered in autumn with pH below 5.63 and average relative humidity above 71.5%. Chroma (C*) was highest for Hanwoo slaughtered in summer and autumn with pH below 5.60 and backfat thickness over 8 mm. For hue angle ($h^{\circ}$), the tree singled out Hanwoo slaughtered in spring, summer, or autumn with a muscle internal temperature below 4.7 °C, pH below 5.66, and backfat thickness over 8 mm.
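Chroma and hue angle are not measured directly; in the standard CIE L*a*b* system both are derived from the a* (redness) and b* (yellowness) coordinates, as $C^* = \sqrt{a^{*2} + b^{*2}}$ and $h = \arctan(b^*/a^*)$. The abstract does not list b* or the exact formulas used, so this Python sketch assumes the standard CIELAB definitions; the function name and the sample reading are hypothetical.

```python
import math


def chroma_and_hue(a_star, b_star):
    """Return CIELAB chroma C* and hue angle h (degrees in [0, 360)) from a* and b*."""
    chroma = math.hypot(a_star, b_star)  # sqrt(a*^2 + b*^2)
    # atan2 keeps the correct quadrant; for a* > 0 it equals arctan(b*/a*).
    hue = math.degrees(math.atan2(b_star, a_star)) % 360.0
    return chroma, hue


# Hypothetical instrument reading, not a value from the study.
c, h = chroma_and_hue(3.0, 4.0)
print(f"C* = {c:.2f}, h = {h:.2f} deg")  # C* = 5.00, h = 53.13 deg
```

Using `atan2` rather than a plain arctangent of the ratio avoids a division by zero when a* is 0 and returns the right angle when a* is negative.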

The Factors of Participating in a Smoking Cessation Program using Integrated Method of Decision Tree and Neural Network Algorithm (인공신경망 분석과 결정트리 융합에 의한 금연 프로그램 참여 결정 요인)

  • Byeon, Haewon
    • Journal of the Korea Convergence Society / v.6 no.2 / pp.25-30 / 2015
  • The purpose of this study was to analyze the factors affecting participation in a smoking cessation program. Data were drawn from the 2010 Seoul Welfare Panel Study. Subjects were 1,326 community-dwelling smokers aged 19 and older. The dependent variable was defined as experience of smoking cessation. Explanatory variables included age, gender, level of education, employment status, household income, marital status, drinking, self-reported health status, depression, disease, and physical activity. A prediction model was developed using a decision tree and a neural network algorithm. In the prediction model, self-reported health status, disease, income, and household income were significantly associated with participation in a smoking cessation program. Based on these findings, systematic education and program development are required.

Prediction of commitment and persistence in heterosexual involvements according to the styles of loving using a datamining technique (데이터마이닝을 활용한 사랑의 형태에 따른 연인관계 몰입수준 및 관계 지속여부 예측)

  • Park, Yoon-Joo
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.69-85 / 2016
  • A successful relationship with a romantic partner is one of the most important things in life. In psychology, several previous studies have examined the factors that influence romantic relationships. However, most of these studies relied on statistical analysis and are therefore limited in capturing complex nonlinear relationships or rule-based reasoning. This research analyzes commitment and persistence in heterosexual involvements according to styles of loving, using a data mining technique as well as statistical methods. In addition to the factors identified in previous studies, we consider six styles of loving that influence romantic relationships: 'eros', 'ludus', 'storge', 'pragma', 'mania', and 'agape'. Lee (1977) defines these six types of love as follows: 'eros' is romantic, passionate love; 'ludus' is game-playing or uncommitted love; 'storge' is slowly developing, friendship-based love; 'pragma' is a pragmatic, practical, mutually beneficial relationship; 'mania' is obsessive or possessive love; and 'agape' is gentle, caring, giving, brotherly love that is not concerned with the self. Data were collected from 105 heterosexual couples. Linear regression was first performed to identify the important factors associated with commitment to a partner. The results show that 'satisfaction', 'eros', and 'agape' are significant factors associated with the commitment level for both males and females. Interestingly, for males, 'agape' has a greater effect on commitment than 'eros', whereas for females, 'eros' is a more significant factor than 'agape'. In addition, the male's 'investment' is also a crucial factor in male commitment. Next, decision tree analysis was performed to identify the characteristics of high-commitment and low-commitment couples.
    The decision tree models were built with the 'Decision Tree' operator of the data mining tool RapidMiner. The results show that males with a high level of satisfaction in the relationship show a high level of commitment. However, even if a male does not have a high satisfaction level, he also shows high commitment to the female if he has made a large financial or psychological investment in the relationship and his partner shows him a certain amount of 'agape'. Females with high 'eros' and 'satisfaction' levels show high commitment. Otherwise, even if a female does not have a high satisfaction level, she also shows high commitment if her partner shows a certain amount of 'mania'. Finally, this research used a decision tree to build a model predicting whether the relationship will persist or break up. The results show that the most important factor influencing a breakup is the male's 'narcissistic tendency'. In addition, the 'satisfaction', 'investment', and 'mania' of both males and females also affect a breakup. Interestingly, a male's 'mania' level helps maintain the relationship, whereas a female's has a negative influence. The contribution of this research is to introduce a data mining method as a new analysis technique for psychology. In addition, the results can provide useful advice to couples seeking to build a harmonious relationship. This research has several limitations. First, the experimental data were sampled with an oversampling technique to balance the class sizes, which limits the objective evaluation of the predictive models' performance. Second, the outcome data (whether the relationship persists or not) were collected after a relatively short period, six months after the initial data collection. Lastly, most survey respondents are in their 20s.
    To obtain more general results, we would like to extend this research to the general population.
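The first limitation above concerns class balancing. The abstract does not name the oversampling technique, so this Python sketch assumes plain random oversampling (duplicating randomly chosen minority-class cases); the function name, feature tuples, and labels are hypothetical. Because duplicated cases are exact copies, an evaluation in which one copy lands in the test set while another sits in the training set will look optimistic, which is one concrete reason oversampled data make objective performance evaluation difficult.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class is
    as large as the biggest one. The original rows come first, unchanged."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical couples: (male satisfaction, female satisfaction) -> outcome after 6 months.
rows = [(5, 4), (4, 4), (5, 5), (3, 4), (4, 5), (5, 3), (2, 3), (1, 2)]
labels = ["persist"] * 6 + ["break_up"] * 2
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

To keep an evaluation honest under this scheme, the data would be split into training and test sets first and only the training portion oversampled.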

Severity-Adjusted LOS Model of AMI patients based on the Korean National Hospital Discharge in-depth Injury Survey Data (퇴원손상심층조사 자료를 기반으로 한 급성심근경색환자 재원일수의 중증도 보정 모형 개발)

  • Kim, Won-Joong;Kim, Sung-Soo;Kim, Eun-Ju;Kang, Sung-Hong
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.10 / pp.4910-4918 / 2013
  • This study aims to design a Severity-Adjusted LOS (Length of Stay) Model to efficiently manage the LOS of AMI (Acute Myocardial Infarction) patients. We designed the model using data mining methods (multiple regression analysis, decision trees, and neural networks) on 6,074 AMI patients with a diagnosis of I21 in the 2004-2009 Korean National Hospital Discharge In-depth Injury Survey. The decision tree model, which produced the best results, was chosen as the final model. This study found that the execution of CABG, status at discharge (alive or dead), the comorbidity index, and other variables were the major factors in the severity adjustment of LOS for AMI patients. The differences between actual LOS and adjusted LOS were attributed to hospital location and bed size. Efficient management of the LOS of AMI patients requires identifying these differentiating factors and then carrying out various activities based on them. These factors can be specified by applying each hospital's data to the newly designed Severity-Adjusted LOS Model.
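The adjustment idea in this abstract is to compare each patient's actual LOS with the LOS a model expects for a patient of the same severity, then look at how that gap varies across hospitals. A minimal Python sketch, assuming the expected (severity-adjusted) LOS has already been produced by a fitted model such as the study's decision tree; the group labels, the numbers, and the observed-to-expected (O/E) ratio summary are illustrative additions, not results from the paper.

```python
from collections import defaultdict


def los_gap_by_group(records):
    """records: iterable of (group, actual_los, expected_los).
    Returns {group: (mean of actual - expected, observed/expected ratio)}."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # [sum actual, sum expected, count]
    for group, actual, expected in records:
        t = totals[group]
        t[0] += actual
        t[1] += expected
        t[2] += 1
    return {g: ((obs - exp) / n, obs / exp) for g, (obs, exp, n) in totals.items()}


# Hypothetical patients: (hospital group, actual LOS in days, model-expected LOS in days).
records = [("large_urban", 10, 8), ("large_urban", 6, 7), ("small_rural", 5, 6)]
for group, (gap, ratio) in los_gap_by_group(records).items():
    print(f"{group}: mean gap {gap:+.1f} days, O/E {ratio:.2f}")
```

A positive gap (or an O/E ratio above 1) means patients in that group stayed longer than their severity alone would predict.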

Developing data quality management algorithm for Hypertension Patients accompanied with Diabetes Mellitus By Data Mining (데이터 마이닝을 이용한 고혈압환자의 당뇨질환 동반에 관한 데이터 질 관리 알고리즘 개발)

  • Hwang, Kyu-Yeon;Lee, Eun-Sook;Kim, Go-Won;Hong, Sung-Ok;Park, Jong-Son;Kwak, Mi-Sook;Lee, Ye-Jin;Im, Chae-Hyuk;Park, Tae-Hyun;Park, Jong-Ho;Kang, Sung-Hong
    • Journal of Digital Convergence / v.14 no.7 / pp.309-319 / 2016
  • There is a need to develop a data quality management algorithm to improve the quality of health care data. In this study, we developed a data quality control algorithm for diabetes-related comorbidities in patients with hypertension. To build the algorithm, we extracted hypertension patients from the 2011 and 2012 Korean National Hospital Discharge In-depth Injury Survey data. The significant factors for diabetes among hypertension patients were gender, age, glomerular disorders in diabetes mellitus, diabetic retinopathy, diabetic polyneuropathy, and closed [percutaneous] [needle] biopsy of kidney. Based on the decision tree results, we defined outliers as groups in which the probability that a hypertension patient has comorbid diabetes was 80% or higher, or 20% or lower, and found six such extreme-value groups. The actual data in these outlier (extreme-value) groups therefore need to be checked to improve data quality.
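The outlier rule can be stated as a filter over the terminal nodes of the fitted tree: a node is an extreme-value group when the share of hypertension patients in it who also have diabetes is at or above the upper cut-off, or at or below the lower one. A minimal Python sketch, assuming inclusive 80% and 20% cut-offs and leaf-level patient counts; the node IDs and counts are hypothetical, not the study's six groups.

```python
def leaf_probabilities(leaf_counts):
    """leaf_counts: {leaf_id: (patients_with_diabetes, patients_in_leaf)}."""
    return {leaf: pos / total for leaf, (pos, total) in leaf_counts.items()}


def flag_outlier_groups(leaf_probs, high=0.8, low=0.2):
    """Return the leaves whose predicted probability is >= high or <= low."""
    return sorted(leaf for leaf, p in leaf_probs.items() if p >= high or p <= low)


# Hypothetical terminal nodes of a fitted tree.
counts = {"node_1": (46, 50), "node_2": (30, 60), "node_3": (3, 40), "node_4": (8, 10)}
probs = leaf_probabilities(counts)
print(flag_outlier_groups(probs))  # ['node_1', 'node_3', 'node_4']
```

Records falling into the flagged nodes would then be the ones pulled for manual review against the source records.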

The Prediction of Survival of Breast Cancer Patients Based on Machine Learning Using Health Insurance Claim Data (건강보험 청구 데이터를 활용한 머신러닝 기반유방암 환자의 생존 여부 예측)

  • Doeggyu Lee;Kyungkeun Byun;Hyungdong Lee;Sunhee Shin
    • Journal of Korea Society of Industrial Information Systems / v.28 no.2 / pp.1-9 / 2023
  • Research using AI and big data is being actively conducted in health and medical fields such as disease diagnosis and treatment. Most existing studies used cohort data from research institutes or data on a limited set of patients. In this paper, using the health insurance review claim data held by HIRA, we identified the differences in survival prediction rates, and in the factors affecting survival, between breast cancer patients in their 40s-50s and those in other age groups. The average accuracy of survival prediction was 0.93 for patients in their 40s-50s, higher than the 0.86 for those in their 60s-80s. As for the influential factors, the number of treatments ranked high for patients in their 40s-50s, and age ranked high for those in their 60s-80s. Compared with previous studies, the average precision of 0.90 was higher than the 0.81 reported in an existing paper. By algorithm, Decision Tree, Random Forest, and Gradient Boosting achieved an overall average precision of 0.90 and a recall of 1.0, while the multilayer perceptron achieved a precision of 0.89 and a recall of 1.0. I hope that more research will be conducted with automated machine learning (AutoML) tools for non-specialists, so that the health insurance review claim data held by HIRA can be put to greater use.
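The accuracy, precision, and recall figures above follow the usual confusion-matrix definitions. A minimal Python sketch, assuming "survived" is the positive class (the abstract does not say which class was treated as positive); the function name and counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical counts with "survived" as the positive class.
# A model that predicts "survived" for everyone has fn = tn = 0, so recall is 1.0
# and precision equals the share of survivors in the data.
print(classification_metrics(tp=90, fp=10, fn=0, tn=0))  # (0.9, 0.9, 1.0)
```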

Exploring the Feature Selection Method for Effective Opinion Mining: Emphasis on Particle Swarm Optimization Algorithms

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information / v.25 no.11 / pp.41-50 / 2020
  • Sentiment analysis begins with the search for words that determine the sentiment inherent in data. Managers can understand market sentiment by analyzing the relevant sentiment words that consumers tend to use. In this study, we explore the performance of feature selection methods based on Particle Swarm Optimization (PSO) multi-objective evolutionary algorithms. The feature selection methods were benchmarked with machine learning classifiers such as Decision Tree, Naive Bayesian Network, Support Vector Machine, Random Forest, Bagging, Random Subspace, and Rotation Forest. Our empirical opinion mining results show that the number of features was significantly reduced without hurting performance. Specifically, the Support Vector Machine showed the highest accuracy, and Random Subspace produced the best AUC.
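AUC, the measure on which Random Subspace performed best, is the probability that a classifier scores a randomly chosen positive example above a randomly chosen negative one. A minimal Python sketch of that pairwise (Mann-Whitney) definition, with ties counted as one half; the function name and scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four reviews (1 = positive sentiment).
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Unlike accuracy, this value does not depend on a decision threshold, which is why a classifier can rank first on AUC while another ranks first on accuracy.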

Data-driven Co-Design Process for New Product Development: A Case Study on Smart Heating Jacket (신제품 개발을 위한 데이터 기반 공동 디자인 프로세스: 스마트 난방복 사례 연구)

  • Leem, Sooyeon;Lee, Sang Won
    • Journal of the Korea Convergence Society / v.12 no.1 / pp.133-141 / 2021
  • This research proposes a design process that effectively complements human-centered design with an objective, data-driven approach. The subjective human-centered design process can lack objectivity, and data-driven approaches can supplement it to discover hidden user needs effectively. This research combines data mining analysis with a co-design process and verifies its applicability through a case study on a smart heating jacket. In the data mining process, clustering groups the users, providing the basis for selecting target groups, and decision tree analysis primarily identifies the important user perception attributes and values. The broad view obtained from the data analysis is then refined through the co-design process, a deeper human-centered design process that uses the developed workbook. In the co-design process, journey maps, needs and pain points, ideas, and values for the target user groups are identified and finalized. These can serve as the basis for starting new product development.

Spatial Distribution of Major Soil Types in Korea and an Assessment of Soil Predictability Using Soil Forming Factors (한국 주요 토양유형의 공간적 분포와 토양형성요인을 이용한 예측가능성 평가)

  • Park, Soo-Jin;Sonn, Yeon-Kyu;Hong, Suk-Young;Park, Chan-Won;Zhang, Yong-Seon
    • Journal of the Korean Geographical Society / v.45 no.1 / pp.95-118 / 2010
  • This study investigates the spatial distribution of major soil types in Korea and assesses the ability to predict soil distribution from environmental variables. A classification tree method was used to assess soil predictability. While great soil groups give a more intuitive understanding of spatial distribution, their predictability from environmental factors is much lower than that of great groups. The most important factor determining the spatial distribution of major soil types is Korea's geomorphology, which shows a distinct morphological difference between mountains and plains. The spatial distribution of climatic variables and catenary soil sequences along slopes play additional roles in determining the distribution of soil types. The classification tree models achieved 35-75% prediction accuracy, depending on the combination of environmental variables included in the models. While geomorphological variables are the best predictors for great groups, climatic variables perform better for great soil groups.