• Title/Summary/Keyword: decision tree technique

A study on analysis of factors on in-hospital mortality for community-acquired pneumonia (지역사회획득 폐렴 환자의 퇴원시 사망 요인 분석)

  • Kim, Yoo-Mi
    • Journal of the Korean Data and Information Science Society / v.22 no.3 / pp.389-400 / 2011
  • This study was carried out to analyze factors related to in-hospital mortality from community-acquired pneumonia using an administrative database. The subjects were 5,353 community-acquired pneumonia inpatients from the Korean National Hospital Discharge Injury Survey data for 2004-2006. The data were analyzed using the chi-squared test and decision tree models, a data mining technique. Among the decision tree models, C4.5 had the best performance. The critical factors for in-hospital mortality from community-acquired pneumonia are admission route, respiratory failure, and congenital heart failure, along with age, comorbidity, and bed size. This study used an administrative database that includes patients' characteristics and comorbidity. However, further studies should more extensively include hospital characteristics, regional medical resources, and patient management practices.
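
The abstract above names C4.5 as the best-performing tree model. As a rough illustration of the split criterion C4.5 is known for (the gain ratio, that is, information gain divided by split information), here is a minimal Python sketch. The toy records, the field names `admission_route` and `died`, and the helper names are hypothetical and are not taken from the study's data or code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio obtained by splitting `rows` on a categorical `attribute`."""
    total = len(rows)
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])

    base = entropy([row[target] for row in rows])
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = base - remainder
    split_info = -sum(len(g) / total * log2(len(g) / total) for g in groups.values())
    # An attribute with a single value carries no split information.
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records, invented for illustration only.
toy_rows = [
    {"admission_route": "emergency", "ward": "A", "died": "yes"},
    {"admission_route": "emergency", "ward": "A", "died": "yes"},
    {"admission_route": "outpatient", "ward": "A", "died": "no"},
    {"admission_route": "outpatient", "ward": "A", "died": "no"},
]
```

Calling `gain_ratio(toy_rows, "admission_route", "died")` scores the attribute that separates the two outcomes, while `gain_ratio(toy_rows, "ward", "died")` scores a constant attribute that cannot split the data at all.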

Predictive Analysis of Problematic Smartphone Use by Machine Learning Technique

  • Kim, Yu Jeong;Lee, Dong Su
    • Journal of the Korea Society of Computer and Information / v.25 no.2 / pp.213-219 / 2020
  • In this paper, we propose a classification analysis method for diagnosing and predicting problematic smartphone use, which is getting worse year after year, in order to provide policy data on the problem. We also attempt to identify the key variables that affect the classification rate. For this purpose, we compared the classification rates of three machine learning methods: Decision Tree, Random Forest, and Support Vector Machine. The data came from 25,465 people who responded to the '2018 Problematic Smartphone Use Survey' provided by the Korea Information Society Agency and were analyzed using the R statistical package (ver. 3.6.2). The three classification techniques showed similar classification rates, and none of the models showed an overfitting problem. The classification rate of the Support Vector Machine was the highest among the three methods, followed by Decision Tree and Random Forest. The top three variables affecting the classification rate among smartphone use types were Life Service type, Information Seeking type, and Leisure Activity Seeking type.
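
The comparison described above rests on two simple quantities: each model's classification rate and, for the overfitting check, the gap between its training and test rates. The sketch below shows one way these could be computed in Python. The function names are hypothetical, and the study itself used R, so this is only an illustration of the arithmetic, not the authors' procedure.

```python
def classification_rate(y_true, y_pred):
    """Fraction of cases whose predicted class matches the observed class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_models(rates):
    """Sort (model name, rate) pairs from highest to lowest classification rate."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


def overfit_gap(train_rate, test_rate):
    """Training rate minus test rate; a large positive gap suggests overfitting."""
    return train_rate - test_rate
```

A model whose training rate is far above its test rate has likely memorized the training sample; similar rates on both sets are what "no overfitting problem" usually means in practice.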

Predictive Analytics Model for Death Accidents in Building Projects by Trade - Based on Decision Tree- (PA기법을 이용한 건축공사 공종별 사망사고 예측모델 개발에 관한 연구 - 의사결정나무를 중심으로 -)

  • Choi, Jeong Won;Kim, Han Soo
    • Korean Journal of Construction Engineering and Management / v.22 no.5 / pp.55-65 / 2021
  • Compared with other industries, the construction industry shows a higher rate of death accidents, and companies' legal responsibilities are increasingly being enforced. This trend causes serious concern for construction firms and increases the importance of forecasting and proactively managing death accidents on construction sites. The objective of the study is to develop a predictive analytics model for forecasting death accidents in building projects based on a decision tree technique, which makes it possible to forecast the probability of death accidents by trade. Using the model helps to decrease the risk of legal punishment and supports the safe execution of building projects by forecasting and proactively managing death accidents.

A data extension technique to handle incomplete data (불완전한 데이터를 처리하기 위한 데이터 확장기법)

  • Lee, Jong Chan
    • Journal of the Korea Convergence Society / v.12 no.2 / pp.7-13 / 2021
  • This paper introduces an algorithm that compensates for missing values after converting incomplete training data, which include missing values, into a format that can represent probabilities. In the previous method using this data conversion, incomplete data were processed by assigning an equal probability to every value that a missing variable can take. This method was applied to many problems and obtained good results, but it was pointed out that information is lost because all information remaining in the missing variable is ignored and a new value is assigned. In the newly proposed method, by contrast, only complete records containing no missing values are input into the well-known classification algorithm C4.5, and a decision tree is constructed during learning. Then the probability of the missing value is obtained from this decision tree and assigned as the estimated value of the missing variable. That is, some of the lost information is recovered using the large amount of information that has not been lost from the incomplete training data.
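
To make the data conversion described above concrete, the following Python sketch represents a categorical value as a probability vector: an observed value becomes a one-hot vector, and a missing value becomes either an equal-probability vector (the earlier approach) or an estimated distribution. All names here are hypothetical. In particular, `estimate_from_complete` is a simple class-conditional frequency count used as a stand-in; the paper obtains these probabilities from a C4.5 decision tree, which is not reproduced here.

```python
from collections import Counter

MISSING = None  # hypothetical marker for a missing value


def encode(value, domain, estimate=None):
    """Represent a categorical value as a probability vector over `domain`.

    Observed values become one-hot vectors. A missing value gets `estimate`
    if one is supplied, otherwise an equal probability for every value.
    """
    if value is MISSING:
        if estimate is not None:
            return [estimate.get(v, 0.0) for v in domain]
        return [1.0 / len(domain)] * len(domain)
    return [1.0 if value == v else 0.0 for v in domain]


def estimate_from_complete(rows, attribute, label_key, label):
    """Stand-in estimator: value frequencies among complete rows of one class.

    This is NOT the paper's tree-based estimate; it only fills the same slot.
    """
    counts = Counter(
        row[attribute]
        for row in rows
        if row[attribute] is not MISSING and row[label_key] == label
    )
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else None


# Hypothetical toy data, invented for illustration only.
domain = ["low", "mid", "high"]
rows = [
    {"level": "low", "cls": "A"},
    {"level": "low", "cls": "A"},
    {"level": "high", "cls": "A"},
    {"level": "mid", "cls": "B"},
    {"level": MISSING, "cls": "A"},
]
uniform = encode(MISSING, domain)
estimated = encode(MISSING, domain, estimate_from_complete(rows, "level", "cls", "A"))
```

Here `uniform` spreads the missing value evenly over the three levels, whereas `estimated` leans toward the levels actually observed among complete class-A rows, which is the kind of information the equal-probability approach discards.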

Classification Model of Food Groups in Food Exchange Table Using Decision Tree-based Machine Learning

  • Kim, Ji Yun;Kim, Jongwan
    • Journal of the Korea Society of Computer and Information / v.27 no.12 / pp.51-58 / 2022
  • In this paper, we propose a decision tree-based machine learning model that supports renewal of the food exchange table by classifying the food groups of existing foods and of food data found by web crawling. The food exchange table is the standard for food exchange intake when composing diets, including diets for patients who need nutritional management. Revising the food exchange table through the National Health and Nutrition Survey takes a lot of manpower and time, making it difficult to quickly reflect changes due to new foods or trends. Since the proposed technique classifies newly added foods based on the existing food groups, it makes it possible to quickly organize a food exchange table that reflects food trends. When foods were classified with the proposed model, the accuracy of food group classification for the food exchange table was 97.45%, so this food classification model is expected to be widely used for composing diets that suit individual tastes in hospitals and nursing homes.

Smart Farm Expert System for Paprika using Decision Tree Technique (의사결정트리 기법을 이용한 파프리카용 스마트팜 전문가 시스템)

  • Jeong, Hye-sun;Lee, In-yong;Lim, Joong-seon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.373-376 / 2018
  • Traditional paprika smart farm systems are often harmful to paprika growth because they are set to drive the values of several sensors toward reference values, so the system is often unable to make optimal judgments. Using decision tree techniques, the expert system for the paprika smart farm is designed to create a control system with a decision-making structure similar to that of farmers, using data generated by factors that depend on the surroundings. With current smart farm control systems, farmers must intervene in the surrounding environment because the system is designed to make sensor values follow the reference values set by the farmer. To mitigate this problem, we obtain environmental data and design controllers that apply the decision tree method. The expert system is established for complex control by selecting the most influential environmental factors before controlling the paprika smart farm equipment, including the criteria farmers use to make decisions. The study predicts that, because of the interrelationships among the data and the many surrounding environmental factors affecting growth, each environmental element will serve as a standard when creating smart farms for professionals.

Finding a plan to improve recognition rate using classification analysis

  • Kim, SeungJae;Kim, SungHwan
    • International journal of advanced smart convergence / v.9 no.4 / pp.184-191 / 2020
  • With the emergence of the 4th Industrial Revolution, the core technologies that will lead it, such as AI (artificial intelligence), big data, and the Internet of Things (IoT), have become a focus of public attention. In particular, there is a growing trend of attempts to present future visions by building new models through big data analysis of data collected in a specific field, and by inferring and predicting new values with those models. To obtain reliable and sophisticated statistics from big data analysis, it is necessary to analyze the meaning of each variable, the correlation between the variables, and multicollinearity. If the data are classified in a way that does not match the hypothesis being tested from the beginning, unreliable results will be obtained even if the analysis is performed well. In other words, prior to big data analysis, it is necessary to ensure that the data are well classified according to the purpose of the analysis. Therefore, in this study, data are classified using a decision tree technique and a random forest technique, two classification analysis methods from machine learning, which implements AI technology. By evaluating how well the data are classified, we try to find a way to improve the classification and analysis rate of the data.

Classification of Land Cover over the Korean Peninsula Using Polar Orbiting Meteorological Satellite Data (극궤도 기상위성 자료를 이용한 한반도의 지면피복 분류)

  • Suh, Myoung-Seok;Kwak, Chong-Heum;Kim, Hee-Soo;Kim, Maeng-Ki
    • Journal of the Korean earth science society / v.22 no.2 / pp.138-146 / 2001
  • The land cover over the Korean peninsula was classified using multi-temporal NOAA/AVHRR (Advanced Very High Resolution Radiometer) data. Four types of phenological data derived from the 10-day composited NDVI (Normalized Difference Vegetation Index), maximum and annual mean land surface temperature, and topographical data were used not only to reduce the data volume but also to increase the accuracy of classification. A self-organizing feature map (SOFM), a kind of neural network technique, was used for the clustering of satellite data. We used a decision tree for the classification of the clusters. When we compared the classification results with the time series of NDVI and some other available ground truth data, urban areas, agricultural areas, deciduous trees, and evergreen trees were clearly classified.
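
For readers unfamiliar with the index mentioned above, NDVI is computed from red and near-infrared reflectance as (NIR - Red) / (NIR + Red). The Python sketch below shows that formula together with a few simple summaries of one pixel's NDVI time series. The abstract does not say which four phenological quantities were used, so the statistics in `phenology_features` are assumptions chosen for illustration, and both function names are hypothetical.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index from red and near-infrared reflectance."""
    denom = nir + red
    # Guard against division by zero when both reflectances are zero.
    return (nir - red) / denom if denom != 0 else 0.0


def phenology_features(ndvi_series):
    """Summarize one pixel's NDVI time series with a few simple statistics.

    These particular statistics are illustrative assumptions, not the
    four phenological data types used in the paper.
    """
    peak = max(ndvi_series)
    return {
        "max": peak,
        "min": min(ndvi_series),
        "mean": sum(ndvi_series) / len(ndvi_series),
        "peak_index": ndvi_series.index(peak),  # position of the seasonal peak
    }
```

Reducing each pixel's full time series to a handful of such summaries is one way a study can cut the data volume before clustering, which is the motivation the abstract gives for using derived phenological data.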

A Study of Analyzing Realtime Strategy Game Data using Data Mining (Data Mining을 이용한 전략시뮬레이션 게임 데이터 분석)

  • Yong, Hye-Ryeon;Kim, Do-Jin;Hwang, Hyun-Seok
    • Journal of Korea Game Society / v.15 no.4 / pp.59-68 / 2015
  • Progress in Information & Communication Technology enables data scientists to analyze big data to identify people's daily lives and tacit preferences. A variety of industries are already aware of the potential usefulness of analyzing big data. However, only limited use of big data has been made in the game industry. In this research, we adopt data mining techniques to analyze data gathered from a strategic simulation game. Decision Tree, Random Forest, Multi-class SVM, and Linear Regression techniques are used to find the most important variables for users' game levels. We provide practical guides for game design and usability based on the analyzed results.

Discovering Relationships between Skin Type and Life Style Using Data Mining Techniques: A Case Study of Korea

  • Kim, Taeheung;Ha, Jihyun;Lee, Jong-Seok;Oh, Younhak;Cho, Yong Ju
    • Industrial Engineering and Management Systems / v.15 no.1 / pp.110-121 / 2016
  • With the growing interest in skincare and maintenance, there are increasing numbers of studies on the classification of skin type and the factors influencing each type. This study presents a novel methodology that uses data mining to determine the relationships between skin type, lifestyle, and patterns of cosmetic utilization. Eight skin-specific factors (moisture, sebum in the U-zone (both cheeks), sebum in the T-zone (forehead, nose, and chin), pore, melanin, wrinkle, acne, and hemoglobin) were measured in 1,246 subjects living in South Korea, in conjunction with a questionnaire survey analyzing their lifestyles and patterns of cosmetic utilization. Using various multivariate statistical methods and data mining techniques, we classified the skin types based on the skin-specific values, determined the relationship between skin type and lifestyle, and accordingly sorted the subjects into clusters. Logistic regression analysis revealed gender-related differences in the skin; therefore, separate analyses were performed for males and females. Using the Gaussian Mixture Modeling (GMM) technique, we classified the subjects based on skin type (two male and four female types). Using ANOVA and decision tree techniques, we attempted to characterize the relationship between each skin type and the lifestyles of the subjects. Menstruation, eating habits, stress, and smoking were identified as the major factors affecting the skin.