• Title/Summary/Keyword: Decision Tree analysis

Discovering Relationships between Skin Type and Life Style Using Data Mining Techniques: A Case Study of Korea

  • Kim, Taeheung;Ha, Jihyun;Lee, Jong-Seok;Oh, Younhak;Cho, Yong Ju
    • Industrial Engineering and Management Systems / v.15 no.1 / pp.110-121 / 2016
  • With the growing interest in skincare and maintenance, there are increasing numbers of studies on the classification of skin type and the factors influencing each type. This study presents a novel data mining methodology for determining the relationships between skin type, lifestyle, and patterns of cosmetic utilization. Eight skin-specific factors, namely moisture, sebum in the U-zone (both cheeks), sebum in the T-zone (forehead, nose, and chin), pore, melanin, wrinkle, acne, and hemoglobin, were measured in 1,246 subjects living in South Korea, in conjunction with a questionnaire survey on their lifestyles and patterns of cosmetic utilization. Using various multivariate statistical methods and data mining techniques, we classified the skin types based on the skin-specific values, determined the relationship between skin type and lifestyle, and accordingly sorted the subjects into clusters. Logistic regression analysis revealed gender-related differences in the skin; therefore, separate analyses were performed for males and females. Using the Gaussian Mixture Modeling (GMM) technique, we classified the subjects by skin type (two male types and four female types). Using ANOVA and decision tree techniques, we attempted to characterize the relationship between each skin type and the lifestyles of the subjects. Menstruation, eating habits, stress, and smoking were identified as the major factors affecting the skin.

Selecting the Best Prediction Model for Readmission

  • Lee, Eun-Whan
    • Journal of Preventive Medicine and Public Health / v.45 no.4 / pp.259-266 / 2012
  • Objectives: This study aims to determine the risk factors predicting rehospitalization by comparing three models and selecting the most successful model. Methods: In order to predict the risk of rehospitalization within 28 days after discharge, 11,951 inpatients were recruited into this study between January and December 2009. Predictive models were constructed with three methods (logistic regression analysis, a decision tree, and a neural network), and the models were compared and evaluated in light of their misclassification rate, root asymptotic standard error, lift chart, and receiver operating characteristic curve. Results: The decision tree was selected as the final model. The risk of rehospitalization was higher when the length of stay (LOS) was less than 2 days, route of admission was through the out-patient department (OPD), medical department was in internal medicine, 10th revision of the International Classification of Diseases code was neoplasm, LOS was relatively shorter, and the frequency of OPD visit was greater. Conclusions: When a patient is to be discharged within 2 days, the appropriateness of discharge should be considered, with special attention to undiscovered complications and co-morbidities. In particular, if the patient is admitted through the OPD, any suspected disease should be appropriately examined and prompt outcomes of tests should be secured. Moreover, for internal medicine patients, co-morbidity and complications caused by chronic illness should be given greater attention.
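
The comparison criteria named in this abstract can be illustrated with a minimal Python sketch of two of them, the misclassification rate and the area under the receiver operating characteristic curve. The labels, scores, model names, and the 0.5 cut-off below are hypothetical toy values chosen for illustration; they are not data or results from the study.

```python
from typing import Dict, Sequence, Tuple


def misclassification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted label differs from the observed label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by comparing every positive/negative pair.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = readmitted within 28 days, 0 = not readmitted.
y_true = [1, 0, 1, 0, 0, 1]
model_scores: Dict[str, Sequence[float]] = {
    "logistic_regression": [0.8, 0.4, 0.3, 0.2, 0.6, 0.7],
    "decision_tree": [0.9, 0.2, 0.7, 0.1, 0.4, 0.8],
}

results: Dict[str, Tuple[float, float]] = {}
for name, scores in model_scores.items():
    # Assumed cut-off of 0.5 for turning a risk score into a predicted label.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    results[name] = (misclassification_rate(y_true, y_pred), roc_auc(y_true, scores))

# Prefer the lowest misclassification rate; break ties by the highest AUC.
best = min(results, key=lambda name: (results[name][0], -results[name][1]))
print(results, best)
```

Ranking first by misclassification rate and then by AUC is only one possible way to combine the criteria; the abstract does not state how the criteria were weighted.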

Hierarchy analysis of computationally proposed 100 cases of new digital games based on the expected marketability (컴퓨테이셔널 방법론에 따라 제안된 100가지 미개발 게임 유형들에 대한 기대 시장성 기준의 위계 분석)

  • Kim, Ikhwan
    • Journal of Korea Game Society / v.19 no.5 / pp.133-142 / 2019
  • In this study, 100 types of computationally proposed digital games were analyzed based on their expected marketability. The game classification methodology with five classification criteria proposed by Kim (2017) and the elimination method used in decision trees were adopted as the methodology of the study. As a result, the digital games could be classified into three groups. With this result, designers in the field will be able to use computational design methodology to develop new types of digital games more efficiently by following the proposed hierarchy.

A Comparative Study on the Accuracy of Important Statistical Prediction Techniques for Marketing Data (마케팅 데이터를 대상으로 중요 통계 예측 기법의 정확성에 대한 비교 연구)

  • Cho, Min-Ho
    • The Journal of the Korea institute of electronic communication sciences / v.14 no.4 / pp.775-780 / 2019
  • Techniques for predicting the future can be categorized into statistics-based and deep-learning-based techniques. Among them, statistics-based techniques are widely used because they are simple and highly accurate. However, working-level practitioners have difficulty using many analytical techniques correctly. In this study, we compared prediction accuracy by applying multinomial logistic regression, decision tree, random forest, support vector machine, and Bayesian inference to marketing-related data. The same marketing data were used for all techniques, and the analysis was conducted in R. The prediction results of the various techniques, which reflect the data characteristics of the marketing field, will be a good reference for practitioners.

Sales Pattern and Related Product Attributes of T-shirts (티셔츠 상품의 판매패턴과 연관된 상품속성)

  • Chae, Jin Mie;Kim, Eun Hie
    • Journal of the Korean Society of Clothing and Textiles / v.44 no.6 / pp.1053-1069 / 2020
  • This study examined the relationship between sales patterns and product attributes in order to propose a sales forecasting approach for fashion products. We analyzed sales data for 537 T-shirt SKUs from a domestic sports brand using SAS. The sales patterns of fashion products fluctuated and were influenced by exogenous factors; therefore, we removed the influence of the exogenous factors that regression analysis identified, namely price discounts and holiday effects. In addition, it was difficult to predict sales from the sales patterns of the same product, since fashion products are released as new products every year. Therefore, we proposed a forecasting model that uses the sales patterns of related product attributes, treating the attributes as descriptive variables. We classified sales patterns using K-means clustering to explain the relationship between sales patterns and product attributes, and created a decision tree classifier with attributes as input and sales patterns as output. As a result, the sales patterns of T-shirts were clustered into six types characterized by the shape of their peak and slope. In the proposed sales pattern prediction model, each sales pattern was associated with a combination of product attributes and their values.

1D CNN and Machine Learning Methods for Fall Detection (1D CNN과 기계 학습을 사용한 낙상 검출)

  • Kim, Inkyung;Kim, Daehee;Noh, Song;Lee, Jaekoo
    • KIPS Transactions on Software and Data Engineering / v.10 no.3 / pp.85-90 / 2021
  • In this paper, fall detection for older people using individual wearable devices is considered. To design a low-cost wearable device for reliable fall detection, we present a comprehensive analysis of two representative models. One is a machine learning model composed of a decision tree, random forest, and Support Vector Machine (SVM). The other is a deep learning model relying on a one-dimensional (1D) Convolutional Neural Network (CNN). By considering the data segmentation, preprocessing, and feature extraction methods applied to the input data, we also evaluate the validity of the considered models. Simulation results verify the efficacy of the deep learning model, which shows improved overall performance.

Sensitivity analysis of failure correlation between structures, systems, and components on system risk

  • Seunghyun Eem;Shinyoung Kwag;In-Kil Choi;Daegi Hahm
    • Nuclear Engineering and Technology / v.55 no.3 / pp.981-988 / 2023
  • A seismic event caused an accident at the Fukushima Nuclear Power Plant, which further resulted in simultaneous accidents at several units. Consequently, this incident has aroused great interest in the safety of nuclear power plants worldwide. A reasonable safety evaluation of such an external event should appropriately consider the correlation between SSCs (structures, systems, and components) and the probability of failure. However, probabilistic safety assessment in the current nuclear industry is performed conservatively, assuming that the failures of SSCs are either independent or completely dependent. This is an extreme assumption; a reasonable risk can be calculated, and risk-based decision-making can be conducted, only when the appropriate failure correlation between SSCs is considered. Thus, this study analyzed the effect of the failure correlation of SSCs on the safety of the system to realize rational safety assessment and decision-making. The results show that the impact on the system differs according to the size of the failure probability of the SSCs and the AND and OR conditions.
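
The two extreme assumptions mentioned in this abstract can be compared with a short worked example for a two-component system. This is a sketch under stated assumptions, not the paper's method: "completely dependent" is taken to mean that the failure event of the less fragile component is contained in that of the more fragile one, and the function names and the probabilities 0.1 and 0.2 are hypothetical.

```python
def and_gate(p1: float, p2: float, fully_dependent: bool) -> float:
    """Probability that BOTH components fail (the system fails only if all fail)."""
    # Fully dependent: the joint failure is limited by the less likely failure.
    # Independent: the joint failure probability is the product.
    return min(p1, p2) if fully_dependent else p1 * p2


def or_gate(p1: float, p2: float, fully_dependent: bool) -> float:
    """Probability that AT LEAST ONE component fails (the system fails if any fails)."""
    # Fully dependent: the union is just the more likely failure.
    # Independent: complement of both components surviving.
    return max(p1, p2) if fully_dependent else 1.0 - (1.0 - p1) * (1.0 - p2)


# Hypothetical failure probabilities for two SSCs (not taken from the paper).
p_a, p_b = 0.1, 0.2

for label, gate in (("AND", and_gate), ("OR", or_gate)):
    independent = gate(p_a, p_b, fully_dependent=False)
    dependent = gate(p_a, p_b, fully_dependent=True)
    print(f"{label}: independent={independent:.3f}, fully dependent={dependent:.3f}")
```

With these toy values, assuming independence gives the lower system failure probability under the AND condition and the higher one under the OR condition, so which extreme is conservative depends on the gate type.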

A Comparative Analysis of Risk Assessment Models for Asbestos Demolition (석면 해체 작업의 위험성평가모델 비교 분석)

  • Kim, Dong-Gyu;Kim, Min-Seung;Lee, Su-Min;Kim, Yu-Jin;Han, Seung-Woo
    • Proceedings of the Korean Institute of Building Construction Conference / 2022.11a / pp.99-100 / 2022
  • As the danger of asbestos exposure has become known, the importance of asbestos demolition in existing buildings has grown. An extensive body of research has evaluated the risk of asbestos demolition, but the types of variables were limited because categorical information was not reflected and quantitative information was difficult to collect. Thus, this study aims to derive a model that predicts the risk at asbestos demolition workplaces by collecting both categorical and continuous variables. For this purpose, categorical and continuous variables were collected from asbestos demolition reports, and the risk assessment score was set as the dependent variable. The influence of each variable was identified using logistic regression, and risk prediction modeling methodologies were compared using decision tree regression and an artificial neural network. As a result, a conditional risk prediction model was derived to evaluate the risk of asbestos demolition, and this model is expected to be used to ensure the safety of asbestos demolition workers.

The effect of road weather factors on traffic accident - Focused on Busan area - (도로위의 기상요인이 교통사고에 미치는 영향 - 부산지역을 중심으로 -)

  • Lee, Kyeongjun;Jung, Imgook;Noh, Yunhwan;Yoon, Sanggyeong;Cho, Youngseuk
    • Journal of the Korean Data and Information Science Society / v.26 no.3 / pp.661-668 / 2015
  • Traffic accidents have increased every year due to the growing number of vehicles as well as the concentration of the population. In addition to the carelessness of drivers, many road weather factors have a great influence on traffic accidents. In particular, the number of traffic accidents is governed by precipitation, visibility, humidity, cloud amount, and temperature. The purpose of this paper is to analyze the effect of road weather factors on traffic accidents. We use 2013 data on traffic accidents, AWS weather factors (precipitation, existence of rainfall, temperature, wind speed), time zone, and day of the week. We performed statistical analysis using logistic regression analysis and decision tree analysis. These prediction models may be used to predict traffic accidents according to weather conditions.

Developing the high risk group predictive model for student direct loan default using data mining (데이터마이닝을 이용한 학자금 대출 부실 고위험군 예측모형 개발)

  • Choi, Jae-Seok;Han, Jun-Tae;Kim, Myeon-Jung;Jeong, Jina
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1417-1426 / 2015
  • We develop a high-risk group predictive model for loan default using the direct loan data of the Korea Student Aid Foundation from 2012 to 2014. We perform decision tree analysis as the data mining methodology, using SAS Enterprise Miner 13.2. As a result of this model, subjects were classified into 25 types. This study shows that the major factors influencing loan default are household income, national grant, age, overdue record, level of schooling, field of study, and monthly repayment. The high-risk group predictive model in this study will serve as the basis for a segmented management service for preventing loan default.
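
To illustrate how a decision tree might separate high-risk from low-risk borrowers on a single factor such as household income, the Python sketch below searches for the threshold that minimizes weighted Gini impurity. It is a generic illustration and not the procedure used in SAS Enterprise Miner or in the study; the function names, income values, and default labels are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of class labels (0.0 means the set is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(
    values: Sequence[float], labels: Sequence[int]
) -> Tuple[Optional[float], float]:
    """Find the split 'value <= threshold' with the lowest weighted Gini impurity."""
    n = len(values)
    best_t: Optional[float] = None
    best_score = float("inf")
    # Every distinct value except the largest is a candidate threshold,
    # so both sides of the split are always non-empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical toy data: household income (arbitrary units) and default flag (1 = default).
income = [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
default = [1, 1, 1, 0, 1, 0]

threshold, impurity = best_threshold(income, default)
print(f"split at income <= {threshold}, weighted Gini = {impurity:.4f}")
```

A full tree would apply the same search recursively to each side of the split and across all candidate factors, which is how a model can end up with many terminal groups.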