• Title/Abstract/Keywords: Predictive decision tree

Search results: 114 items; processing time: 0.025 seconds

Iowa Liquor Sales Data Predictive Analysis Using Spark

  • Ankita Paul;Shuvadeep Kundu;Jongwook Woo
    • Asia pacific journal of information systems
    • /
    • Vol. 31, No. 2
    • /
    • pp.185-196
    • /
    • 2021
  • This paper analyzes and predicts liquor sales in the state of Iowa by applying machine learning algorithms to build predictive models. We use Azure ML and Spark ML for the analysis, representing legacy machine learning (ML) systems and Big Data ML systems, respectively. The Iowa liquor sales dataset comprises records from 2012 to 2019, with 24 columns and approximately 1.8 million rows. We compare the models built with different algorithms, and their accuracy in predicting sales, on both Azure ML and Spark ML. With the sample data set on the legacy Azure ML system, the Linear Regression model has the highest precision and Decision Forest Regression has the fastest computing time. With the entire data set on the Big Data Spark system, the Decision Tree Regression model in Spark ML has the highest accuracy and the quickest computing time.

Forecasting Energy Consumption of Steel Industry Using Regression Model

  • Sung-Ho KANG;Hyun-Ki KIM
    • Journal of Korea Artificial Intelligence Association
    • /
    • Vol. 1, No. 2
    • /
    • pp.21-25
    • /
    • 2023
  • The purpose of this study was to compare the performance of multiple regression models in predicting the energy consumption of the steel industry. Independent variables were selected by considering the correlation among attributes such as CO2 concentration, NSM, Week Status, Day of week, and Load Type, and preprocessing was performed to address multicollinearity. During preprocessing, we evaluated linear and nonlinear relationships between the attributes through correlation analysis. In particular, we selected variables with high correlation and included only appropriate variables in the final model to prevent multicollinearity. Among the regression models trained, Boosted Decision Tree Regression showed the best predictive performance. By combining multiple decision trees, the ensemble learning in this model was able to learn complex patterns effectively while preventing overfitting. These predictive models are therefore expected to provide important information for improving energy efficiency and management decision-making in the steel industry. In the future, we plan to improve model performance by collecting more data and extending the variables, and we will also consider applying the model with interactions with external factors taken into account.
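The abstract does not say how correlated attributes were screened. As a minimal sketch of one common approach (not the authors' procedure), the Python below keeps an attribute only if its absolute Pearson correlation with every attribute already kept stays under a threshold. The function names, the 0.9 threshold, and all sample values are illustrative assumptions; only the attribute names CO2 and NSM come from the abstract.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def drop_collinear(features, threshold=0.9):
    """Greedily keep features whose |r| with every already-kept feature is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical sample values, for illustration only.
features = {
    "CO2": [0.01, 0.02, 0.03, 0.04, 0.05],
    "CO2_rescaled": [1.0, 2.0, 3.0, 4.0, 5.0],  # linear copy of CO2, so r is 1
    "NSM": [900, 300, 1200, 600, 0],
}
kept = drop_collinear(features)
print(kept)
```

The greedy pass depends on the order of the attributes, so in practice the attributes judged most relevant to the target would be listed first.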

A Comparison of Predicting Movie Success between Artificial Neural Network and Decision Tree

  • 권신혜;박경우;장병희
    • 예술인문사회 융합 멀티미디어 논문지
    • /
    • Vol. 7, No. 4
    • /
    • pp.593-601
    • /
    • 2017
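  • This study constructed models for the production/investment, distribution, and exhibition stages, using the variables that can be considered at each stage of the film industry value chain. To improve predictive power, additional models were set up using the variables found significant by regression analysis. Based on these variables, the study compared the predictive power of two machine learning methods, artificial neural networks and decision trees. In the production/investment and distribution models, the artificial neural network was more accurate than the decision tree when all variables were entered, but the decision tree was more accurate when only the variables selected by regression analysis were entered. In the exhibition model, the artificial neural network was more accurate than the decision tree regardless of whether the regression results were applied. The paper is significant in confirming that applying machine learning techniques to box-office prediction improves predictive performance. By feeding linear regression results into the machine learning techniques, it sought to overcome the limitations of existing linear analysis methods.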

Decision Tree Model for Predicting Hospice Palliative Care Use in Terminal Cancer Patients

  • Lee, Hee-Ja;Na, Im-Il;Kang, Kyung-Ah
    • Journal of Hospice and Palliative Care
    • /
    • Vol. 24, No. 3
    • /
    • pp.184-193
    • /
    • 2021
  • Purpose: This study attempted to develop clinical guidelines to help patients use hospice and palliative care (HPC) at an appropriate time after writing physician orders for life-sustaining treatment (POLST), by identifying the characteristics of HPC use among patients with terminal cancer. Methods: This retrospective study used decision tree analysis to understand the characteristics of HPC use among patients with terminal cancer. The participants were 394 terminal cancer patients who were hospitalized at a cancer-specialized hospital in Seoul, South Korea and wrote a POLST between January 1, 2019 and March 31, 2021. Results: The predictive model for the characteristics of HPC use showed three main nodes (living together, pain control, and period to death after writing a POLST). The group most likely to use HPC was terminal cancer patients who had a cohabitant, received pain control, and died 2 months or more after writing a POLST; the HPC usage rate in this group was 87.5%. The next most likely group had a cohabitant and received pain control; 64.8% of this group used HPC. Finally, 55.1% of participants who had a cohabitant used HPC, a significantly higher proportion than among participants who did not have a cohabitant (1.7%). Conclusion: This study provides meaningful clinical evidence to help make decisions on HPC use more easily at an appropriate time.
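The groups reported in the Results can be read as paths through the tree. The Python sketch below encodes only the four usage rates quoted in the abstract and returns the rate of the most specific reported group that contains a given patient. The function and parameter names are hypothetical, and the abstract does not give rates for the remaining branches, so this is a lookup of published figures rather than the fitted model.

```python
def reported_hpc_rate(cohabitant: bool,
                      pain_control: bool = False,
                      died_2_months_or_later: bool = False) -> float:
    """Return the HPC usage rate (%) of the most specific group reported in the abstract.

    Only four rates are published; a patient who does not match a deeper
    group falls back to the rate of the broader group that contains them.
    """
    if not cohabitant:
        return 1.7   # participants without a cohabitant
    if pain_control and died_2_months_or_later:
        return 87.5  # cohabitant + pain control + death >= 2 months after POLST
    if pain_control:
        return 64.8  # cohabitant + pain control
    return 55.1      # all participants with a cohabitant

print(reported_hpc_rate(True, True, True))
print(reported_hpc_rate(False))
```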

Analysis of Predictive Factors for Suicidal Ideation of Adolescents Using Decision Tree Analysis

  • 한명희
    • 한국보건간호학회지
    • /
    • Vol. 36, No. 2
    • /
    • pp.157-169
    • /
    • 2022
  • Purpose: This study aimed to build a model for predicting the presence or absence of suicidal ideation in adolescents using decision tree analysis. Methods: This study is a secondary analysis of the 2019 Child and Adolescent Human Rights Survey, the most recent data published by the Korea Youth Policy Institute. To identify the variables predicting suicidal ideation, a decision tree analysis was performed with suicidal ideation as the dependent variable. Results: Life satisfaction, insults from parents, sex, and cyber-bullying experience were selected as significant predictors of suicidal ideation. Of the subjects with low life satisfaction, 58.2% were predicted to think of suicide. Among them, the probability increased to 72.7% for those who were unhappy, and to 82.9% for females. Conclusions: The family, school, and social environment need to be considered in order to prevent suicidal ideation in adolescents.

An Analysis of the Determinants of Government-Funded Defense Companies using a Decision Tree

  • 전고운;백슬아;전정환;유동희
    • 한국군사과학기술학회지
    • /
    • Vol. 27, No. 1
    • /
    • pp.80-93
    • /
    • 2024
  • This study attempted to analyze the factors that influence the participation of beneficiary companies in the government's defense industry promotion support project. To this end, experimental data were analyzed by constructing a prediction model consisting of highly important variables in beneficiary company decisions among various company information using the decision tree model, one of the data mining techniques. In addition, various rules were derived to determine the beneficiary companies of the government's support project using the analysis results expressed as decision trees. Three policy measures were presented based on the important rules that repeatedly appear in different predictive models to increase the effect of the government's industrial development. Using the analysis methods presented in this study and the determinants of the beneficiary companies of the government support project will help create a sustainable future defense industry growth environment.

Analysis of Students Leaving Their Majors Using Decision Tree

  • 박철용;송규문
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 13, No. 2
    • /
    • pp.157-165
    • /
    • 2002
  • Since 1997, when a new educational system that encourages faculties instead of departments in universities was first introduced, students have had much more opportunity to choose and leave their majors than before. As a result, colleges of basic arts and sciences face a serious problem, since many students have left their majors at these colleges. In this paper, we analyze these students at one university and provide a predictive model for them using decision trees.

A Study on Predictive Modeling of I-131 Radioactivity Based on Machine Learning

  • 유연욱;이충운;김정수
    • 대한방사선기술학회지:방사선기술과학
    • /
    • Vol. 46, No. 2
    • /
    • pp.131-139
    • /
    • 2023
  • High-dose I-131 used for the treatment of thyroid cancer causes localized exposure among the radiology technologists who handle it. There is a delay between the calibration date and the time the dose of I-131 is administered to a patient, so the radioactivity of the administered dose must be measured directly using a dose calibrator. In this study, we applied machine learning modeling to external dose rates measured from shielded I-131 in order to predict its radioactivity. External dose rates were measured at distances of 1 m, 0.3 m, and 0.1 m from a shielded container holding the I-131, for a total of 868 sets of measurements. For modeling, we used the hold-out method to partition the data in a 7:3 ratio (609 for the training set, 259 for the test set). The machine learning algorithms were linear regression, decision tree, random forest, and XGBoost. To evaluate the models, we calculated root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE) for accuracy, and R2 for explanatory power. The evaluation results are as follows: linear regression (RMSE 268.15, MSE 71901.87, MAE 231.68, R2 0.92), decision tree (RMSE 108.89, MSE 11856.92, MAE 19.24, R2 0.99), random forest (RMSE 8.89, MSE 79.10, MAE 6.55, R2 0.99), and XGBoost (RMSE 10.21, MSE 104.22, MAE 7.68, R2 0.99). The random forest model achieved the highest predictive ability. Improving the model's performance in the future is expected to help lower exposure among radiology technologists.
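For readers unfamiliar with the four metrics, the Python sketch below shows how RMSE, MSE, MAE, and R2 are computed from paired measured and predicted values, and then checks that each RMSE quoted in the abstract is the square root of its MSE to within rounding. The function name `regression_metrics` and the four sample measurements are illustrative assumptions, not the authors' code or data; the reported figures are copied from the abstract.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, MSE, MAE and R^2 for paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": math.sqrt(mse), "mse": mse, "mae": mae, "r2": r2}

# Hypothetical measured vs. predicted radioactivity values, for illustration only.
measured = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
metrics = regression_metrics(measured, predicted)

# (RMSE, MSE) pairs as quoted in the abstract.
reported = {
    "linear regression": (268.15, 71901.87),
    "decision tree": (108.89, 11856.92),
    "random forest": (8.89, 79.10),
    "XGBoost": (10.21, 104.22),
}
# RMSE squared should match MSE up to rounding (1% relative tolerance).
consistent = {name: abs(rmse ** 2 - mse) / mse < 0.01
              for name, (rmse, mse) in reported.items()}
print(metrics)
print(consistent)
```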

A Prediction Model for Internet Game Addiction in Adolescents: Using a Decision Tree Analysis

  • 김기숙;김경희
    • 대한간호학회지
    • /
    • Vol. 40, No. 3
    • /
    • pp.378-388
    • /
    • 2010
  • Purpose: This study was designed to build a theoretical frame to provide practical help to prevent and manage adolescent internet game addiction by developing a prediction model through a comprehensive analysis of related factors. Methods: The participants were 1,318 students studying in elementary, middle, and high schools in Seoul and Gyeonggi Province, Korea. Collected data were analyzed using the SPSS program. Decision Tree Analysis using the Clementine program was applied to build an optimum and significant prediction model to predict internet game addiction related to various factors, especially parent related factors. Results: From the data analyses, the prediction model for factors related to internet game addiction presented with 5 pathways. Causative factors included gender, type of school, siblings, economic status, religion, time spent alone, gaming place, payment to Internet café, frequency, duration, parent's ability to use internet, occupation (mother), trust (father), expectations regarding adolescent's study (mother), supervising (both parents), rearing attitude (both parents). Conclusion: The results suggest preventive and managerial nursing programs for specific groups by path. Use of this predictive model can expand the role of school nurses, not only in counseling addicted adolescents but also, in developing and carrying out programs with parents and approaching adolescents individually through databases and computer programming.

Decision Tree of Occupational Lung Cancer Using Classification and Regression Analysis

  • Kim, Tae-Woo;Koh, Dong-Hee;Park, Chung-Yill
    • Safety and Health at Work
    • /
    • Vol. 1, No. 2
    • /
    • pp.140-148
    • /
    • 2010
  • Objectives: Determining the work-relatedness of lung cancer that developed through occupational exposure is very difficult. The aim of the present study is to develop a decision tree for occupational lung cancer. Methods: A total of 153 cases of lung cancer surveyed by the Occupational Safety and Health Research Institute (OSHRI) from 1992 to 2007 were included. The target variable was whether the case was approved as work-related lung cancer, and the independent variables were age, sex, pack-years of smoking, histological type, type of industry, latency, working period, and exposure material in the workplace. The Classification and Regression Tree (CART) model was used to search for predictors of occupational lung cancer. Results: In the CART model, the best predictor was exposure to known lung carcinogens. The second best predictor was a latency of 8.6 years or more, and the third best predictor was a smoking history of less than 11.25 pack-years. The CART model must be used sparingly in deciding the work-relatedness of lung cancer because it is not absolute. Conclusion: We found that exposure to lung carcinogens, latency, and smoking history were predictive factors of approval for occupational lung cancer. Further studies on the work-relatedness of occupational disease are needed.