• Title/Summary/Keyword: 의사결정나무회귀분석 (decision tree regression analysis)

A Study on Estimation of R&D Research Funds by Linear Regression and Decision Tree Analysis (회귀분석 및 의사결정나무 분석을 통한 R&D 연구비 추정에 관한 연구)

  • Kim, Dong-Guen;Cheon, Youngdon;Kim, Sungkyu;Lee, Yoon Been;Hwang, Ji Ho;Kim, Yong Soo
    • Journal of Korean Society of Industrial and Systems Engineering / v.35 no.4 / pp.73-82 / 2012
  • Government R&D investment has recently increased dramatically. However, budgets differ depending on the size and priorities of each ministry, which makes it difficult to reach consensus on them, and no decision support system has been established for evaluating and executing R&D budgets. In this paper, we analyze the factors affecting research funds using linear regression and decision tree analysis in order to increase the investment efficiency of national research projects. We also suggest strategies for estimating budgets reasonably.
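
As a rough illustration of the linear-regression step, the sketch below fits an ordinary least squares line with a single predictor using the closed-form slope and intercept. The predictor (researchers per project), the response (research funds), and the data are hypothetical; the study's actual variables and model specification are not given in this abstract.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # S_xy and S_xx: centered cross-product and centered sum of squares
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: researchers per project vs. research funds (arbitrary units)
researchers = [1, 2, 3, 4]
funds = [5, 8, 11, 14]
intercept, slope = fit_simple_ols(researchers, funds)
print(f"funds ~ {intercept:.2f} + {slope:.2f} * researchers")
```

A model with several budget factors would solve the matrix form of the same least squares problem rather than this one-variable closed form.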

The Development of Models and the Characteristics for Subway Noise Using the Classification and Regression Trees (CART 분석을 이용한 지하철 소음모형 개발 및 특성 연구)

  • Kim, Tae-Ho;Lee, Jae-Myung;Won, Jai-Mu;Song, In-Suk
    • Journal of the Korean Society for Railway / v.10 no.5 / pp.480-486 / 2007
  • The subway is an essential means of public transportation in large cities and is used by many citizens. Recently, however, citizens' demands regarding the subway's interior environment have been growing, and noise is among the most pressing problems to be solved. In this study, we classified the characteristics of subway noise using classification and regression trees (CART), based on noise-level data from Seoul Subway Line 5. We then developed models of the effects on subway noise and used them to analyze its characteristics. The results show that both the type of geometric design and operational factors must be considered when mitigating subway noise, because the factors that influence subway noise differ by geometry type and by operational conditions.
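
The core of a CART regression tree is a greedy search for the split that most reduces the within-node sum of squared errors (SSE). The sketch below performs that search for one numeric predictor only; the predictor (train speed in km/h), the noise levels in dB, and the data are hypothetical, and a full CART implementation would repeat the search over all predictors recursively and then prune the tree.

```python
def sse(values):
    """Sum of squared errors around the node mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Greedy CART-style threshold search over one numeric predictor.

    Returns (threshold, total_sse) for the rule `x <= threshold`; threshold is
    None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(x, y))
    best_threshold, best_sse = None, sse([v for _, v in pairs])
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical predictor values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_threshold, best_sse = threshold, total
    return best_threshold, best_sse


# Hypothetical measurements: train speed (km/h) and in-car noise level (dB)
speeds = [20, 30, 40, 60, 70, 80]
noise_db = [70, 71, 72, 85, 86, 87]
threshold, total_sse = best_split(speeds, noise_db)
print(f"split at speed <= {threshold}, SSE = {total_sse}")
```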

Convergence-based analysis on geographical variations of the smoking rates (융복합 기반의 지역간 흡연율의 변이 분석)

  • Lim, Ji-Hye;Kang, Sung-Hong
    • Journal of Digital Convergence / v.13 no.8 / pp.375-385 / 2015
  • This study aims to identify geographical variations in smoking rates and the factors that affect them. The data were collected from the Community Health Survey conducted from 2009 to 2011 by the Korea Centers for Disease Control and Prevention and other government organizations. Correlation and multiple regression analyses were used to examine the factors influencing smoking rates, and a decision tree model was used to investigate regional variations. The significant factors associated with geographical variations in smoking rates were the rate of hazardous drinking, the completion rate of hypertension education, the rate of experience with anti-smoking campaigns, the stress awareness rate, hypertension prevalence, health insurance cost, diabetes prevalence, the obesity rate, and the strength training rate. Convergence-based analysis of geographical variations in smoking rates is highly important when regionally customized healthcare programs are implemented. In the future, effective programs and customized approaches need to be developed for regions with high smoking rates. Our findings are expected to serve as meaningful data for designing effective health care programs and assessments that lead to effective anti-smoking programs.
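
To illustrate the multiple regression step, the sketch below estimates an intercept and coefficients for several predictors by solving the normal equations $(X^\top X)\beta = X^\top y$ with Gaussian elimination. The two regional predictors, their values, and the smoking rates are hypothetical and were constructed to lie exactly on a plane so the fit recovers known coefficients; real survey data would also call for standard errors and multicollinearity checks, which are not shown.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(X, y):
    """Multiple linear regression with an intercept via the normal equations."""
    rows = [[1.0] + list(xi) for xi in X]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical regional predictors (arbitrary scale), built so that
# smoking_rate = 1 + 2 * x1 + 3 * x2 holds exactly
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
smoking_rate = [1, 3, 4, 6, 8]
beta = fit_ols(X, smoking_rate)
print("intercept and coefficients:", [round(b, 6) for b in beta])
```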

Prediction of golf scores on the PGA tour using statistical models (PGA 투어의 골프 스코어 예측 및 분석)

  • Lim, Jungeun;Lim, Youngin;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.41-55 / 2017
  • This study uses data mining techniques and statistical analysis to predict the average scores of the top 150 PGA golf players in 132 PGA Tour tournaments (2013-2015). It also aims to predict the Top 10 and Top 25 players in 4 different playoffs. Linear and nonlinear regression methods were used to predict average scores: stepwise regression, best subset selection, LASSO, ridge regression, and principal component regression as linear methods, and regression trees, bagging, gradient boosting, neural networks, random forests, and KNN as nonlinear methods. We found that the average score increases as fairway firmness, green height, or average maximum wind speed increases, and decreases as the number of one-putts, the scrambling variable, or the longest driving distance increases. All 11 models had low prediction errors on the average scores of the 2015 PGA tournaments, which were not included in the training set. However, bagging and random forests performed best among all models, and these two models also had the highest accuracy in predicting the Top 10 and Top 25 players in the 4 playoffs.
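
Since bagging was one of the two best performers, a minimal sketch of the idea may help: fit the same base learner on many bootstrap resamples of the training data and average their predictions. Here the base learner is a one-split regression tree (a stump), and the predictor (one-putts per round), the scores, the number of estimators, and the seed are all hypothetical. A random forest additionally samples a random subset of predictors at each split, which is not shown because this toy example has only one predictor.

```python
import random


def fit_stump(x, y):
    """Fit a depth-1 regression tree; returns (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(x, y))
    overall = sum(y) / len(y)
    best = (float("inf"), None, overall, overall)  # (sse, threshold, left_mean, right_mean)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if err < best[0]:
            best = (err, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    _, threshold, lm, rm = best
    return threshold, lm, rm


def predict_stump(stump, xi):
    threshold, lm, rm = stump
    if threshold is None:  # resample had one distinct x value: constant model
        return lm
    return lm if xi <= threshold else rm


def fit_bagging(x, y, n_estimators=50, seed=0):
    rng = random.Random(seed)
    n = len(x)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        models.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, xi):
    """Average the predictions of all bootstrap models."""
    return sum(predict_stump(m, xi) for m in models) / len(models)


# Hypothetical data: one-putts per round vs. average score
one_putts = [5, 6, 7, 8, 9, 10]
avg_score = [72.5, 72.0, 71.4, 70.9, 70.3, 69.8]
models = fit_bagging(one_putts, avg_score, n_estimators=50, seed=42)
print(round(predict_bagging(models, 5), 2), round(predict_bagging(models, 10), 2))
```

Averaging over resamples smooths out the hard jump of any single stump, which is the variance reduction that bagging is designed to provide.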

Prediction of Carcass Yield by Ultrasound in Hanwoo (초음파 측정에 의한 한우의 도체육량 예측)

  • Rhee, Y. J.;Jeon, K. J.;Choi, S. B.;Seok, H. K.;Kim, S. J.;Lee, S. K.;Song, Y. H.
    • Journal of Animal Science and Technology / v.45 no.2 / pp.335-342 / 2003
  • This study was conducted to predict carcass yield traits using ultrasound before slaughter and to improve the accuracy of carcass yield grade prediction by applying various strategies. A total of 573 Hanwoo steers aged 24 months were used. The differences between ultrasound and carcass measurements of backfat thickness (BFT) and longissimus muscle area (LMA) were $0.6 \pm 1.65$ mm and $0.7 \pm 5.56$ cm$^2$, respectively, and the corresponding correlation coefficients were 0.86 and 0.82 (p < 0.001). The yield grade prediction accuracies of four methods (the Korean yield grade index equation, fat depth alone, regression, and decision tree) were 80.3%, 81.3%, 80.1%, and 81.8%, respectively. We conclude that the decision tree method predicts yield grade easily and is also useful for increasing prediction accuracy.
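
Agreement statistics of the kind reported above (mean difference $\pm$ standard deviation, and a correlation coefficient) can be computed as in the sketch below. It assumes the difference is taken as ultrasound minus carcass and uses the sample standard deviation; the abstract states neither convention, and the paired measurements are hypothetical.

```python
import math


def mean_sd(values):
    """Mean and sample standard deviation (n - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


def pearson_r(a, b):
    """Pearson correlation coefficient between two paired samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def agreement(ultrasound, carcass):
    """Return (mean difference, SD of differences, Pearson r) for paired measurements."""
    diffs = [u - c for u, c in zip(ultrasound, carcass)]  # assumed direction: ultrasound - carcass
    mean_diff, sd_diff = mean_sd(diffs)
    return mean_diff, sd_diff, pearson_r(ultrasound, carcass)


# Hypothetical backfat thickness readings (mm) for four animals
ultrasound_bft = [10.0, 12.0, 15.0, 16.0]
carcass_bft = [9.0, 11.0, 13.0, 15.0]
mean_diff, sd_diff, r = agreement(ultrasound_bft, carcass_bft)
print(f"difference = {mean_diff:.2f} +/- {sd_diff:.2f} mm, r = {r:.3f}")
```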

The Study on Hypertension Cure Rate Management Centering around Wellness Local Community : With GwangJu as a Central Figure (웰니스 지역사회 중심의 고혈압 치료율 관리 방안에 관한 연구 : 광주광역시 중심으로)

  • Yang, Yu-Jeong;Park, Jong-Ho
    • Journal of Korea Entertainment Industry Association / v.15 no.8 / pp.351-361 / 2021
  • This study was conducted to identify the factors associated with hypertension treatment in Gwangju and, using community health survey data, to establish a plan for managing the hypertension cure rate centered on the wellness local community. We collected 13,714 Gwangju records out of a total of 685,820 records from the Community Health Survey conducted by the KDCA (Korea Disease Control and Prevention Agency) from 2017 to 2019. Of these, 2,941 subjects aged over 30 who had been diagnosed with hypertension were selected and analyzed using SAS 9.4 and SAS Enterprise Miner 15.1. The results are as follows. Differences in the hypertension cure rate in Gwangju by socioeconomic characteristics were found for gender, age, marital status, educational attainment, economic activity status, and monthly income. Significant differences in the hypertension cure rate by health behavior characteristics were found for current smoking, monthly alcohol consumption, high-risk drinking, eating breakfast, perceived health status, diabetes and its treatment, annual unmet medical needs, and annual health center use. Logistic regression analysis and interactive decision tree analysis of the factors affecting hypertension treatment identified age, marital status, diabetes and its treatment, and annual unmet medical needs as the key factors. Accordingly, Gwangju needs public health education efforts, backed by an effective preparation plan, to raise awareness of the importance of hypertension treatment among younger people and to prevent complications.
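
Logistic regression, one of the two methods used here, models the probability of a binary outcome (for example, whether a patient is being treated) as $1/(1 + e^{-(b + w \cdot x)})$. The sketch below fits it by batch gradient descent on the mean log-loss; the single predictor, the learning rate, the number of epochs, and the data are hypothetical. Statistical software such as SAS typically fits the same maximum-likelihood model with Newton-type (e.g., Fisher scoring) iterations rather than plain gradient descent.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y)
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical data: one risk score per patient; 1 = currently under treatment
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(round(predict_proba(w, b, [0.0]), 3), round(predict_proba(w, b, [5.0]), 3))
```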

Determinants of student course evaluation using hierarchical linear model (위계적 선형모형을 이용한 강의평가 결정요인 분석)

  • Cho, Jang Sik
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1285-1296 / 2013
  • The primary aim of this paper is to analyze the determinants of student course evaluations using subject-characteristic and student-characteristic variables. Because these data have a multilevel structure, we use a two-level hierarchical linear model. We consider four models: (1) a null model, (2) a random coefficient model, (3) a means-as-outcomes model, and (4) an intercepts-and-slopes-as-outcomes model. The results are as follows. First, the null model showed that subject characteristics had a much larger effect on course evaluation than student characteristics. Second, the conditional model specifying subject- and student-level predictors revealed that class size, grade, tenure, mean GPA of the class, and native class at level 1, and sex, department category, admission method, and mean GPA of the student at level 2, had statistically significant effects on course evaluation. The explained variance was 13% at the subject level and 13% at the student level.
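
For readers unfamiliar with the terminology, the null model and the "explained variance" figures are conventionally defined as follows in two-level hierarchical linear modeling, with $Y_{ij}$ the evaluation for unit $i$ in group $j$. This is the standard textbook formulation and is assumed, not confirmed, to be the one used in the paper.

```latex
% Null (unconditional) model with a random group intercept
Y_{ij} = \gamma_{00} + u_{0j} + r_{ij}, \qquad u_{0j} \sim N(0, \tau_{00}), \quad r_{ij} \sim N(0, \sigma^2)

% Intraclass correlation: share of total variance lying between groups
\rho = \frac{\tau_{00}}{\tau_{00} + \sigma^2}

% Proportion of variance explained at each level by a conditional model
R^2_{\text{level-1}} = \frac{\hat{\sigma}^2_{\text{null}} - \hat{\sigma}^2_{\text{cond}}}{\hat{\sigma}^2_{\text{null}}}, \qquad
R^2_{\text{level-2}} = \frac{\hat{\tau}_{00,\text{null}} - \hat{\tau}_{00,\text{cond}}}{\hat{\tau}_{00,\text{null}}}
```

Under this reading, the 13% figures mean that adding the predictors reduced the variance at each level by about 13% relative to the null model.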

A Convergence Study in the Severity-adjusted Mortality Ratio on inpatients with multiple chronic conditions (복합만성질환 입원환자의 중증도 보정 사망비에 대한 융복합 연구)

  • Seo, Young-Suk;Kang, Sung-Hong
    • Journal of Digital Convergence / v.13 no.12 / pp.245-257 / 2015
  • This study aimed to develop a predictive model for the severity-adjusted mortality of inpatients with multiple chronic conditions and to analyze the factors behind variation in the hospital standardized mortality ratio (HSMR), in order to propose ways to reduce that variation. We collected data from the "Korean National Hospital Discharge In-depth Injury Survey" from 2008 to 2010 and selected a final 110,700 subjects aged over 30 whose principal diagnosis was a chronic disease and who had two or more chronic diseases, including the principal diagnosis. We designed severity-adjusted mortality prediction models using data-mining methods (logistic regression analysis, decision trees, and neural networks), and used the decision tree model based on the Elixhauser comorbidity index to predict the severity-adjusted mortality ratio. The HSMR of inpatients with multiple chronic conditions differed significantly by insurance type, number of hospital beds, and hospital location. Based on these results, methods should be developed to manage the mortality ratio of these inpatients efficiently at the national level, and efforts should be made to improve the quality of their medical treatment and to reduce growing medical expenses.
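
The hospital standardized mortality ratio is conventionally the observed number of deaths divided by the number expected from a risk-adjustment model, multiplied by 100, where the expected count is the sum of the model's predicted death probabilities over a hospital's patients. The sketch below computes it per hospital; the hospital IDs, outcomes, and predicted probabilities are hypothetical, and in the study the probabilities would come from the Elixhauser-based decision tree model.

```python
from collections import defaultdict


def hsmr_by_hospital(records):
    """records: iterable of (hospital_id, died, predicted_death_probability).

    HSMR = 100 * observed deaths / expected deaths, where expected deaths are
    the sum of the risk model's predicted probabilities for that hospital.
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for hospital, died, prob in records:
        observed[hospital] += int(died)
        expected[hospital] += prob
    return {h: 100.0 * observed[h] / expected[h] for h in observed if expected[h] > 0}


# Hypothetical patients: (hospital, died?, model-predicted probability of death)
records = [
    ("A", True, 0.5), ("A", False, 0.5),
    ("B", False, 0.25), ("B", False, 0.25),
]
result = hsmr_by_hospital(records)
print(result)
```

A value above 100 indicates more deaths than the model expects for that hospital's case mix, and a value below 100 indicates fewer.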

Prediction of drowning person's route using machine learning for meteorological information of maritime observation buoy

  • Han, Jung-Wook;Moon, Ho-Seok
    • Journal of the Korea Society of Computer and Information / v.27 no.3 / pp.1-12 / 2022
  • In a maritime distress accident, rapid search and rescue operations using rescue assets are essential to protect the safety and lives of people drowning at sea. In this paper, we analyzed the surface-layer current in the sea area northwest of Ulleungdo by applying machine learning methods (multiple linear regression, decision trees, support vector machines, vector autoregression, and LSTM) to meteorological information collected from a maritime observation buoy. We then built a prediction model with each method and predicted the drowning person's route at sea from the predicted current direction and speed. Among the models compared using MAE and RMSE as performance measures, the LSTM model performed best. The LSTM model also outperformed the other models in terms of the distance between the actual and predicted positions of the drowning person.
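
The two performance measures used above can be written in a few lines, as in the sketch below. The `angular_error` helper is an added assumption, not something the abstract describes: when errors in current direction are scored in degrees, wrapping them so that 359° versus 1° counts as 2° avoids inflating the error. The current speeds are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def angular_error(actual_deg, predicted_deg):
    """Smallest absolute difference between two directions in degrees (359 vs 1 gives 2)."""
    d = abs(actual_deg - predicted_deg) % 360.0
    return min(d, 360.0 - d)


# Hypothetical surface-current speeds (cm/s): observed vs. model prediction
observed_speed = [20.0, 25.0, 30.0]
predicted_speed = [22.0, 25.0, 27.0]
print(mae(observed_speed, predicted_speed), rmse(observed_speed, predicted_speed))
```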

A Study on the Revitalization of Tourism Industry through Big Data Analysis (한국관광 실태조사 빅 데이터 분석을 통한 관광산업 활성화 방안 연구)

  • Lee, Jungmi;Liu, Meina;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.149-169 / 2018
  • Korea's public institutions are currently accumulating large amounts of data under the open public data policy and "Government 3.0", and the tourism field in particular has accumulated a great deal. However, academic work using tourism data is still limited, as are the openness of data on restaurants, hotels, and online tourism information and the use of SNS big data in tourism. As a result, tourism big data analysis is still rarely put to use. In this paper, we analyze the factors influencing foreign tourists' satisfaction in Korea from numerical data, using data mining techniques and R programming. We seek ways to revitalize the tourism industry by analyzing about 36,000 records from the "Survey on the actual situation of foreign tourists" conducted by the Korea Culture & Tourism Research Institute from 2013 to 2015. To do this, we analyzed the factors with high influence on the 'Satisfaction', 'Revisit intention', and 'Recommendation' variables of foreign tourists, and then their practical influence. First, we integrated the foreign tourist survey data from 2013 to 2015, which are stored in the institute's tourist information system, and eliminated variables inconsistent with the research purpose. Some variables were modified to improve the accuracy of the analysis. We then analyzed the factors affecting the dependent variables with data-mining methods in IBM SPSS Modeler 16.0: decision trees (C5.0, CART, CHAID, QUEST), an artificial neural network, and logistic regression analysis. For each dependent variable, the seven variables with the greatest effect were derived.
As a result of the analysis, the seven major variables influencing 'overall satisfaction' were sightseeing spot attraction, food satisfaction, accommodation satisfaction, transportation satisfaction, guide service satisfaction, number of places visited, and country; of these, food satisfaction and sightseeing spot attraction had the greatest influence. The seven variables with the greatest influence on 'revisit intention' were country, travel motivation, activity, food satisfaction, best activity, guide service satisfaction, and sightseeing spot attraction; the most influential were food satisfaction and travel motivation related to the Korean Wave. Lastly, the seven variables with the greatest influence on 'recommendation intention' were country, sightseeing spot attraction, number of places visited, food satisfaction, activity, tour guide service satisfaction, and cost, and the most influential among them were country, sightseeing spot attraction, and food satisfaction. To examine the influence of each independent variable in more depth, we also used R programming. Food satisfaction and sightseeing spot attraction had a greater effect on overall satisfaction than the other variables. For revisit intention, travel motivated by the Korean Wave had a higher $\beta$ value than the other variables, which suggests the need for policies that strengthen Korean Wave attractions so as to lead to substantial revisits by tourists. Lastly, recommendation showed the same pattern as overall satisfaction: sightseeing spot attraction and food satisfaction had higher $\beta$ values than the other variables.
From this analysis, we found that 'food satisfaction' and 'sightseeing spot attraction' were common factors influencing all three dependent variables ('Overall satisfaction', 'Revisit intention', and 'Recommendation') and that they significantly affected satisfaction with travel in Korea. The purpose of this study is to examine how to increase foreign tourism in Korea through big data analysis. The results are expected to serve as basic data for analyzing tourism data and establishing effective tourism policy, and as material for an activation plan that can contribute to the future development of tourism in Korea.
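
The decision tree algorithms listed above differ mainly in how they score candidate splits: CART uses Gini impurity for classification, C5.0 uses an entropy-based information gain ratio, and CHAID and QUEST rely on statistical tests (not shown). The sketch below computes the first two criteria for one candidate split; the "high"/"low" satisfaction labels and the split are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity, the CART classification criterion."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_scores(parent, children):
    """Score a split of `parent` into `children` (each a list of class labels)."""
    n = len(parent)
    weights = [len(ch) / n for ch in children]
    gini_after = sum(w * gini(ch) for w, ch in zip(weights, children))
    info_gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    # Split information penalizes splits into many small branches (gain ratio)
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return {
        "gini_decrease": gini(parent) - gini_after,
        "info_gain": info_gain,
        "gain_ratio": gain_ratio,
    }


# Hypothetical: overall satisfaction labels, split by high vs. low food satisfaction
parent = ["high"] * 5 + ["low"] * 5
children = [["high"] * 4 + ["low"] * 1, ["high"] * 1 + ["low"] * 4]
scores = split_scores(parent, children)
print(scores)
```

With two equal-sized branches the split information is exactly 1 bit, so the gain ratio equals the information gain here; the two diverge when branches are unequal or numerous.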