• Title/Summary/Keyword: regression tree model (회귀나무모형)


Analysis on Geographical Variations of the Prevalence of Hypertension Using Multi-year Data (다년도 자료를 이용한 고혈압 유병률의 지역간 변이 분석)

  • Kim, Yoomi;Cho, Daegon;Hong, Sungok;Kim, Eunju;Kang, Sunghong
    • Journal of the Korean Geographical Society / v.49 no.6 / pp.935-948 / 2014
  • As chronic diseases have become more prevalent and problematic, effective care for major chronic diseases has become a focus of healthcare policy. In this regard, this study examines how region-specific characteristics affect the prevalence of hypertension in South Korea. For the analysis, we assembled a unique multi-year data set covering key indicators of health conditions and health behaviors in 237 small administrative districts. The data were collected from the Annual Community Health Survey for 2009 to 2011 by the Korea Centers for Disease Control and Prevention and other government organizations. To investigate regional variations, we estimated Geographically Weighted Regression (GWR) and decision tree models. Our findings first suggest that multi-year data are more appropriate than single-year data for the geographical analysis of chronic diseases, because significant annual differences are observed in most variables. We also find that the prevalence of hypertension tends to be positively associated with the prevalence of diabetes and obesity and negatively associated with population density. More importantly, the GWR results show noticeable geographical variations in these factors. In line with this result, additional findings from the decision tree model suggest that the primary factors influencing hypertension prevalence are indeed heterogeneous across regional groups. Taken as a whole, accounting for geographical variations in health conditions, health behaviors, and other socioeconomic factors is very important when implementing regionally customized healthcare policy to reduce hypertension prevalence. In short, our study sheds light on possible ways for local government policy makers to manage chronic diseases.


Churn Analysis for the First Successful Candidates in the Entrance Examination for K University

  • Kim, Kyu-Il;Kim, Seung-Han;Kim, Eun-Young;Kim, Hyun;Yang, Jae-Wan;Cho, Jang-Sik
    • Journal of the Korean Data and Information Science Society / v.18 no.1 / pp.1-10 / 2007
  • In this paper, we focus on churn analysis for the first successful candidates in the 2006 entrance examination using Clementine, a data mining tool. The goal of this study is to apply decision trees (including the C5.0 and CART algorithms), neural networks, and logistic regression to predict churn among successful candidates. Using the 2006 entrance examination data of K University, we compare churning and non-churning successful candidates, and analyze why successful candidates churn and which ones are most likely to churn in the future.


Development of Traffic Accident Models in Seoul Considering Land Use Characteristics (토지이용특성을 고려한 서울시 교통사고 발생 모형 개발)

  • Lim, Samjin;Park, Juntae
    • Journal of the Society of Disaster Information / v.9 no.1 / pp.30-49 / 2013
  • In this research, we developed a new traffic accident forecasting model based on land use. Using the Classification and Regression Tree method, we developed forecasting models by accident type, based on market segmentation and on the introduction of additional variables that may reflect the characteristics of various regions. The analysis identified activity variables such as registered population and commuters, as well as road size and the accident-generating facilities where those activities take place, as variables explaining traffic accidents.

Particulate Matter Prediction using Quantile Boosting (분위수 부스팅을 이용한 미세먼지 농도 예측)

  • Kwon, Jun-Hyeon;Lim, Yaeji;Oh, Hee-Seok
    • The Korean Journal of Applied Statistics / v.28 no.1 / pp.83-92 / 2015
  • For national health, it is important to develop an accurate method for predicting atmospheric particulate matter (PM), because exposure to such fine dust can trigger not only respiratory diseases but also dermatoses, ophthalmopathies, and cardiovascular diseases. The National Institute of Environmental Research (NIER) employs a decision tree to predict bad weather days with a high PM concentration. However, the decision tree method (even setting aside its inherent instability) is not a suitable model for predicting bad weather days, which represent only 4% of the entire data. In this paper, while demonstrating the inaccuracy and inappropriateness of the method used by the NIER, we present the utility of a new prediction model that adopts boosting with quantile loss functions. We evaluate the performance of the new method over various values of $\tau$ and justify the proposed method through comparison.
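
The quantile loss mentioned in the abstract above can be illustrated with a short sketch. This is not the authors' implementation: it assumes the standard quantile ("pinball") loss, and the function names `pinball_loss` and `pinball_negative_gradient` are hypothetical. For a quantile level $\tau$, under-predictions are weighted by $\tau$ and over-predictions by $1-\tau$, so a high $\tau$ pushes the fitted value toward the upper tail, which is where rare high-concentration days lie.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss at level tau, with 0 < tau < 1."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff >= 0) costs tau * diff;
        # over-prediction (diff < 0) costs (1 - tau) * |diff|.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def pinball_negative_gradient(y_true, y_pred, tau):
    """Pseudo-residuals a boosting step would fit for the pinball loss.

    Assumption: generic gradient boosting, where each new base learner
    is fitted to the negative gradient of the loss.
    """
    return [tau if y > q else tau - 1.0 for y, q in zip(y_true, y_pred)]
```

With `tau = 0.9`, under-predicting an observation by 2 units costs 1.8, while over-predicting by 2 units costs only 0.2, which is why a model trained this way is reluctant to miss high values.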

An Application of Support Vector Machines to Personal Credit Scoring: Focusing on Financial Institutions in China (Support Vector Machines을 이용한 개인신용평가 : 중국 금융기관을 중심으로)

  • Ding, Xuan-Ze;Lee, Young-Chan
    • Journal of Industrial Convergence / v.16 no.4 / pp.33-46 / 2018
  • Personal credit scoring is an effective tool that helps banks make profitable decisions on granting loans. Recently, many classification algorithms and models have been used in personal credit scoring. Personal credit scoring techniques are usually divided into statistical and non-statistical methods. Statistical methods include linear regression, discriminant analysis, logistic regression, and decision trees. Non-statistical methods include linear programming, neural networks, genetic algorithms, and support vector machines. However, no consistent conclusion has been reached as to which method is best for developing a credit scoring model. In this paper, we compare the performance of the most common scoring techniques (logistic regression, neural networks, and support vector machines) using personal credit data from a financial institution in China. Specifically, we build the three models, classify the customers, and compare the results. According to the results, the support vector machine performs better than logistic regression and neural networks.

Study on Detection for Cochlodinium polykrikoides Red Tide using the GOCI image and Machine Learning Technique (GOCI 영상과 기계학습 기법을 이용한 Cochlodinium polykrikoides 적조 탐지 기법 연구)

  • Unuzaya, Enkhjargal;Bak, Su-Ho;Hwang, Do-Hyun;Jeong, Min-Ji;Kim, Na-Kyeong;Yoon, Hong-Joo
    • The Journal of the Korea institute of electronic communication sciences / v.15 no.6 / pp.1089-1098 / 2020
  • In this study, we propose a method to detect Cochlodinium polykrikoides red tide using machine learning and geostationary ocean satellite images. To train the machine learning models, we used GOCI Level 2 data and the red tide location data of the National Fisheries Research and Development Institute. The machine learning models were a logistic regression model, a decision tree model, and a random forest model. In the performance evaluation, accuracy improved by about 13~22%p (88~98%) compared with the traditional GOCI image-based red tide detection algorithm without machine learning (Son et al., 2012) (75%). In addition, a comparison of detection performance among the machine learning models showed that the random forest model had the highest detection accuracy (98%). We expect that this machine learning-based red tide detection algorithm can be used to detect red tide early and to track and monitor its movement and spread.

Power Consumption Forecasting Scheme for Educational Institutions Based on Analysis of Similar Time Series Data (유사 시계열 데이터 분석에 기반을 둔 교육기관의 전력 사용량 예측 기법)

  • Moon, Jihoon;Park, Jinwoong;Han, Sanghoon;Hwang, Eenjun
    • Journal of KIISE / v.44 no.9 / pp.954-965 / 2017
  • A stable power supply is very important for the maintenance and operation of the power infrastructure, so accurate power consumption prediction is needed. In particular, a university campus is one of the institutions with the highest power consumption and tends to show wide variation in electrical load depending on time and environment. For this reason, a model that can accurately predict power consumption is required for the effective operation of the power system. A drawback of existing time series prediction techniques is that prediction performance degrades sharply, because the prediction interval widens as the gap between the training time and the prediction time increases. In this paper, we first classify power data with similar time series patterns, considering the date, day of the week, holiday, and semester. Next, an ARIMA model is constructed for each classified data set, and a daily power consumption forecasting method for the university campus is proposed through time series cross-validation of the predicted time. To evaluate the accuracy of the prediction, we confirmed the validity of the proposed method by applying performance indicators.

The effect of road weather factors on traffic accident - Focused on Busan area - (도로위의 기상요인이 교통사고에 미치는 영향 - 부산지역을 중심으로 -)

  • Lee, Kyeongjun;Jung, Imgook;Noh, Yunhwan;Yoon, Sanggyeong;Cho, Youngseuk
    • Journal of the Korean Data and Information Science Society / v.26 no.3 / pp.661-668 / 2015
  • Traffic accidents have increased every year owing to the growing number of vehicles and the concentration of the population. In addition to driver carelessness, many road weather factors have a great influence on traffic accidents. In particular, the number of traffic accidents is governed by precipitation, visibility, humidity, cloud amount, and temperature. The purpose of this paper is to analyse the effect of road weather factors on traffic accidents. We use 2013 data on traffic accidents, AWS weather factors (precipitation, existence of rainfall, temperature, wind speed), time zone, and day of the week. We performed statistical analysis using logistic regression and decision tree analysis. These prediction models may be used to predict traffic accidents according to weather conditions.

Classification Analysis for the Prediction of Underground Cultural Assets (매장문화재 예측을 위한 통계적 분류 분석)

  • Yu, Hye-Kyung;Lee, Jin-Young;Na, Jong-Hwa
    • Journal of Korea Society of Industrial Information Systems / v.14 no.3 / pp.106-113 / 2009
  • Various statistical classification methods have been used to build prediction models of underground cultural assets in our country. Among them, linear discriminant analysis, logistic regression, decision trees, neural networks, and support vector machines are used in this paper. We introduce the basic concepts of these classification methods and apply them to the analysis of real data from I city. As a result, five different prediction models are suggested. The models are also compared using the correct classification rates of the fitted models. To assess the applicability of the suggested models to a new data set, simulations are carried out. R packages and programs are used in the real data analyses and simulations. In particular, the detailed execution steps in R are provided for other analysts in related areas.

Identification of major risk factors association with respiratory diseases by data mining (데이터마이닝 모형을 활용한 호흡기질환의 주요인 선별)

  • Lee, Jea-Young;Kim, Hyun-Ji
    • Journal of the Korean Data and Information Science Society / v.25 no.2 / pp.373-384 / 2014
  • Data mining aims to uncover patterns or correlations in large volumes of data with complicated structure and to predict diverse outcomes. This technique is used in finance, telecommunications, distribution, medicine, and other fields. In this paper, we selected risk factors of respiratory diseases in the field of medicine. The data were drawn from the Gyeongsangbuk-do database of the Community Health Survey conducted in 2012 and were divided into a respiratory disease group and a healthy group. To select the major risk factors, we applied data mining techniques such as neural networks, logistic regression, Bayesian networks, C5.0, and CART. We divided the data into training and testing sets, and applied the models built on the training data to the testing data. In a comparison of prediction accuracy, CART was identified as the best model. Depression, smoking, and stress were identified as the major risk factors of respiratory disease.
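
The evaluation procedure described in the abstract above (split into training and testing data, then compare prediction accuracy on the testing data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `train_test_split` and `accuracy`, the 30% test fraction, and the `(features, label)` row layout are all hypothetical, and any fitted model is represented simply as a function from features to a predicted label.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, rows):
    """Share of (features, label) rows that predict() labels correctly."""
    correct = sum(1 for features, label in rows if predict(features) == label)
    return correct / len(rows)
```

In use, each candidate model would be fitted on the training rows only, and the model with the highest `accuracy` on the held-out testing rows would be preferred.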