• Title/Summary/Keyword: tree-based models


Customer Churn Prediction of Automobile Insurance by Multiple Models (다중모델을 이용한 자동차 보험 고객의 이탈예측)

  • Lee Jae-Sik;Lee Jin-Chun
    • Journal of Intelligence and Information Systems / v.12 no.2 / pp.167-183 / 2006
  • Because data mining attempts to find unknown facts or rules while also dealing with vaguely known data sets, it tends to suffer from a high error rate. To reduce the error rate, many researchers have employed multiple models to solve a single problem. In this research, we present a new type of multiple-model approach, called DyMoS, whose distinguishing feature is that it first classifies the input data and then applies the model developed specifically for each class of data. To evaluate the performance of DyMoS, we applied it to a real customer churn problem at an automobile insurance company. The results show that DyMoS outperformed every model that employed only a single data mining technique, such as an artificial neural network, a decision tree, or case-based reasoning.
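
The "classify the input, then apply the model built for that class" idea can be illustrated with a small dispatch sketch. This is not the DyMoS implementation, which the abstract does not describe; the router, the two per-class models, and the field names (`years_insured`, `premium_increase`, `claims`) are all hypothetical placeholders chosen only to show the control flow.

```python
from typing import Callable, Dict

Record = Dict[str, float]
Model = Callable[[Record], int]  # returns 1 for "churn", 0 for "stay"


def route(record: Record) -> str:
    """Hypothetical router: assign each input record to a data class."""
    return "new_customer" if record["years_insured"] < 2 else "long_term"


def model_new_customer(record: Record) -> int:
    """Hypothetical model for the 'new_customer' class."""
    return 1 if record["premium_increase"] > 0.10 else 0


def model_long_term(record: Record) -> int:
    """Hypothetical model for the 'long_term' class."""
    return 1 if record["claims"] >= 2 and record["premium_increase"] > 0.20 else 0


# One model per data class; in practice each could be a neural network,
# a decision tree, or a case-based reasoner trained on that class only.
MODELS: Dict[str, Model] = {
    "new_customer": model_new_customer,
    "long_term": model_long_term,
}


def predict(record: Record) -> int:
    """Classify the record first, then apply the model for its class."""
    return MODELS[route(record)](record)


new = {"years_insured": 1, "premium_increase": 0.15, "claims": 0}
old = {"years_insured": 5, "premium_increase": 0.15, "claims": 0}
print(predict(new), predict(old))
```

The same premium increase leads to different predictions for the two records because each is scored by the model assigned to its class.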


Predictive of Osteoporosis by Tree-based Machine Learning Model in Post-menopause Woman (폐경 여성에서 트리기반 머신러닝 모델로부터 골다공증 예측)

  • Lee, In-Ja;Lee, Junho
    • Journal of radiological science and technology / v.43 no.6 / pp.495-502 / 2020
  • In this study, the prevalence of osteoporosis was predicted from 10 independent variables, such as age, weight, and alcohol consumption, using 4 tree-based machine-learning models, and the performance of each model was compared. The best-performing model was then re-evaluated after removing independent variables, and the area under the curve (AUC) was used to evaluate model performance. The AUC for each model was 0.663 for Decision tree, 0.704 for Random forest, 0.702 for GBM, and 0.710 for XGBoost, and the most important variables were, in order, age, weight, and family history. With XGBoost, the best-performing model, removing independent variables yielded the best AUC of 0.750 with 7 independent variables. These results suggest that the method can be applied to predicting not only osteoporosis but also various other diseases. It is also expected to serve as basic data for big data research in the health care field.
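
For readers unfamiliar with the metric, the following is a minimal sketch of how an AUC can be computed, assuming it refers to the area under the ROC curve. It uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half. The labels and scores are made-up toy values, not data from the study.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a positive case, 0 for a negative case.
    scores: model outputs, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 8 of the 9 pairs are ordered correctly.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.3]
print(auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading values such as those reported above.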

Pattern Classification Model Design and Performance Comparison for Data Mining of Time Series Data (시계열 자료의 데이터마이닝을 위한 패턴분류 모델설계 및 성능비교)

  • Lee, Soo-Yong;Lee, Kyoung-Joung
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.6 / pp.730-736 / 2011
  • In this paper, we designed pattern classification models that can reflect the latest trends in time series. Fusion models based on statistical and AI methods have been shown to be superior to traditional ones as pattern classification models supporting decision making. In particular, the hit rates of pattern classification models combined with fuzzy theory are comparatively higher. Statistical SVM models combined with a fuzzy membership function, and models combining a neural network with FCM, have shown good performance. BPN, PNN, FNN, FCM, SVM, FSVM, Decision Tree, Time Series Analysis, and Regression Analysis were used as pattern classification models in the experiments of this paper. The databases used were an economic index DB with time series properties from the financial market (Korea, KOSPI200 DB) and an electrocardiogram DB of arrhythmia patients in hospital emergencies (USA, MIT-BIH DB).

Tree-inspired Chair Modeling (나무 성장 시뮬레이션을 이용한 의자 모델링 기법)

  • Zhang, Qimeng;Byun, Hae Won
    • Journal of the Korea Computer Graphics Society / v.23 no.5 / pp.29-38 / 2017
  • We propose a method for tree-inspired chair modeling that can generate a tree-branch pattern in the skeleton of an arbitrary chair shape. Unlike existing methods that merge multiple-input models, the proposed method requires only one mesh as input, namely the contour mesh of the user's desired part, to model the chair with a branch pattern generated by tree-growth simulation. We propose a new method for the efficient extraction of the contour-mesh region in the tree-branch pattern. First, we extract the contour mesh based on the face area of the input mesh. We then use the front and back mesh information to generate a skeleton mesh that reconstructs the connection information. In addition, to obtain the tree-branch pattern matching the shape of the input model, we propose a three-way tree-growth simulation method that considers the tangent vector of the shape surface. The proposed method reveals a new type of furniture modeling by using an existing furniture model and simple parameter values to model tree branches shaped appropriately for the input model skeleton. Our experiments demonstrate the performance and effectiveness of the proposed method.

A Study-on Context-Dependent Acoustic Models to Improve the Performance of the Korea Speech Recognition (한국어 음성인식 성능향상을 위한 문맥의존 음향모델에 관한 연구)

  • 황철준;오세진;김범국;정호열;정현열
    • Journal of the Institute of Convergence Signal Processing / v.2 no.4 / pp.9-15 / 2001
  • In this paper, we investigate context-dependent acoustic models to improve the performance of Korean speech recognition. The algorithm uses Korean phonological rules and a decision tree. With the Successive State Splitting (SSS) algorithm, the Hidden Markov Network (HM-Net), which is an efficient representation of phoneme-context-dependent HMMs, can be generated automatically. SSS is a powerful technique for designing topologies of tied-state HMMs, but it does not adequately handle contexts unseen in the training phoneme context environment. In addition, it has some problems in its handling of the contextual domain. In this paper, we adopt a new state-clustering algorithm for SSS, called Phonetic Decision Tree-based SSS (PDT-SSS), which includes context splits based on Korean phonological rules. This method combines the advantages of both decision tree clustering and SSS, and can generate a highly accurate HM-Net that can express any context. To verify the effectiveness of the adopted method, experiments were carried out using the KLE 452 word database and the YNU 200 sentence database. Through Korean phoneme, word, and sentence recognition experiments, we showed that the new state-clustering algorithm produces better phoneme, word, and continuous speech recognition accuracy than conventional HMMs.


Development and Use of Digital Climate Models in Northern Gyunggi Province - I. Derivation of DCMs from Historical Climate Data and Local Land Surface Features (경기북부지역 정밀 수치기후도 제작 및 활용 - I. 수치기후도 제작)

  • 김성기;박중수;이은섭;장정희;정유란;윤진일
    • Korean Journal of Agricultural and Forest Meteorology / v.6 no.1 / pp.49-60 / 2004
  • Northern Gyeonggi Province (NGP), consisting of 3 counties, is the northernmost region in South Korea, adjacent to the demilitarized zone with North Korea. To supplement the insufficient spatial coverage of official climate data and of climate atlases based on those data, high-resolution digital climate models (DCMs) were prepared to support weather-related activities of residents in NGP. Monthly climate data from 51 synoptic stations across both North and South Korea were collected for 1981-2000. A digital elevation model (DEM) for this region with 30 m cell spacing was used with the climate data for spatially interpolating daily maximum and minimum temperatures, solar irradiance, and precipitation based on relevant topoclimatological models. For daily minimum temperature, a spatial interpolation scheme accommodating the potential influences of cold air accumulation and temperature inversion was used. For daily maximum temperature estimation, a spatial interpolation model incorporating the overheating index was used. Daily solar irradiances over sloping surfaces were estimated from nearby synoptic station data weighted by potential relative radiation, which is the hourly sum of relative solar intensity. Precipitation was assumed to increase with the difference between the virtual terrain elevation and the DEM, multiplied by an observed rate. Validations were carried out by installing an observation network specifically for making comparisons with the spatially estimated temperature pattern. Freezing risk in January was estimated for major fruit tree species based on the DCMs under recurrence intervals of 10, 30, and 100 years. Frost risks at bud-burst and blossom of tree flowers were also estimated at the same resolution as the DCMs.
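
The precipitation rule stated above (an elevation difference multiplied by an observed rate) can be written as a one-line correction. The sketch below is only one possible reading of that sentence: the sign convention (actual DEM elevation minus virtual terrain elevation), the clamp at zero, the function name, and all numeric values are assumptions, not details taken from the paper.

```python
def adjust_precipitation(
    p_interp_mm: float,    # precipitation interpolated from station data (mm)
    dem_m: float,          # actual cell elevation from the DEM (m)
    virtual_m: float,      # "virtual terrain" elevation at the same cell (m)
    rate_mm_per_m: float,  # observed precipitation change per metre of elevation
) -> float:
    """Elevation-corrected precipitation for one grid cell.

    Assumed sign convention: cells whose DEM elevation lies above the
    virtual terrain receive more precipitation than the interpolated value.
    """
    corrected = p_interp_mm + rate_mm_per_m * (dem_m - virtual_m)
    # Clamping at zero is an added safeguard, not something the abstract states.
    return max(0.0, corrected)


# Hypothetical cell: 150 m above the virtual terrain, 0.05 mm per metre.
print(adjust_precipitation(100.0, 350.0, 200.0, 0.05))
```

Applied cell by cell over the 30 m grid, a rule of this shape would turn a smooth station-based surface into one that follows the local terrain.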

A Study on the Use of Machine Learning Models in Bridge on Slab Thickness Prediction (머신러닝 기법을 활용한 교량데이터 설계 시 슬래브두께 예측에 관한 연구)

  • Chul-Seung Hong;Hyo-Kwan Kim;Se-Hee Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.5 / pp.325-330 / 2023
  • This paper proposes applying machine learning to slab thickness prediction in bridge design, a process that currently relies on structural analysis results or on the experience and subjectivity of engineers, in order to enable digital-based decision-making. The study aims to build a reliable design environment by using machine learning techniques to provide guide values to engineers, in addition to structural analysis, for slab thickness selection. Based on girder bridges, which account for the largest proportion of the bridge data, a prediction process for slab thickness among superstructure components was defined. Several machine learning models (Linear Regression, Decision Tree, Random Forest, and Multi-layer Perceptron) were compared at each step of the process to produce a prediction value for each step, and the optimal model was derived. The study confirmed the applicability of machine learning techniques in an area where slab thickness had previously been predicted only through structural analysis, and an accuracy of 95.4% was obtained. The models can be used in a more reliable construction environment if the accuracy of the prediction model is improved by expanding the process.

Prediction of concrete compressive strength using non-destructive test results

  • Erdal, Hamit;Erdal, Mursel;Simsek, Osman;Erdal, Halil Ibrahim
    • Computers and Concrete / v.21 no.4 / pp.407-417 / 2018
  • Concrete, which is a composite material, is one of the most important construction materials. Compressive strength is a commonly used parameter for the assessment of concrete quality, so accurate prediction of concrete compressive strength is an important issue. In this study, we used an experimental procedure for the assessment of concrete quality. First, the concrete mix was prepared according to C 20 type concrete, and the slump of the fresh concrete was about 20 cm. After the fresh concrete was placed into formwork, compaction was achieved using a vibrating screed. After a 28-day period, a total of 100 core samples with a diameter of 75 mm were extracted. Pulse velocity determination tests and compressive strength tests were performed on the core samples. Windsor probe penetration tests and Schmidt hammer tests were also performed. After the data set was assembled, twelve artificial intelligence (AI) models were compared for predicting the concrete compressive strength. These models can be divided into three categories: (i) Functions (i.e., Linear Regression, Simple Linear Regression, Multilayer Perceptron, Support Vector Regression); (ii) Lazy-Learning Algorithms (i.e., IBk Linear NN Search, KStar, Locally Weighted Learning); and (iii) Tree-Based Learning Algorithms (i.e., Decision Stump, Model Trees Regression, Random Forest, Random Tree, Reduced Error Pruning Tree). Four evaluation processes, four validation implements (i.e., 10-fold cross validation, 5-fold cross validation, 10% split sample validation, and 20% split sample validation) are used to examine the performance of the predictive models. This study shows that machine learning regression techniques are promising tools for predicting the compressive strength of concrete.
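
The validation schemes named above can be sketched as index splits over the 100 core samples. This is an illustrative sketch rather than the authors' procedure: the function names and the fixed seed are hypothetical, and reading "10% split sample validation" as holding out 10% of the samples for testing is an assumption, since the abstract does not say which side of the split the percentage refers to.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Split]:
    """Shuffle 0..n-1 and return k (train, test) splits; each index is tested once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    splits = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits


def holdout_indices(n: int, test_fraction: float, seed: int = 0) -> Split:
    """Single split that holds out `test_fraction` of the samples for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


# 100 samples, as in the abstract: 10-fold gives 10 test sets of 10 samples each.
ten_fold = k_fold_indices(100, 10)
five_fold = k_fold_indices(100, 5)
train_10, test_10 = holdout_indices(100, 0.10)
print(len(ten_fold), len(five_fold[0][1]), len(test_10))
```

With only 100 samples, cross validation uses every sample for testing exactly once, whereas a single split leaves the estimate dependent on which 10 or 20 samples happen to be held out.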

Prediction Models of Mild Cognitive Impairment Using the Korea Longitudinal Study of Ageing (고령화연구패널조사를 이용한 경도인지장애 예측모형)

  • Park, Hyojin;Ha, Juyoung
    • Journal of Korean Academy of Nursing / v.50 no.2 / pp.191-199 / 2020
  • Purpose: The purpose of this study was to compare the sociodemographic characteristics of a normal cognitive group and a mild cognitive impairment group, and to establish prediction models of Mild Cognitive Impairment (MCI). Methods: This study was a secondary data analysis using data from "the 4th Korea Longitudinal Study of Ageing" of the Korea Employment Information Service. A total of 6,405 individuals, including 1,329 individuals with MCI and 5,076 individuals with normal cognitive abilities, were part of the study. Based on the panel survey items, the research used 28 variables. The methods of analysis included a χ2-test, logistic regression analysis, decision tree analysis, predicted error rate, and an ROC curve, calculated using SPSS 23.0 and SAS 13.2. Results: In the MCI group, the mean age was 71.4 and 65.8% of the participants were women. There were statistically significant differences in gender, age, and education between the two groups. Predictors of MCI determined by logistic regression analysis were gender, age, education, instrumental activity of daily living (IADL), perceived health status, participation group, cultural activities, and life satisfaction. Decision tree analysis identified education, age, life satisfaction, and IADL as predictors of MCI. Conclusion: The accuracy of the logistic regression model for MCI was slightly higher than that of the decision tree model. The prediction model for MCI established in this study may be used to identify middle-aged and elderly people at risk of MCI. Therefore, this study may contribute to the prevention and reduction of dementia.

Using Data Mining Techniques for Analysis of the Impacts of COVID-19 Pandemic on the Domestic Stock Prices: Focusing on Healthcare Industry (데이터 마이닝 기법을 통한 COVID-19 팬데믹의 국내 주가 영향 분석: 헬스케어산업을 중심으로)

  • Kim, Deok Hyun;Yoo, Dong Hee;Jeong, Dae Yul
    • The Journal of Information Systems / v.30 no.3 / pp.21-45 / 2021
  • Purpose: This paper analyzes the impact of a global pandemic such as COVID-19 on the domestic stock market. We investigated how the overall pattern of the stock market changed due to the COVID-19 pandemic. In particular, we analyzed stock price patterns in depth and tried to identify the factors that affected the stock market index (KOSPI) in the healthcare industry during the COVID-19 pandemic. Design/methodology/approach: We built a data warehouse from databases in various industrial and economic fields to analyze the changes in the KOSPI due to COVID-19, particularly the changes in the healthcare industry centered on bio-medicine. We collected daily stock price data for the KOSPI, centered on the KOSPI-200, covering about two years before and one year after the outbreak of COVID-19. We also collected COVID-19-related news about the stock market by applying text mining techniques. We designed four experimental data sets to develop decision tree-based prediction models. Findings: All prediction models from the four data sets showed significant predictive power with explainable decision tree models. In addition, we derived 10 to 14 significant decision rules for each prediction model. The experimental results showed that the decision rules were sufficient to explain the domestic healthcare stock market patterns before and after COVID-19.
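
Decision rules of the kind mentioned above are typically read off a fitted tree by following each root-to-leaf path. The sketch below shows that idea on a hand-built toy tree; the feature names (`covid_news_count`, `kospi_return`), thresholds, and labels are hypothetical and do not come from the paper, and the abstract does not say how its rules were actually extracted.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str  # predicted class at this leaf


@dataclass
class Node:
    feature: str
    threshold: float
    left: Union["Node", Leaf]   # branch taken when feature <= threshold
    right: Union["Node", Leaf]  # branch taken when feature > threshold


def extract_rules(tree: Union[Node, Leaf], conditions: Tuple[str, ...] = ()) -> List[str]:
    """Return one IF-THEN rule per leaf by walking every root-to-leaf path."""
    if isinstance(tree, Leaf):
        cond = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {cond} THEN {tree.label}"]
    left = extract_rules(tree.left, conditions + (f"{tree.feature} <= {tree.threshold}",))
    right = extract_rules(tree.right, conditions + (f"{tree.feature} > {tree.threshold}",))
    return left + right


# Hypothetical toy tree with three leaves, hence three rules.
toy_tree = Node(
    "covid_news_count", 50,
    Node("kospi_return", 0.0, Leaf("down"), Leaf("up")),
    Leaf("up"),
)
rules = extract_rules(toy_tree)
for rule in rules:
    print(rule)
```

A tree with 10 to 14 leaves would yield 10 to 14 rules under this reading, which is what makes such models straightforward to explain.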