• Title/Summary/Keyword: tree-based models

Enhancing Workers' Job Tenure Using Directions Derived from Data Mining Techniques (데이터 마이닝 기법을 활용한 근로자의 고용유지 강화 방안 개발)

  • An, Minuk;Kim, Taeun;Yoo, Donghee
    • The Journal of the Korea Contents Association / v.18 no.5 / pp.265-279 / 2018
  • This study used data mining techniques to develop prediction models of worker job turnover. The experiment used data from the '2015 Graduate Occupational Mobility Survey' by the Korea Employment Information Service. We developed prediction models using a decision tree, a Bayes net, and an artificial neural network, and found that the decision tree-based model achieved the best accuracy. We also identified six influential factors affecting employees' turnover intention: type of working time, job status, full-time versus non-full-time employment, regular working hours per week, regular working days per week, and personal development opportunities. From the decision tree-based prediction model, we derived 12 rules of employee turnover covering all job types, and used them to propose directions for enhancing workers' job tenure. In addition, we analyzed the factors affecting turnover intention separately for four job types and derived rules for each: office (ten rules), culture and art (nine rules), construction (four rules), and information technology (six rules). Using these rules, we proposed customized directions for improving job tenure in each group.
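A decision tree's rules come from repeatedly choosing the split that best separates the outcome classes. A minimal sketch of one such step, assuming a CART-style Gini impurity criterion (the abstract does not say which tree algorithm or split criterion was used) and hypothetical toy data on weekly working hours; `gini` and `best_split` are illustrative names:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best rule 'x <= t' on one numeric feature."""
    best_t, best_score = None, float("inf")
    n = len(xs)
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical toy data: weekly working hours vs. changed jobs (1) or stayed (0).
hours = [20, 25, 30, 40, 40, 45, 50, 52]
moved = [1, 1, 1, 0, 0, 0, 1, 0]
print(best_split(hours, moved))
```

Applied recursively to each side, splits like this produce the if-then paths that can be read off as turnover rules.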

Assessing the Effects of Climate Change on the Geographic Distribution of Pinus densiflora in Korea using Ecological Niche Model (소나무의 지리적 분포 및 생태적 지위 모형을 이용한 기후변화 영향 예측)

  • Chun, Jung Hwa;Lee, Chang-Bae
    • Korean Journal of Agricultural and Forest Meteorology / v.15 no.4 / pp.219-233 / 2013
  • We employed an ecological niche modeling framework using GARP (Genetic Algorithm for Ruleset Production) to model the current and future geographic distribution of Pinus densiflora based on environmental predictor datasets, such as climate data including the RCP 8.5 emission scenario, geographic and topographic characteristics, soil and geological properties, and the MODIS enhanced vegetation index (EVI), at 4 km² resolution. Occurrence and abundance records derived from the National Forest Inventory (NFI) at about 4,000 survey sites across the country were used as response variables. The current and potential future geographic distribution of Pinus densiflora, one of the tree species dominating present-day Korean forests, was modeled and mapped. Future models under the RCP 8.5 scenario suggest that large areas predicted to be suitable under current climate conditions may contract by 2090, with the range shifting northward and to higher altitudes. The Area Under the Curve (AUC) value of the modeled result was 0.67. Overall, the study succeeded in showing the current distribution of major tree species and projecting their future changes. However, many limitations and uncertainties remain, arising from the selection of the presence-absence data and of the environmental predictor variables used as model input. Nevertheless, ecological niche modeling can be a useful tool for exploring and mapping the potential response of tree species to climate change. The final models in this study may be used to identify the potential distribution of the tree species under future climate scenarios, which can help forest managers decide where to allocate effort in managing forest ecosystems under climate change in Korea.
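To help read the reported AUC of 0.67: AUC can be computed as the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen absence site, so 0.5 corresponds to random ranking and 1.0 to perfect separation. A minimal sketch of that rank-based definition, with ties counted as one half; the abstract does not state how the AUC was computed for the GARP model, and the scores below are hypothetical:

```python
def auc(scores, labels):
    """ROC AUC as P(score of a presence site > score of an absence site); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence record")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted suitability at three presence (1) and three absence (0) sites.
print(auc([0.9, 0.8, 0.4, 0.7, 0.3, 0.2], [1, 1, 1, 0, 0, 0]))
```

The pairwise loop is quadratic in the number of sites; for thousands of survey sites a sort-based rank computation gives the same value faster.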

Generating Test Cases of Simulink/Stateflow Model Based on RRT Algorithm Using Heuristic Input Analysis (휴리스틱 입력 분석을 이용한 RRT 기반의 Simulink/Stateflow 모델 테스트 케이스 생성 기법)

  • Park, Hyeon Sang;Choi, Kyung Hee;Chung, Ki Hyun
    • KIPS Transactions on Software and Data Engineering / v.2 no.12 / pp.829-840 / 2013
  • This paper proposes a modified RRT (Rapidly-exploring Random Tree) algorithm that uses heuristic input analysis, together with a method for generating test cases from Simulink/Stateflow models using the proposed algorithm. Although the typical RRT algorithm efficiently solves the reachability problem that must be resolved to generate test cases for a model in a black-box manner, it generates test cases inefficiently because it produces random inputs without considering the model's internal states and test targets. The proposed method improves the efficiency of test case generation by analyzing, while expanding the RRT, the test targets still to be satisfied at the current state and heuristically choosing the model inputs based on that analysis, while retaining the merits of the RRT algorithm. The method is evaluated on models of ECUs embedded in a commercial passenger car, and its performance is compared with that of the typical RRT algorithm.
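For reference, the typical RRT that the abstract contrasts with the proposed method grows a tree by sampling a random target, finding the nearest existing node, and extending a bounded step toward that target. A minimal 2-D sketch of this basic loop, assuming a unit-square state space; in the paper the states are Simulink/Stateflow model states and the heuristic input analysis replaces blind random inputs, which this sketch does not attempt to reproduce:

```python
import math
import random

def rrt(start, n_iter=200, step=0.05, seed=0):
    """Grow a basic RRT in the unit square; returns (nodes, parent), parent[i] indexing nodes."""
    rng = random.Random(seed)
    nodes = [start]
    parent = [None]  # the root has no parent
    for _ in range(n_iter):
        sample = (rng.random(), rng.random())  # uniform random target
        # Nearest existing node to the sample (linear scan for clarity).
        i = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], sample))
        near = nodes[i]
        d = math.dist(near, sample)
        if d == 0:
            continue
        t = min(step, d) / d  # move at most `step` toward the sample
        new = (near[0] + t * (sample[0] - near[0]), near[1] + t * (sample[1] - near[1]))
        nodes.append(new)
        parent.append(i)
    return nodes, parent

nodes, parent = rrt((0.5, 0.5))
print(len(nodes))
```

The blind `rng.random()` sampling is exactly the part the paper replaces with inputs chosen from an analysis of unsatisfied test targets.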

A Study on Forecasting Risk of Gas Accident using Weather Data (기상 데이터를 활용한 가스사고위험 예보에 관한 연구)

  • Oh, Jeong Seok
    • Journal of the Korean Institute of Gas / v.22 no.5 / pp.107-113 / 2018
  • While accident data are used to raise alertness to accidents or to review similar cases, analysis of the nature of accident data and their association with the surrounding environment remains very insufficient. It is therefore necessary to develop analysis techniques for the related accident data that can demonstrate the likelihood of an accident in a particular region. The purpose of this study is to develop an analysis model and implement a system that produces regional accident probabilities from historical weather data and accident and report data. Specifically, the system builds models with k-NN and decision tree algorithms, using optional user-selected environment variables, based on the probability relationship between weather and accidents for many specific regions of Korea. In the future, the models developed in this study are intended to be used to analyze and calculate risk for narrower areas.
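A minimal sketch of the k-NN idea for regional accident probability, assuming each historical day is a vector of weather features labeled with whether an accident occurred; the feature choice, `k`, the toy records, and the function name are hypothetical, not taken from the paper's system:

```python
import math

def knn_accident_probability(history, query, k=3):
    """Estimate P(accident | weather) as the share of the k most similar past days with an accident.

    history: list of (feature_tuple, had_accident) pairs; query: feature tuple.
    """
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and len(history)")
    ranked = sorted(history, key=lambda rec: math.dist(rec[0], query))
    return sum(1 for _, had_accident in ranked[:k] if had_accident) / k

# Hypothetical (temperature in deg C, relative humidity as a fraction) records for one region.
history = [((-5, 0.30), True), ((-3, 0.35), True), ((-4, 0.40), False),
           ((20, 0.70), False), ((22, 0.80), False), ((25, 0.60), False)]
print(knn_accident_probability(history, (-4, 0.33), k=3))
```

In practice, features on different scales should be standardized first; in the toy data above the temperature term dominates the distance.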

Perceptual Evaluation of Duration Models in Spoken Korean

  • Chung, Hyun-Song
    • Speech Sciences / v.9 no.1 / pp.207-215 / 2002
  • Perceptual evaluation of duration models of spoken Korean was carried out based on the Classification and Regression Tree (CART) model for text-to-speech conversion. A reference set of durations was produced by a commercial text-to-speech synthesis system for comparison. The duration model built in previous research (Chung & Huckvale, 2001) was applied to a Korean speech synthesis diphone database, 'Hanmal (HN 1.0)'. In the subjective preference test, the synthetic speech produced by the CART duration model was preferred by a small margin, while the synthetic speech from the commercial system was superior in the clarity test. In preparing the experiment, a labeled database of 670 spoken Korean sentences was constructed. As a result of the experiment, a trained duration model for speech synthesis was obtained. The 'Hanmal' diphone database for Korean speech synthesis was also developed as a by-product of the perceptual evaluation.
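A CART regression tree, as used for the duration model, chooses each split to minimize the squared error of the predicted durations on both sides, and each leaf predicts the mean duration of its segments. A minimal one-split sketch, assuming a single numeric feature and hypothetical segment durations in milliseconds; the actual features of the Chung & Huckvale (2001) model are not given in the abstract:

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_regression_split(xs, ys):
    """Find t minimizing SSE(left) + SSE(right) for the rule 'x <= t'; return (t, left_mean, right_mean)."""
    best_t, best_cost = None, float("inf")
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    if best_t is None:
        raise ValueError("feature has a single distinct value; no split possible")
    left = [y for x, y in zip(xs, ys) if x <= best_t]
    right = [y for x, y in zip(xs, ys) if x > best_t]
    return best_t, sum(left) / len(left), sum(right) / len(right)

# Hypothetical feature (segments remaining before the phrase end) vs. duration in ms.
print(best_regression_split([0, 1, 2, 3, 4, 5], [180, 170, 90, 95, 85, 92]))
```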

Dependency Structure Applied to Language Modeling for Information Retrieval

  • Lee, Chang-Ki;Lee, Gary Geun-Bae;Jang, Myung-Gil
    • ETRI Journal / v.28 no.3 / pp.337-346 / 2006
  • In this paper, we propose a new language model for information retrieval, the dependency structure language model, to compensate for the weaknesses of unigram and bigram language models. The dependency structure language model is based on the first-order dependency model and the dependency parse tree generated by a linguistic parser, so long-distance dependencies are captured naturally. We carried out extensive experiments to verify the proposed model, in which the dependency structure model outperformed recently proposed language models and the Okapi BM25 method, and the dependency structure proved more effective than unigrams and bigrams in language modeling for information retrieval.
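A minimal sketch of scoring a parsed query under a first-order dependency model: the root word is scored with a unigram model and every head-modifier arc with a conditional probability of the modifier given its head. The smoothing scheme (Jelinek-Mercer interpolation with hypothetical weights `alpha` and `beta`), the toy documents, and all names are assumptions for illustration, not the paper's exact formulation:

```python
import math
from collections import Counter

def dep_lm_score(query_root, query_arcs, doc_words, doc_arcs, coll_words, alpha=0.2, beta=0.5):
    """Log-likelihood of a parsed query under a smoothed dependency model of one document.

    query_arcs / doc_arcs: lists of (head, modifier) pairs from a dependency parse.
    Every query word must occur somewhere in coll_words, otherwise log(0) fails.
    """
    doc_uni, coll_uni = Counter(doc_words), Counter(coll_words)
    doc_pairs = Counter(doc_arcs)
    heads = Counter(h for h, _ in doc_arcs)

    def p_uni(w):
        # Document unigram estimate interpolated with the collection unigram estimate.
        return (1 - alpha) * doc_uni[w] / len(doc_words) + alpha * coll_uni[w] / len(coll_words)

    def p_dep(h, m):
        # P(m | h) from the document's arcs, interpolated with the unigram estimate of m.
        pair = doc_pairs[(h, m)] / heads[h] if heads[h] else 0.0
        return beta * pair + (1 - beta) * p_uni(m)

    return math.log(p_uni(query_root)) + sum(math.log(p_dep(h, m)) for h, m in query_arcs)

# Two hypothetical documents with the same words but different parses.
d1_words, d1_arcs = ["information", "retrieval", "model"], [("retrieval", "information"), ("retrieval", "model")]
d2_words, d2_arcs = ["information", "model", "retrieval"], [("model", "information"), ("model", "retrieval")]
coll = d1_words + d2_words
q_root, q_arcs = "retrieval", [("retrieval", "information")]
print(dep_lm_score(q_root, q_arcs, d1_words, d1_arcs, coll) > dep_lm_score(q_root, q_arcs, d2_words, d2_arcs, coll))
```

Because both toy documents contain the same words, a unigram model cannot tell them apart; only the arc term rewards the document whose parse links "information" to "retrieval".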

Dynamic Equations of Robots and Sensitivity Analysis (로봇 운동방정식과 감도해석)

  • Song, Sung-Jae;Lee, Jang-Moo
    • Journal of the Korean Society for Precision Engineering / v.12 no.6 / pp.105-111 / 1995
  • The inverse dynamic equations for a 5-link robot including a closed chain have been derived. The closed chain is virtually cut open, and the kinematics and dynamics of the resulting virtual open-chain robot are analyzed. Constraints are applied at the virtually cut joints through the Jacobian matrix that represents the configuration of the closed chain. The topology of the tree-structured open-chain robot is described by a FATHER array, in which the entry for each link indicates the link connected to it in the direction of the base link. Based on the inverse dynamic equations, torque sensitivity models of the 5-link robot have been developed; these models characterize the sensitivity of the driving torques with respect to the link parameters. All procedures are illustrated with a 2-link robot.
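The FATHER array described above can be sketched as a plain list in which `father[i]` is the link adjacent to link `i` in the direction of the base. A minimal sketch, assuming the base is link 0 (with no father) and using a hypothetical two-branch topology, since the abstract does not give the 5-link robot's layout; the helper names are illustrative:

```python
def path_to_base(father, link):
    """Follow FATHER entries from `link` back to the base (link 0); return the chain of links."""
    chain = [link]
    for _ in range(len(father)):  # a valid tree reaches the base within len(father) steps
        if link == 0:
            return chain
        link = father[link]
        chain.append(link)
    raise ValueError("FATHER array contains a cycle")

def children(father):
    """Invert the FATHER array: map each link to the links attached to it on the side away from the base."""
    kids = {i: [] for i in range(len(father))}
    for link in range(1, len(father)):
        kids[father[link]].append(link)
    return kids

# Hypothetical cut-open closed chain: branch 1-2-3 and branch 4-5, both attached to the base.
father = [None, 0, 1, 2, 0, 4]
print(path_to_base(father, 3), children(father))
```

Walking `path_to_base` gives the order in which link velocities and accelerations propagate outward, and `children` gives the order in which forces are accumulated back toward the base.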

A study on Stage-Based Flow Graph Model for Expressing Cyber Attack Train Scenarios (사이버 공격 훈련 시나리오 표현을 위한 Stage 기반 플로우 그래프 모델 연구)

  • Kim, Moon-Sun;Lee, Man-Hee
    • Journal of the Korea Institute of Information Security & Cryptology / v.31 no.5 / pp.1021-1030 / 2021
  • This paper proposes S-CAFG(Stage-based Cyber Attack Flow Graph), a model for effectively describing training scenarios that simulate modern complex cyber attacks. On top of existing graph and tree models, we add a stage node to model more complex scenarios. In order to evaluate the proposed model, we create a complicated scenario and compare how the previous models and S-CAFG express the scenario. As a result, we confirm that S-CAFG can effectively describe various attack scenarios such as simultaneous attacks, additional attacks, and bypass path selection.
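The abstract does not spell out S-CAFG's node and edge semantics. As a rough, hypothetical illustration only, assuming a stage is a label that orders attack steps and that edges may not point back to an earlier stage, enumerating every entry-to-goal path in an acyclic attack graph exposes the alternative (bypass) routes a training scenario can take; all node names are invented:

```python
def attack_paths(edges, stage, start, goal):
    """Enumerate all start-to-goal paths in an acyclic attack graph.

    edges: dict node -> list of successor nodes; stage: dict node -> stage number.
    Raises ValueError if an edge points back to an earlier stage.
    """
    paths = []

    def dfs(node, path):
        if node == goal:
            paths.append(path)
            return
        for nxt in edges.get(node, []):
            if stage[nxt] < stage[node]:
                raise ValueError(f"edge {node}->{nxt} goes back to an earlier stage")
            dfs(nxt, path + [nxt])

    dfs(start, [start])
    return paths

# Hypothetical scenario: stage 1 recon, 2 initial access, 3 lateral movement, 4 objective.
edges = {"scan": ["phish", "exploit_vpn"], "phish": ["lateral"],
         "exploit_vpn": ["lateral", "db_dump"], "lateral": ["db_dump"]}
stage = {"scan": 1, "phish": 2, "exploit_vpn": 2, "lateral": 3, "db_dump": 4}
for p in attack_paths(edges, stage, "scan", "db_dump"):
    print(" -> ".join(p))
```

Here the path that skips `lateral` plays the role of a bypass route; modeling simultaneous attacks would need richer stage semantics than this sketch assumes.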

Estimation of Chlorophyll Contents in Pear Tree Using Unmanned Aerial Vehicle-Based Hyperspectral Imagery (무인기 기반 초분광영상을 이용한 배나무 엽록소 함량 추정)

  • Ye Seong Kang;Ki Su Park;Eun Li Kim;Jong Chan Jeong;Chan Seok Ryu;Jung Gun Cho
    • Korean Journal of Remote Sensing / v.39 no.5_1 / pp.669-681 / 2023
  • Studies have tried to apply remote sensing technology, a non-destructive survey method, in place of the existing destructive survey, which requires relatively large labor input and a long time, to estimate chlorophyll content, an important indicator for evaluating the growth of fruit trees. This study was conducted to non-destructively evaluate the chlorophyll content of pear tree leaves using unmanned aerial vehicle-based hyperspectral imagery over two years (2021, 2022). The single-band reflectance of the pear tree canopy, extracted through image processing, was converted into band ratios to minimize unstable radiation effects caused by changes over time. The estimation (calibration and validation) models were developed with the machine learning algorithms elastic-net, k-nearest neighbors (KNN), and support vector machine, using band ratios as input variables. By comparing the performance of estimation models based on full band ratios, key band ratios that help reduce computational cost and improve reproducibility were selected. For all machine learning models, the full-band-ratio models reached calibration values of coefficient of determination (R²) ≥ 0.67, root mean squared error (RMSE) ≤ 1.22 μg/cm², and relative error (RE) ≤ 17.9%, and validation values of R² ≥ 0.56, RMSE ≤ 1.41 μg/cm², and RE ≤ 20.7%; based on this comparison, four key band ratios were selected. There was no significant difference in validation performance between the machine learning models. Therefore, the KNN model, which had the highest calibration performance, was used as the standard, and its key band ratios were 710/714, 718/722, 754/758, and 758/762 nm. Its calibration performance was R² = 0.80, RMSE = 0.94 μg/cm², RE = 13.9%, and its validation performance was R² = 0.57, RMSE = 1.40 μg/cm², RE = 20.5%. Although the validation results were not sufficient to estimate the chlorophyll content of pear tree leaves, it is meaningful that key band ratios were selected as a standard for future research. In future research, it will be necessary to continuously secure additional datasets to improve estimation performance, verify the reliability of the selected key band ratios, and upgrade the estimation model so that it is reproducible in actual orchards.
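A minimal sketch of the pipeline pieces named above: a band-ratio feature, k-nearest-neighbors regression, and the R², RMSE, and RE metrics. Assumptions: R² is computed as 1 − SS_res/SS_tot and RE as RMSE divided by the mean observed value (×100), since the abstract defines neither; the reflectance values, chlorophyll values, and helper names are hypothetical:

```python
import math

def band_ratio(reflectance, num_nm, den_nm):
    """Ratio of reflectance at two wavelengths in nm, e.g. 710/714."""
    return reflectance[num_nm] / reflectance[den_nm]

def knn_regress(train_x, train_y, x, k=3):
    """Predict the mean target of the k nearest training samples (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return sum(train_y[i] for i in order[:k]) / k

def metrics(obs, pred):
    """Return (R2, RMSE, RE%) with R2 = 1 - SS_res/SS_tot and RE = RMSE / mean(obs) * 100 (assumed)."""
    n = len(obs)
    mean = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    rmse = math.sqrt(ss_res / n)
    return 1 - ss_res / ss_tot, rmse, rmse / mean * 100

# Hypothetical: one band-ratio feature per tree, chlorophyll in ug/cm^2.
ratio = band_ratio({710: 0.20, 714: 0.25}, 710, 714)
train_x, train_y = [(0.8,), (0.9,), (1.0,), (1.5,)], [10.0, 12.0, 14.0, 30.0]
print(ratio, knn_regress(train_x, train_y, (0.85,), k=2), metrics([2.0, 4.0], [3.0, 3.0]))
```

Some studies report R² as the squared Pearson correlation instead, which can differ from 1 − SS_res/SS_tot on validation data.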

Decision based uncertainty model to predict rockburst in underground engineering structures using gradient boosting algorithms

  • Kidega, Richard;Ondiaka, Mary Nelima;Maina, Duncan;Jonah, Kiptanui Arap Too;Kamran, Muhammad
    • Geomechanics and Engineering / v.30 no.3 / pp.259-272 / 2022
  • Rockburst is a dynamic, multivariate, and non-linear phenomenon that occurs in underground mining and civil engineering structures. Predicting rockburst is challenging because conventional models are not standardized, so machine learning techniques could improve prediction accuracy. This study describes decision-based uncertainty models that predict rockburst in underground engineering structures using gradient boosting algorithms (GBM). The model input variables were uniaxial compressive strength (UCS), uniaxial tensile strength (UTS), maximum tangential stress (MTS), excavation depth (D), stress ratio (SR), and brittleness coefficient (BC). Several models were trained using different combinations of the input variables and a 3-fold cross-validation resampling procedure. The hyperparameters, comprising learning rate, number of boosting iterations, tree depth, and minimum number of observations, were tuned to obtain the optimum models. Model performance was evaluated using classification accuracy, Cohen's kappa coefficient (κ), sensitivity, and specificity. By optimizing the model ROC metric, the best-performing model achieved a classification accuracy of 98%, κ of 93%, sensitivity of 1.00, and specificity of 0.957. The most and least influential input variables were MTS and BC, respectively. Partial dependence plots revealed the relationship between changes in the input variables and model predictions. The findings show that GBM can be used to anticipate rockburst and guide decisions about support requirements before mining development.
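A minimal sketch of the evaluation quantities named above for a binary rockburst/no-rockburst label: accuracy, Cohen's kappa (observed agreement p_o corrected for chance agreement p_e, κ = (p_o − p_e)/(1 − p_e)), sensitivity, specificity, and a plain 3-fold split of sample indices. The binary framing and the helper names are assumptions, since the abstract does not say whether more rockburst classes were used, and real resampling would usually shuffle or stratify before splitting:

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, kappa, sensitivity, specificity) for 0/1 labels, 1 = rockburst."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    po = (tp + tn) / n  # observed agreement, equal to accuracy
    # Chance agreement from the marginal totals of predictions and true labels.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (po - pe) / (1 - pe) if pe < 1 else float("nan")  # undefined when pe == 1
    return po, kappa, tp / (tp + fn), tn / (tn + fp)

def k_fold_indices(n, k=3):
    """Split range(n) into k contiguous, nearly equal validation folds (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1]))
print(k_fold_indices(10, 3))
```

Kappa is lower than accuracy whenever one class dominates, which is why it is reported alongside accuracy for imbalanced rockburst data.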