• Title/Summary/Keyword: tree-based models

Sequence Mining based Manufacturing Process using Decision Model in Cognitive Factory (스마트 공장에서 의사결정 모델을 이용한 순차 마이닝 기반 제조공정)

  • Kim, Joo-Chang;Jung, Hoill;Yoo, Hyun;Chung, Kyungyong
    • Journal of the Korea Convergence Society / v.9 no.3 / pp.53-59 / 2018
  • In this paper, we propose a sequence-mining-based manufacturing process that uses a decision model in a cognitive factory. The proposed model aims to increase production efficiency by applying the sequence mining decision model to a small-scale production process. The data generated in the production process form the input variables, and the output variables are the production rate and the defect rate per hour. We use the GSP algorithm and the REPTree algorithm to generate rules and models from the variables found to be highly significant by a t-test. As a result, the defect rate improved by 0.38% and the average hourly production rate increased by 1.89. These results are meaningful for improving production efficiency through data mining analysis in small-scale production in a cognitive factory.

An Energy-Efficient Self-organizing Hierarchical Sensor Network Model for Vehicle Approach Warning Systems (VAWS) (차량 접근 경고 시스템을 위한 에너지 효율적 자가 구성 센서 네트워크 모델)

  • Shin, Hong-Hyul;Lee, Hyuk-Joon
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.7 no.4 / pp.118-129 / 2008
  • This paper describes an IEEE 802.15.4-based hierarchical sensor network model for a Vehicle Approach Warning System (VAWS), which provides drivers approaching a sharp turn with information about vehicles approaching the same turn from the opposite end. In the proposed network model, a tree-structured topology that can prolong the network lifetime is formed in a self-organizing manner by a topology control protocol. A simple but efficient routing protocol, which creates and maintains routing tables based on the network topology organized by the topology control protocol, transports data packets generated by the sensor nodes to the base station, which then forwards them to a display processor. These protocols are designed as a network layer extension to the IEEE 802.15.4 MAC. A simulation modeling a scenario with a sharp turn shows that the proposed network model achieves high performance in terms of both energy efficiency and throughput.

A study on forecasting attendance rate of reserve forces training based on Data Mining (데이터마이닝에 기반한 예비군훈련 입소율 예측에 관한 연구)

  • Cho, Sangjoon;Ma, Jungmok
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.3 / pp.261-267 / 2021
  • The mission of a reserve forces unit is to prepare good training for reserve forces during peacetime. Good training requires units to organize support agents properly, but this is difficult because units lack members. For that reason, units forecast the monthly attendance rate of reserve forces (using the previous year's results) to organize support agents and the unit schedule. However, the existing planning method can deviate considerably from the actual attendance rate, which has a negative effect on training performance. More accurate forecast models are therefore required to reduce attendance rate errors. This paper proposes an attendance rate forecast model using data mining. To verify the proposed data-mining-based model, the existing planning method was compared with the proposed model using real data. The results showed that the proposed model outperforms the existing planning method.

A Comparative Study on the Methodology of Failure Detection of Reefer Containers Using PCA and Feature Importance (PCA 및 변수 중요도를 활용한 냉동컨테이너 고장 탐지 방법론 비교 연구)

  • Lee, Seunghyun;Park, Sungho;Lee, Seungjae;Lee, Huiwon;Yu, Sungyeol;Lee, Kangbae
    • Journal of the Korea Convergence Society / v.13 no.3 / pp.23-31 / 2022
  • This study analyzed actual Starcool reefer container operation data provided by H Shipping. Through interviews with H Shipping's field experts, only Critical and Fatal alarms among the four failure alarm types were defined as failures, and it was confirmed that, given the nature of reefer containers, using all variables is cost-inefficient. Therefore, this study proposes a method for detecting reefer container failures using feature importance and PCA. To improve model performance, we select variables based on feature importance from tree-based models such as XGBoost and LGBoost, and use PCA to reduce the dimensionality of the full variable set for each model. With the boosting-based XGBoost and LGBoost techniques, the proposed model improved recall by 0.36 and 0.39, respectively, compared with supervised learning using all 62 variables.

Machine-assisted Semi-Simulation Model (MSSM): Predicting Galactic Baryonic Properties from Their Dark Matter Using A Machine Trained on Hydrodynamic Simulations

  • Jo, Yongseok;Kim, Ji-hoon
    • The Bulletin of The Korean Astronomical Society / v.44 no.2 / pp.55.3-55.3 / 2019
  • We present a pipeline to estimate baryonic properties of a galaxy inside a dark matter (DM) halo in DM-only simulations using a machine trained on high-resolution hydrodynamic simulations. As an example, we use the IllustrisTNG hydrodynamic simulation of a (75 h^-1 Mpc)^3 volume to train our machine to predict, e.g., stellar mass and star formation rate in a galaxy-sized halo based purely on its DM content. An extremely randomized tree (ERT) algorithm is used together with multiple novel improvements we introduce here, such as a refined error function in machine training and two-stage learning. Aided by these improvements, our model demonstrates a significantly increased accuracy in predicting baryonic properties compared to prior attempts; in other words, the machine better mimics IllustrisTNG's galaxy-halo correlation. By applying our machine to the MultiDark-Planck DM-only simulation of a large (1 h^-1 Gpc)^3 volume, we then validate the pipeline that rapidly generates a galaxy catalogue from a DM halo catalogue using the correlations the machine found in IllustrisTNG. We also compare our galaxy catalogue with the ones produced by popular semi-analytic models (SAMs). Our so-called machine-assisted semi-simulation model (MSSM) is shown to be largely compatible with SAMs, and may become a promising method to transplant the baryon physics of galaxy-scale hydrodynamic calculations onto a larger-volume DM-only run. We discuss the benefits that machine-based approaches like this entail, as well as suggestions to raise the scientific potential of such approaches.

Development of a Predictive Model for Occupational Disability Grades Using Workers' Compensation Insurance Data (산재보험 빅데이터를 활용한 장해등급 예측 모델 개발)

  • Choi, Keunho;Kim, Min Jeong;Lee, Jeonghwa
    • The Journal of Information Systems / v.33 no.3 / pp.187-205 / 2024
  • Purpose: A prediction model for occupational injuries can support more proactive, efficient, and effective policy-making. This study aims to develop a model that predicts the severity of occupational injuries, classified into 15 disability grades in South Korea, using machine learning techniques applied to COMWEL data. The primary goal is to improve prediction accuracy, offering an advanced tool for early intervention and evidence-based policy implementation. Design/methodology/approach: The data analyzed in this study consist of 290,157 administrative records of occupational injury cases collected between 2018 and 2020 by the Korea Workers' Compensation & Welfare Service, based on the 'Workers' Compensation Insurance Application Form' submitted for occupational injury treatment. Four machine learning models (Decision Tree, DNN, XGBoost, and LightGBM) were developed and their performances compared to identify the optimal model. Additionally, the Permutation Feature Importance (PFI) method was used to assess the relative contribution of each variable to the model's performance, helping to identify key variables. Findings: The DNN algorithm achieved the lowest Mean Absolute Error (MAE) of 0.7276. Key variables for predicting disability grades included the severity index, primary disease code, primary disease site, age at the time of the injury, and industry type. These findings highlight the importance of early policy intervention and emphasize the role of both medical and socioeconomic factors in model predictions. The academic and policy implications of these results are also discussed.
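The Permutation Feature Importance idea mentioned in this abstract can be illustrated with a minimal sketch: shuffle one input column at a time and record how much the MAE rises. Everything below is an assumption made for illustration (the `toy_predict` model, the two synthetic columns, and the repeat count); it does not reproduce the study's models or data.

```python
import random

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, y_true, n_repeats=5, seed=0):
    """Return, per column, the mean MAE increase after shuffling that column."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y_true, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild every row with only this column's values permuted.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            mae = mean_absolute_error(y_true, [predict(r) for r in permuted])
            increases.append(mae - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical fitted model: it depends on column 0 and ignores column 1.
def toy_predict(row):
    return 2.0 * row[0]

rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
grades = [2.0 * r[0] for r in rows]  # synthetic targets, not real disability grades
importances = permutation_importance(toy_predict, rows, grades)
print(importances)
```

In this toy setup the ignored column scores exactly zero, while the column the model relies on scores a positive MAE increase. A larger increase indicates a variable the model leans on more heavily.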

Carbon Reduction and Enhancement for Greenspace in Institutional Lands (공공용지 녹지의 탄소저감과 증진방안)

  • Jo, Hyun-Kil;Park, Hye-Mi;Kim, Jin-Young
    • Journal of the Korean Institute of Landscape Architecture / v.48 no.4 / pp.1-7 / 2020
  • This study quantified annual uptake and storage of carbon by urban greenspace in institutional lands and suggested improvements to greenspace structures to enhance carbon reduction effects. The study selected five cities (Seoul, Daejeon, Daegu, Chuncheon, and Suncheon) based on areal size and nationwide distribution. Horizontal and vertical greenspace structures were field-surveyed after institutional greenspace lots were selected using a systematic random sampling method on aerial photographs of the study cities. Annual uptake and storage of carbon by woody plants were computed by applying quantitative models of each species developed for urban landscape trees and shrubs. Tree density and stem diameter (at breast height) in institutional lands averaged 1.4±0.1 trees/100 ㎡ and 14.9±0.2 cm across the study cities, respectively. Of the total planted area, the ratio of single-layered planting with only trees, shrubs, or grass was higher than that of multi-layered structures. Annual uptake and storage of carbon per unit area by woody plants averaged 0.65±0.04 t/ha/yr and 7.37±0.47 t/ha, respectively, which were lower than those for other greenspace types at home and abroad. This lower carbon reduction was attributed to the lower density and smaller size of trees planted in the institutional lands studied. Nevertheless, the greenspace in institutional lands annually offset carbon emissions from institutional electricity use by 0.6% (Seoul) to 1.9% (Chuncheon). Tree planting in potential planting spaces was estimated to sequester an additional amount equal to about 18% of the existing annual carbon uptake. Enhancing carbon reduction effects requires active tree planting in the potential spaces, multi-layered/clustered planting composed of upper trees, middle trees, and lower shrubs, planting of tree species with greater carbon uptake capacity, and avoidance of topiary tree maintenance.
This study focused on identifying greenspace structures and carbon offset levels in institutional lands, about which little had been known.

Estimation of Fractional Urban Tree Canopy Cover through Machine Learning Using Optical Satellite Images (기계학습을 이용한 광학 위성 영상 기반의 도시 내 수목 피복률 추정)

  • Sejeong Bae;Bokyung Son;Taejun Sung;Yeonsu Lee;Jungho Im;Yoojin Kang
    • Korean Journal of Remote Sensing / v.39 no.5_3 / pp.1009-1029 / 2023
  • Urban trees play a vital role in urban ecosystems, significantly reducing impervious surfaces and impacting carbon cycling within the city. Although previous research has demonstrated the efficacy of employing artificial intelligence in conjunction with airborne light detection and ranging (LiDAR) data to generate urban tree information, the availability and cost constraints associated with LiDAR data pose limitations. Consequently, this study employed freely accessible, high-resolution multispectral satellite imagery (i.e., Sentinel-2 data) to estimate fractional tree canopy cover (FTC) within the urban confines of Suwon, South Korea, using machine learning techniques. This study leveraged a median composite image derived from a time series of Sentinel-2 images. To account for the diverse land cover found in urban areas, the model incorporated three types of input variables: average (mean) and standard deviation (std) values within a 30 m grid of the 10 m resolution optical indices from Sentinel-2, and fractional coverage of distinct land cover classes within 30 m grids from the existing level 3 land cover map. Four schemes with different combinations of input variables were compared. Notably, when all three factors (i.e., mean, std, and fractional cover) were used to consider the variation of land cover in urban areas (Scheme 4, S4), the machine learning model exhibited improved performance compared to using only the mean of optical indices (Scheme 1). Of the various models proposed, the random forest (RF) model with S4 demonstrated the best performance, achieving an R^2 of 0.8196, a mean absolute error (MAE) of 0.0749, and a root mean squared error (RMSE) of 0.1022. Based on the variable importance analysis, the std variable exhibited the highest impact on model outputs within the heterogeneous land covers.
This trained RF model with S4 was then applied to the entire Suwon region, consistently delivering robust results with an R^2 of 0.8702, an MAE of 0.0873, and an RMSE of 0.1335. The FTC estimation method developed in this study is expected to offer advantages for application in various regions, providing fundamental data for a better understanding of carbon dynamics in urban ecosystems in the future.
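For readers comparing the three scores quoted in this abstract, the sketch below computes R-squared, MAE, and RMSE from paired observed and predicted canopy fractions. It assumes R-squared means the coefficient of determination (the abstract does not say whether a squared correlation was used instead), and the four sample values are invented for illustration.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (r2, mae, rmse) for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                     # coefficient of determination
    return r2, mae, rmse

# Hypothetical canopy-cover fractions in [0, 1]; not data from the study.
observed = [0.10, 0.40, 0.55, 0.80]
predicted = [0.15, 0.35, 0.60, 0.70]
r2, mae, rmse = regression_metrics(observed, predicted)
print(round(r2, 4), round(mae, 4), round(rmse, 4))
```

RMSE is never smaller than MAE because squaring weights large errors more heavily, which matches the pattern in the reported pairs (0.0749 vs. 0.1022 and 0.0873 vs. 0.1335).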

Experimental Comparison of Network Intrusion Detection Models Solving Imbalanced Data Problem (데이터의 불균형성을 제거한 네트워크 침입 탐지 모델 비교 분석)

  • Lee, Jong-Hwa;Bang, Jiwon;Kim, Jong-Wouk;Choi, Mi-Jung
    • KNOM Review / v.23 no.2 / pp.18-28 / 2020
  • With the development of the virtual community, the benefits that IT provides to people in fields such as healthcare, industry, communication, and culture are increasing, and the quality of life is also improving. At the same time, various malicious attacks target this developed network environment. Firewalls and intrusion detection systems exist to detect these attacks in advance, but they are limited in detecting malicious attacks that evolve day by day. To solve this problem, intrusion detection research using machine learning is being actively conducted, but false positives and false negatives occur because the training dataset is imbalanced. In this paper, a Random Oversampling method is used to address the imbalance of the UNSW-NB15 dataset used for network intrusion detection. Through experiments, we compared and analyzed the accuracy, precision, recall, F1-score, training and prediction time, and hardware resource consumption of the models. Building on this study, we plan to develop more efficient network intrusion detection models using other methods and high-performance models that can address the imbalanced data problem.
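As a rough illustration of the Random Oversampling step described in this abstract, the sketch below duplicates randomly chosen minority-class rows until every class matches the largest one. The function name, the seed, and the toy "normal"/"attack" records are assumptions for illustration only and are not taken from the UNSW-NB15 experiments.

```python
import random
from collections import Counter, defaultdict

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Keep every original row, then draw the shortfall with replacement.
        extras = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extras:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Toy data: 6 "normal" records and 2 "attack" records (hypothetical values).
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [9.0], [9.5]]
labels = ["normal"] * 6 + ["attack"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.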

A study on the prediction of Korean NPL market return (한국 NPL시장 수익률 예측에 관한 연구)

  • Lee, Hyeon Su;Jeong, Seung Hwan;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.123-139 / 2019
  • The Korean NPL market was formed by the government and foreign capital shortly after the 1997 IMF crisis. However, this market has a short history, as bad debt started to increase after the global financial crisis in 2009 due to the recession in the real economy. NPLs have become a major investment in recent years, as domestic capital market investment began to enter the NPL market in earnest. Although the domestic NPL market has received considerable attention due to its recent overheating, research on the NPL market has been scarce because the history of capital market investment in the domestic NPL market is short. In addition, decision-making based on more scientific and systematic analysis is required because of declining profitability and price fluctuations driven by the real estate business cycle. In this study, we propose a prediction model that determines whether the benchmark yield is achieved, using NPL market data in accordance with market demand. To build the model, we used Korean NPL data covering about four years, from December 2013 to December 2017, comprising 2,291 items in total. As independent variables, only those related to the dependent variable were selected from the 11 variables that describe the characteristics of the real estate. To select the variables, one-to-one t-tests, stepwise logistic regression, and a decision tree were used. Seven independent variables were selected: purchase year, SPC (Special Purpose Company), municipality, appraisal value, purchase cost, OPB (Outstanding Principal Balance), and HP (Holding Period). The dependent variable is a binary variable that indicates whether the benchmark rate is reached.
This is because a model predicting a binary variable is more accurate than a model predicting a continuous variable, and the accuracy of these models is directly related to their effectiveness. In addition, for a special purpose company, the main concern is whether or not to purchase the property, so knowing whether a certain level of return will be achieved is enough to make a decision. To ascertain whether 12%, the standard rate of return used in the industry, is a meaningful reference value, we constructed and compared predictive models with the dependent variable calculated at different threshold values. As a result, the average hit ratio of the predictive model built with the dependent variable calculated at the 12% standard rate of return was the best, at 64.60%. To propose an optimal prediction model based on the determined dependent variable and the 7 independent variables, we constructed and compared prediction models using five methodologies: discriminant analysis, logistic regression analysis, decision tree, artificial neural network, and genetic algorithm linear model. To do this, 10 sets of training and testing data were extracted using the 10-fold cross-validation method. After building the models with these data, the hit ratio of each set was averaged and the performance was compared. The average hit ratios of the prediction models built using discriminant analysis, logistic regression, decision tree, artificial neural network, and genetic algorithm linear model were 64.40%, 65.12%, 63.54%, 67.40%, and 60.51%, respectively. The model using the artificial neural network was confirmed to be the best. This study shows that the 7 independent variables and the artificial neural network prediction model can be used effectively in the future NPL market.
The proposed model predicts in advance whether new items will achieve the 12% return, which will help special purpose companies make investment decisions. Furthermore, we anticipate that the NPL market will become more liquid as transactions proceed at appropriate prices.
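The evaluation procedure in this abstract (10 train/test splits, with the hit ratio averaged over the splits) can be sketched as follows. This is a minimal illustration under stated assumptions: the majority-class "model" is a placeholder standing in for the five methods compared in the study, and the 100 synthetic labels (1 meaning the benchmark return was reached) are invented.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices once and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def mean_hit_ratio(features, targets, fit, predict, k=10, seed=0):
    """Average the per-fold share of correct predictions over k folds."""
    ratios = []
    for test_idx in k_fold_indices(len(targets), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(targets)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [targets[i] for i in train_idx])
        hits = sum(predict(model, features[i]) == targets[i] for i in test_idx)
        ratios.append(hits / len(test_idx))
    return sum(ratios) / k

# Placeholder model (an assumption): always predict the majority training label.
def fit_majority(train_x, train_y):
    return max(set(train_y), key=train_y.count)

def predict_majority(model, x):
    return model

# Synthetic labels: 70 of 100 items reach the benchmark return.
features = [[i] for i in range(100)]
targets = [1] * 70 + [0] * 30
score = mean_hit_ratio(features, targets, fit_majority, predict_majority)
print(round(score, 4))
```

A majority-class baseline like this one gives a floor against which the hit ratios of real models can be judged.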