• Title/Summary/Keyword: ensemble method (앙상블 방법)

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.105-129 / 2020
  • This study predicts default risk using corporate data from 2012 to 2018, the period in which K-IFRS was applied in earnest. The data used in the analysis totaled 10,545 rows and 160 columns, including 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial ratio indices. Unlike most prior studies, which used the default event as the basis for learning about default risk, this study calculated default risk from the market capitalization and stock price volatility of each company based on the Merton model. This resolved the data imbalance caused by the scarcity of default events, which had been pointed out as a limitation of the existing methodology, and made it possible to reflect the differences in default risk that exist among ordinary companies. Because learning was conducted using only corporate information that is also available for unlisted companies, the default risk of unlisted companies without stock price information can be appropriately derived. The approach can therefore provide stable default risk assessment to unlisted companies, such as small and medium-sized companies and startups, whose default risk is difficult to determine with traditional credit rating models. Although the prediction of corporate default risk with machine learning has been actively studied recently, model bias issues exist because most studies make predictions based on a single model. A stable and reliable valuation methodology is required for calculating default risk, given that an entity's default risk information is very widely used in the market and sensitivity to differences in default risk is high. Strict standards are also required for the calculation methods.
The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience on credit ratings and of changes in future market conditions. This study reduced the bias of individual models by using a stacking ensemble technique that synthesizes various machine learning models. This makes it possible to capture complex nonlinear relationships between default risk and various corporate information and to maximize the advantages of machine learning-based default risk prediction models, which take less time to calculate. To calculate the sub-model forecasts used as input data for the Stacking Ensemble model, the training data were divided into seven pieces, and the sub-models were trained on the divided sets to produce forecasts. To compare the predictive power of the Stacking Ensemble model, Random Forest, MLP, and CNN models were trained on the full training data, and the predictive power of each model was then verified on the test set. The analysis showed that the Stacking Ensemble model exceeded the predictive power of the Random Forest model, which had the best performance among the single models. Next, to check for statistically significant differences between the forecasts of the Stacking Ensemble model and those of each individual model, pairs were constructed between the Stacking Ensemble model and each individual model. Because the Shapiro-Wilk normality test showed that none of the pairs followed a normal distribution, the nonparametric Wilcoxon rank-sum test was used to check whether the two model forecasts in each pair differed significantly. The analysis showed that the forecasts of the Stacking Ensemble model differed significantly from those of the MLP and CNN models.
In addition, this study provides a methodology that allows existing credit rating agencies to apply machine learning-based bankruptcy risk prediction, given that traditional credit rating models can also be included as sub-models when calculating the final default probability. The Stacking Ensemble technique proposed in this study can also help design models that meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will be used as a resource to increase practical use by overcoming and improving the limitations of existing machine learning-based models.
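
The seven-way split described in the abstract can be read as out-of-fold stacking: each sub-model predicts every training row while trained only on the other six pieces, and a meta-model is then fitted on those predictions. The sketch below illustrates that reading only. The two sub-models (`fit_linear`, `fit_knn`), the one-weight blend used as meta-model, and the synthetic data are all assumptions chosen for brevity; they are not the Random Forest, MLP, or CNN models of the paper.

```python
import math
import random

def fit_linear(xs, ys):
    """Stand-in sub-model 1: closed-form simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=3):
    """Stand-in sub-model 2: k-nearest-neighbour regression."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def out_of_fold(fit, xs, ys, n_folds=7):
    """Predict each training row with a model that never saw that row."""
    oof = [0.0] * len(xs)
    for fold in range(n_folds):
        held = [i for i in range(len(xs)) if i % n_folds == fold]
        kept = [i for i in range(len(xs)) if i % n_folds != fold]
        model = fit([xs[i] for i in kept], [ys[i] for i in kept])
        for i in held:
            oof[i] = model(xs[i])
    return oof

def blend_weight(a, b, ys):
    """Assumed meta-model: least-squares weight w for w*a + (1-w)*b."""
    num = sum((ai - bi) * (yi - bi) for ai, bi, yi in zip(a, b, ys))
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return num / den if den else 0.5

def mse(pred, ys):
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)

# Synthetic data (hypothetical): a nonlinear target with noise.
random.seed(0)
xs = [random.uniform(0.0, 6.0) for _ in range(140)]
ys = [x + 2.0 * math.sin(x) + random.gauss(0.0, 0.3) for x in xs]

oof_lin = out_of_fold(fit_linear, xs, ys)
oof_knn = out_of_fold(fit_knn, xs, ys)
w = blend_weight(oof_lin, oof_knn, ys)
oof_blend = [w * a + (1 - w) * b for a, b in zip(oof_lin, oof_knn)]

# For new inputs, refit the sub-models on all training data and blend them.
final_lin, final_knn = fit_linear(xs, ys), fit_knn(xs, ys)
def stacked_predict(x):
    return w * final_lin(x) + (1 - w) * final_knn(x)

print("out-of-fold MSE:", mse(oof_lin, ys), mse(oof_knn, ys), mse(oof_blend, ys))
```

Fitting the meta-model on out-of-fold predictions rather than in-sample predictions keeps it from rewarding a sub-model that merely memorized the training rows.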

A Study on the Optimal Location Selection for Hydrogen Refueling Stations on a Highway using Machine Learning (머신러닝 기반 고속도로 내 수소충전소 최적입지 선정 연구)

  • Jo, Jae-Hyeok;Kim, Sungsu
    • Journal of Cadastre & Land InformatiX / v.51 no.2 / pp.83-106 / 2021
  • Interest in clean fuels has been soaring because of environmental problems such as air pollution and global warming. Unlike fossil fuels, hydrogen is attracting public attention as an eco-friendly energy source because it releases only water when burned. Various policy efforts have been made to establish a hydrogen-based transportation network. Stations that supply hydrogen to hydrogen-powered trucks are essential for building a hydrogen-based logistics system, so determining the optimal location of refueling stations is an important topic in the network. Although previous studies have mostly applied optimization-based methodologies, this paper adopts machine learning to review the spatial attributes of candidate locations when selecting the optimal position of refueling stations. Machine learning shows outstanding performance in various fields, but it has not yet been applied to the optimal location selection problem for hydrogen refueling stations. Therefore, several machine learning models are applied and their performance compared, using variables relevant to the location of highway rest areas and of random points on a highway. The results show that the Random Forest model is superior in terms of F1-score. We believe that this work can be a starting point for using machine learning-based methods as a preliminary review of optimal station sites before optimization is applied.
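
Since the models are ranked by F1-score, a small sketch of how that metric is computed may help. It uses the standard definition (harmonic mean of precision and recall); the labels below are made up for illustration and do not come from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = suitable site, 0 = unsuitable site.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
print(round(score, 4))
```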

A Comparison Study of Ensemble Approach Using WRF/CMAQ Model - The High PM10 Episode in Busan (앙상블 방법에 따른 WRF/CMAQ 수치 모의 결과 비교 연구 - 2013년 부산지역 고농도 PM10 사례)

  • Kim, Taehee;Kim, Yoo-Keun;Shon, Zang-Ho;Jeong, Ju-Hee
    • Journal of Korean Society for Atmospheric Environment / v.32 no.5 / pp.513-525 / 2016
  • To propose an effective ensemble method for predicting $PM_{10}$ concentration, six experiments were designed using different ensemble average methods (e.g., non-weighted, single weighted, and cluster weighted methods). The single weighted method calculated the weights using both multiple regression analysis and singular value decomposition, and the cluster weighted method estimated the weights from temperature, relative humidity, and wind component using multiple regression analysis. Weighted averaging performed significantly better than non-weighted averaging. Among the weighted average experiments, the results differed according to the method used to calculate the weights. The single weighted average method using multiple regression analysis showed the highest accuracy for hourly $PM_{10}$ concentration, and the cluster weighted average method based on relative humidity showed the highest accuracy for daily mean $PM_{10}$ concentration. However, the ensemble spread analysis showed better reliability for the single weighted average method than for the cluster weighted average method based on relative humidity. Thus, the single weighted average method was the most effective method in this case study.
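
To make the difference between non-weighted and regression-weighted averaging concrete, here is a minimal sketch for two ensemble members. It assumes a regression without intercept solved through the 2x2 normal equations; the member forecasts and observations are invented numbers, and the paper's actual regression setup is not given in the abstract.

```python
def regression_weights(m1, m2, obs):
    """Least-squares weights (w1, w2) so that w1*m1 + w2*m2 approximates obs."""
    a = sum(x * x for x in m1)
    b = sum(x * y for x, y in zip(m1, m2))
    d = sum(y * y for y in m2)
    p = sum(x * o for x, o in zip(m1, obs))
    q = sum(y * o for y, o in zip(m2, obs))
    det = a * d - b * b  # zero only if the two members are proportional
    return (d * p - b * q) / det, (a * q - b * p) / det

# Hypothetical member forecasts and observations (same units, e.g. ug/m^3).
member1 = [40.0, 55.0, 70.0, 90.0, 65.0]
member2 = [50.0, 45.0, 80.0, 100.0, 60.0]
observed = [0.7 * x + 0.3 * y for x, y in zip(member1, member2)]

w1, w2 = regression_weights(member1, member2, observed)
weighted = [w1 * x + w2 * y for x, y in zip(member1, member2)]
unweighted = [(x + y) / 2 for x, y in zip(member1, member2)]
print(round(w1, 3), round(w2, 3))
```

Because the synthetic observations were built as 0.7 and 0.3 times the members, the regression recovers those weights, while the plain mean keeps a residual error.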

Ensemble Method for Predicting Particulate Matter and Odor Intensity (미세먼지, 악취 농도 예측을 위한 앙상블 방법)

  • Lee, Jong-Yeong;Choi, Myoung Jin;Joo, Yeongin;Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering / v.42 no.4 / pp.203-210 / 2019
  • Recently, a number of researchers have produced studies and reports aimed at forecasting air quality, such as particulate matter and odor, more accurately. However, such research mainly focuses on the atmospheric diffusion models that have been used for air quality prediction in environmental engineering. Although these models have various merits, they are limited in that they use very few spatial attributes, such as geographical attributes. We therefore propose a new approach to forecasting air quality using a deep learning-based ensemble model that combines a temporal and a spatial predictor. The temporal predictor employs an RNN LSTM, and the spatial predictor is based on the geographically weighted regression model. The ensemble model also uses an RNN LSTM, which combines the two models in a stacking structure. The ensemble model can infer the air quality of areas without an air quality monitoring station and can even forecast future air quality. We installed IoT sensors measuring PM2.5, PM10, H2S, NH3, and VOC at 8 stations in Jeonju to gather air quality data. The numerical results showed that the new model predicts accurately when compared with the measured data. This implies that spatial attributes should be considered for more accurate air quality prediction.

A Study on Injury Severity Prediction for Car-to-Car Traffic Accidents (차대차 교통사고에 대한 상해 심각도 예측 연구)

  • Ko, Changwan;Kim, Hyeonmin;Jeong, Young-Seon;Kim, Jaehee
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.19 no.4 / pp.13-29 / 2020
  • Automobiles have long been an essential part of daily life, but the social costs of car traffic accidents exceed 9% of the national budget of Korea. Hence, it is necessary to establish a prevention and response system for car traffic accidents. To present a model that can classify and predict the degree of injury in car traffic accidents, we used the big data analysis techniques of K-nearest neighbor, logistic regression analysis, naive Bayes classifier, decision tree, and ensemble algorithm. The performance of the models was analyzed using data on nationwide traffic accidents over the past three years. In particular, considering the difference in the number of records among the injury severity levels, we applied down-sampling to the groups with a large number of samples to enhance the classification accuracy of the models, and then verified the statistical significance of the models using ANOVA.
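
Down-sampling can be sketched as randomly discarding records from the larger groups. The example below assumes every severity level is reduced to the size of the smallest one, which is one common choice; the abstract does not state the target size. The severity labels and counts are hypothetical.

```python
import random
from collections import Counter, defaultdict

def downsample(records, label_of, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[label_of(record)].append(record)
    target = min(len(group) for group in groups.values())
    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], target))
    return balanced

# Hypothetical accident records as (severity, record_id) pairs.
records = (
    [("slight", i) for i in range(60)]
    + [("serious", i) for i in range(30)]
    + [("fatal", i) for i in range(10)]
)
balanced = downsample(records, label_of=lambda r: r[0])
counts = Counter(label for label, _ in balanced)
print(dict(counts))
```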

A Molecular Dynamics Simulation Study on Hygroelastic behavior of Thermosetting Epoxy (열경화성 에폭시 기지의 흡습탄성 거동에 관한 분자동역학 전산모사)

  • Kwon, Sunyong;Lee, Man Young;Yang, Seunghwa
    • Composites Research / v.30 no.6 / pp.371-378 / 2017
  • In this study, the hygroelastic behavior of thermosetting epoxy is predicted by molecular dynamics simulations. Since sustained exposure to humid environments leads to macroscopic degradation of polymer composites, computational simulation of the hygroscopically aged epoxy cell is essential for assessing long-term durability. We therefore modeled amorphous epoxy molecular unit cell structures at crosslinking ratios of 30% and 90% and moisture weight fractions of 0 and 4 wt%. Diglycidyl ether of bisphenol F (EPON862) and triethylenetetramine (TETA) are chosen as the resin and curing agent, respectively. By combining equilibrium and non-equilibrium ensemble simulations with a classical interatomic potential, various hygroelastic properties are predicted, including the diffusion coefficient of water, the coefficient of moisture expansion (CME), the stress-strain curve, and the elastic modulus. To establish the structure-property relationship of pure epoxy, the free volume and internal non-bond potential energy of the epoxy are examined.

A Study on Bagging Neural Network for Predicting Defect Size of Steam Generator Tube in Nuclear Power Plant (원전 증기발생기 세관 결함 크기 예측을 위한 Bagging 신경회로망에 관한 연구)

  • Kim, Kyung-Jin;Jo, Nam-Hoon
    • Journal of the Korean Society for Nondestructive Testing / v.30 no.4 / pp.302-310 / 2010
  • In this paper, we studied a Bagging neural network for predicting the defect size of steam generator (SG) tubes in nuclear power plants. Bagging is a method for creating an ensemble of estimators based on bootstrap sampling. To predict the defect size of an SG tube, we first generated eddy current testing signals for 4 defect patterns of SG tubes with various widths and depths. Then, we constructed a single neural network (SNN) and a Bagging neural network (BNN) to estimate the width and depth of each defect. The estimation performance of the SNN and BNN was measured by means of peak error. In our experiments, the average peak errors of the SNN and BNN for estimating defect depth were 0.117 mm and 0.089 mm, respectively. For estimating defect width, the average peak errors of the SNN and BNN were 0.494 mm and 0.306 mm, respectively. This shows that the estimation performance of the BNN is superior to that of the SNN.
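
The bagging idea named above (bootstrap resampling, one estimator per resample, averaged output) can be sketched in a few lines. To stay short, the sketch swaps the paper's neural networks for a closed-form line fit, and it reads "peak error" as the maximum absolute error, which is an assumption. The data are noise-free and hypothetical.

```python
import random

def fit_line(xs, ys):
    """Stand-in base estimator (not a neural network): simple linear fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return lambda x: my  # degenerate resample: fall back to the mean
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def bagging_fit(xs, ys, n_estimators=25, seed=0):
    """Train one estimator per bootstrap resample and average their outputs."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

def peak_error(predict, xs, ys):
    """Assumed definition: the largest absolute estimation error."""
    return max(abs(predict(x) - y) for x, y in zip(xs, ys))

# Hypothetical noise-free data: y = 2x + 1.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
bagged = bagging_fit(xs, ys)
print(bagged(10.0), peak_error(bagged, xs, ys))
```

With noisy data and an unstable base estimator, the averaging step is what reduces variance; the noise-free data here only keep the example easy to check.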

Deep Learning Forecast model for City-Gas Acceptance Using Extranoues variable (외재적 변수를 이용한 딥러닝 예측 기반의 도시가스 인수량 예측)

  • Kim, Ji-Hyun;Kim, Gee-Eun;Park, Sang-Jun;Park, Woon-Hak
    • Journal of the Korean Institute of Gas / v.23 no.5 / pp.52-58 / 2019
  • In this study, we developed a forecasting model for city-gas acceptance. City-gas corporations must report the next year's city-gas sales volume to KOGAS, so the forecast is important to them. The factors that influence city-gas demand differ by usage classification; however, city-gas acceptance is difficult to classify by usage. We therefore considered the outside temperature as a factor that has an influence regardless of usage classification and developed the model accordingly. ARIMA, a traditional time series analysis method, and LSTM, a deep learning technique, were used to construct forecasting models, and various ensemble techniques were used to minimize the disadvantages of these two methods. Experiments and validation were conducted using 11 years of data from JB Corp., from 2008 to 2018. For Ensemble LSTM, the average error rate of the daily forecast was 0.48%, the average error rate of the monthly forecast was 2.46%, and the absolute value of the error rate was 5.24%.

Smarter Classification for Imbalanced Data Set and Its Application to Patent Evaluation (불균형 데이터 집합에 대한 스마트 분류방법과 특허 평가에의 응용)

  • Kwon, Ohbyung;Lee, Jonathan Sangyun
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.15-34 / 2014
  • Overall accuracy as a performance measure does not fully consider modular accuracy: the accuracy of classifying 1 (or true) as 1 is not the same as that of classifying 0 (or false) as 0. A smarter classification algorithm would optimize the classification rules to match the modular accuracy goals according to the nature of the problem. Correspondingly, smarter algorithms must be both more generalized with respect to the nature of problems and free from discretization, which may distort the real performance. Hence, in this paper, we propose a novel vertical boosting algorithm that improves modular accuracies. Rather than discretizing items, we use simple classifiers, such as a regression model, that accept continuous data types. To improve generalization and to select a classification model that is well suited to the nature of the problem domain, we developed a model selection algorithm with smartness. To show the soundness of the proposed method, we performed an experiment with a real-world application: predicting the intellectual properties of e-transaction technology, with a data set of more than 47,000 records.
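
The gap between overall accuracy and modular (per-class) accuracy shows up clearly on imbalanced data. The sketch below uses invented labels and a hypothetical helper name, `modular_accuracy`, to illustrate the distinction; it is not the paper's boosting algorithm.

```python
def modular_accuracy(y_true, y_pred, label):
    """Share of items whose true class is `label` that were predicted as `label`."""
    relevant = [(t, p) for t, p in zip(y_true, y_pred) if t == label]
    return sum(1 for t, p in relevant if p == label) / len(relevant)

def overall_accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Hypothetical imbalanced data: eight 0s and two 1s.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

overall = overall_accuracy(y_true, y_pred)
acc_0 = modular_accuracy(y_true, y_pred, 0)
acc_1 = modular_accuracy(y_true, y_pred, 1)
print(overall, acc_0, acc_1)
```

Here the overall figure looks strong even though half of the minority class is misclassified, which is the situation the abstract describes.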

Molecular Simulation Studies for Penetrable-Sphere Model: II. Collision Properties (침투성 구형 모델에 관한 분자 전산 연구: II. 충돌 특성)

  • Kim, Chun-Ho;Suh, Soong-Hyuck
    • Polymer(Korea) / v.35 no.6 / pp.513-519 / 2011
  • Molecular simulations via the molecular dynamics method have been carried out to investigate the dynamic collision properties of penetrable-sphere model fluids. The collision frequencies, the mean free paths, the angle distributions of the hard-type reflection and the soft-type penetration, and the effective packing fractions are computed over a wide range of the packing fraction ${\phi}$ and the repulsive energy ${\varepsilon}^*$. Soft-type collisions dominate in lower repulsive energy systems, while hard-type collisions dominate in higher repulsive energy systems. Very interestingly, the ratio of the soft-type (or the hard-type) collision frequency to the total collision frequency is directly related to the Boltzmann factor of the acceptance (or rejection) probabilities in canonical ensemble Monte Carlo calculations. Such dynamic collision properties are shown to be restricted for highly repulsive and dense systems of ${\varepsilon}^* \geqq 3.0$ and ${\phi} \geqq 0.7$, indicating cluster-forming structures in the penetrable-sphere model.
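
As a rough illustration of the Boltzmann factor mentioned above, the sketch below evaluates the standard Metropolis acceptance probability for a Monte Carlo move that creates one new overlap costing a reduced energy ${\varepsilon}^*$. Reading the abstract's "acceptance (or rejection) probabilities" as exactly these quantities is an assumption; the paper's precise relation is not stated in the abstract.

```python
import math

def overlap_acceptance(eps_star):
    """Metropolis acceptance probability for a move that raises the energy by eps_star."""
    return min(1.0, math.exp(-eps_star))

def overlap_rejection(eps_star):
    """Complementary rejection probability for the same move."""
    return 1.0 - overlap_acceptance(eps_star)

# Reduced repulsive energies; 3.0 is the threshold quoted in the abstract.
for eps in (0.0, 0.5, 1.0, 3.0):
    print(eps, round(overlap_acceptance(eps), 4), round(overlap_rejection(eps), 4))
```

Under this reading, a low repulsive energy makes penetration likely (acceptance near 1), while at ${\varepsilon}^* = 3.0$ fewer than 5% of such moves are accepted, consistent with hard-type behavior dominating at high repulsive energy.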