• Title/Summary/Keyword: Ensemble technique

Search Result 212

Corporate Bankruptcy Prediction Model using Explainable AI-based Feature Selection (설명가능 AI 기반의 변수선정을 이용한 기업부실예측모형)

  • Gundoo Moon;Kyoung-jae Kim
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.241-265 / 2023
  • A corporate insolvency prediction model is a vital tool for objectively monitoring the financial condition of companies. It enables timely warnings, facilitates responsive actions, and supports effective management strategies that mitigate bankruptcy risk and improve performance. Investors and financial institutions use default prediction models to minimize financial losses. As interest in applying artificial intelligence (AI) to corporate insolvency prediction has grown, extensive research has been conducted in this domain. However, there is increasing demand for explainable AI models in corporate insolvency prediction, with an emphasis on interpretability and reliability. The SHAP (SHapley Additive exPlanations) technique has gained wide popularity and has performed well in various applications. Nonetheless, it has limitations in computational cost, processing time, and scalability as the number of variables grows. This study introduces a variable selection approach that reduces the number of variables by averaging SHAP values computed on bootstrapped data subsets instead of on the entire dataset, aiming to improve computational efficiency while maintaining strong predictive performance. Random forest, XGBoost, and C5.0 models are then trained on the selected, highly interpretable variables to obtain classification results. To design a high-performance model, an ensemble is built by soft voting, and its classification accuracy is compared with that of the individual models. The study uses data from 1,698 Korean light industrial companies and applies bootstrapping to create distinct data groups. Logistic regression is used to calculate SHAP values for each data group, and these are averaged to derive the final SHAP values. The proposed model enhances interpretability and aims to achieve superior predictive performance.
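
A minimal sketch of the bootstrap-averaged SHAP selection step follows, in plain Python. It assumes SHAP values are taken on the log-odds scale of a logistic regression with roughly independent features, in which case the SHAP value of feature j has the closed form w_j (x_j - mean_j); the gradient-descent fit, learning rate, number of bootstrap samples, and all helper names are illustrative choices, not the authors' settings. A soft-voting helper that averages the class probabilities of already-trained models is included; the random forest, XGBoost, and C5.0 models themselves are not reproduced.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Batch gradient-descent logistic regression without regularisation."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def linear_shap(X, w):
    """SHAP values of a linear model (log-odds scale), assuming independent
    features: phi_ij = w_j * (x_ij - mean_j)."""
    d = len(w)
    means = [sum(row[j] for row in X) / len(X) for j in range(d)]
    return [[w[j] * (row[j] - means[j]) for j in range(d)] for row in X]

def bootstrap_shap_importance(X, y, n_boot=20, seed=0):
    """Mean |SHAP| per feature, averaged over bootstrap resamples."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    totals = [0.0] * d
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        w, _ = fit_logistic(Xb, yb)
        phi = linear_shap(Xb, w)
        for j in range(d):
            totals[j] += sum(abs(row[j]) for row in phi) / n
    return [t / n_boot for t in totals]

def select_top_k(importance, k):
    """Indices of the k features with the largest averaged importance."""
    return sorted(range(len(importance)), key=importance.__getitem__, reverse=True)[:k]

def soft_vote(prob_lists, threshold=0.5):
    """Soft voting: average each model's P(insolvent) per company, then threshold."""
    avg = [sum(ps) / len(ps) for ps in zip(*prob_lists)]
    return avg, [int(p >= threshold) for p in avg]
```

Averaging over resamples rather than fitting once on the full data is what makes the importance ranking less sensitive to any single sample; the selected feature indices would then be passed to the downstream classifiers, whose predicted probabilities feed `soft_vote`.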

Characteristics of Aerodynamic Damping on Helical-Shaped Super Tall Building (나선형 형상의 초고층건물의 공력감쇠의 특성)

  • Kim, Wonsul;Yi, Jin-Hak;Tamura, Yukio
    • KSCE Journal of Civil and Environmental Engineering Research / v.37 no.1 / pp.9-17 / 2017
  • The aerodynamic damping ratios of a helical $180^{\circ}$ model of a super tall building, a shape that shows better aerodynamic behavior in both along-wind and across-wind responses, were investigated through aeroelastic model tests. The aerodynamic damping ratio was evaluated from the wind-induced responses of the model using the Random Decrement (RD) technique, and various triggering levels for the RD evaluation were also examined. When at least 2,000 segments were used for ensemble averaging, the aerodynamic damping ratio was obtained more consistently, with smaller irregular fluctuations; this agrees well with previous studies. For both the square and helical $180^{\circ}$ models, the along-wind aerodynamic damping ratios showed similar linear trends with reduced wind speed, regardless of building shape. In the across-wind direction, by contrast, the aerodynamic damping ratio of the helical $180^{\circ}$ model showed trends quite different from those of the square model. In addition, the aerodynamic damping ratios of the helical $180^{\circ}$ model showed very similar trends across wind directions and increased gradually, with small fluctuations, as the reduced wind speed increased. Regarding the choice of triggering level in the RD technique, either the standard deviation or $\sqrt{2}$ times the standard deviation of the response time history may be adopted if the RD functions have a large number of triggering points; these two levels yield similar values and distributions of the damping ratio with reduced wind speed, so either may be acceptable.
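
The RD procedure described above can be sketched as follows. This is a minimal illustration, assuming a zero-mean (mean-removed) response record, an up-crossing triggering condition at a level such as the standard deviation $\sigma$ or $\sqrt{2}\sigma$ of the response, and a damping ratio identified from the RD signature by the logarithmic decrement; the exact triggering condition and damping-identification step used in the study are not described in the abstract, and the function names are hypothetical.

```python
import math
from statistics import fmean

def random_decrement(x, trigger, seg_len):
    """RD signature: ensemble average of the segments that start at each
    up-crossing of the triggering level. Returns (signature, n_segments)."""
    segments = [
        x[i:i + seg_len]
        for i in range(1, len(x) - seg_len + 1)
        if x[i - 1] < trigger <= x[i]  # up-crossing of the trigger level
    ]
    if not segments:
        raise ValueError("no triggering points found")
    signature = [fmean(col) for col in zip(*segments)]
    return signature, len(segments)

def damping_from_log_decrement(signature):
    """Damping ratio from the positive peaks of a decaying RD signature,
    via delta = ln(A_0 / A_n) / n and zeta = delta / sqrt(4 pi^2 + delta^2)."""
    peaks = [
        signature[i]
        for i in range(1, len(signature) - 1)
        if signature[i - 1] < signature[i] >= signature[i + 1] and signature[i] > 0
    ]
    if len(peaks) < 2:
        raise ValueError("need at least two positive peaks")
    n_cycles = len(peaks) - 1
    delta = math.log(peaks[0] / peaks[-1]) / n_cycles
    return delta / math.sqrt(4 * math.pi ** 2 + delta ** 2)
```

A trigger of `statistics.pstdev(x)` or `math.sqrt(2) * statistics.pstdev(x)` corresponds to the two levels compared in the study. In practice one would also check that the returned segment count is large (the abstract reports at least 2,000) and might restrict the decrement to the first few cycles, since noise in the tail of a measured signature can create spurious peaks.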

Measurement of Turbulence Properties at the Time of Flow Reversal Under High Wave Conditions in Hujeong Beach (후정해변 고파랑 조건하에서 파랑유속 방향전환점에서 발생하는 난류성분의 측정)

  • Chang, Yeon S.;Do, Jong Dae;Kim, Sun-Sin;Ahn, Kyungmo;Jin, Jae-Youll
    • Journal of Korean Society of Coastal and Ocean Engineers / v.29 no.4 / pp.206-216 / 2017
  • The temporal distributions of the turbulence kinetic energy (TKE) and the vertical component of the Reynolds stress ($-\overline{u^{\prime}w^{\prime}}$) over one wave period were measured under high wave energy conditions. The wave data were obtained at Hujeong Beach on the east coast of Korea from January 14 to 18, 2017, when an extratropical cyclone developed in the East Sea. Of the thousands of waves measured during this period, hundreds of regular waves with similar patterns were selected and ensemble-averaged to produce three representative mean wave patterns, and the turbulence properties were then estimated from the selected wave data. Interestingly, $-\overline{u^{\prime}w^{\prime}}$ has a single clear peak near the time of flow reversal, whereas TKE has two peaks at the times of maximum cross-shore velocity magnitude. This distinct Reynolds-stress pattern indicates that vertical fluxes of quantities such as suspended sediment may be enhanced when the reversal of the horizontal flow direction disturbs the flow, supporting the turbulence convection process proposed by Nielsen (1992). The characteristic patterns of the turbulence properties were also examined using the CADMAS-SURF Reynolds-Averaged Navier-Stokes (RANS) model. Although the model reasonably simulates the distribution of TKE, it fails to reproduce the $-\overline{u^{\prime}w^{\prime}}$ peak at the time of flow reversal, indicating that the RANS model is limited in predicting some turbulence properties such as Reynolds stresses.
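
The ensemble (phase) averaging and turbulence statistics described above can be sketched as follows. This assumes each selected wave has already been resampled onto the same number of phase bins, that turbulent fluctuations are deviations from the phase-averaged velocity in each bin, and that $k = \tfrac{1}{2}(\overline{u'^2}+\overline{v'^2}+\overline{w'^2})$; in field data this simple decomposition also counts wave-to-wave variability as turbulence, and the study may have applied additional filtering. Function names are illustrative.

```python
from statistics import fmean

def phase_average(waves):
    """Ensemble (phase) average over waves resampled to the same phase bins."""
    return [fmean(vals) for vals in zip(*waves)]

def turbulence_by_phase(u_waves, v_waves, w_waves):
    """Return (-<u'w'>, TKE) in each phase bin, where primes are deviations
    from the phase-averaged velocity across the selected waves."""
    u_bar, v_bar, w_bar = (phase_average(c) for c in (u_waves, v_waves, w_waves))
    stress, tke = [], []
    for k in range(len(u_bar)):
        up = [u[k] - u_bar[k] for u in u_waves]
        vp = [v[k] - v_bar[k] for v in v_waves]
        wp = [w[k] - w_bar[k] for w in w_waves]
        stress.append(-fmean([a * b for a, b in zip(up, wp)]))
        tke.append(0.5 * (fmean([a * a for a in up])
                          + fmean([a * a for a in vp])
                          + fmean([a * a for a in wp])))
    return stress, tke
```

Plotting `stress` and `tke` against phase is what would reveal the single Reynolds-stress peak near flow reversal versus the two TKE peaks at maximum cross-shore velocity.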

Near-wake Measurements of an Oscillating NACA 0012 Airfoil (진동하는 NACA 0012 에어포일의 근접후류 측정)

  • Kim, Dong-Ha;Kim, Hak-Bong;Jang, Jo-Won
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.34 no.12 / pp.1-8 / 2006
  • An experimental study was carried out to investigate the influence of Reynolds number on the near wake of an oscillating airfoil. A NACA 0012 airfoil was pitched sinusoidally about the quarter-chord point over a range of instantaneous angles of attack of $\pm 6^{\circ}$. An X-type hot-wire probe was used to measure the near wake of the oscillating airfoil, and the smoke-wire visualization technique was used to examine the flow properties of the boundary layer. The free-stream velocities were 1.98, 2.83, and 4.03 m/s, corresponding to chord Reynolds numbers of $2.3\times10^4$, $3.3\times10^4$, and $4.8\times10^4$, respectively. The oscillation frequency was adjusted to keep the reduced frequency fixed at K = 0.1. The results show that the boundary-layer and near-wake properties differ dramatically between Re = $2.3\times10^4$ and Re = $3.3\times10^4$, whereas they are similar for Re = $3.3\times10^4$ and Re = $4.8\times10^4$. This is because the unsteady separation point is dramatically delayed at Re = $2.3\times10^4$.
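
Because the reduced frequency was held fixed while the free-stream velocity changed, the pitching frequency had to change with each run. The sketch below shows that arithmetic, assuming the common definition $K = \omega c / (2U_\infty)$, so $f = K U_\infty / (\pi c)$, a mean angle of attack of 0° for the $\pm 6^{\circ}$ schedule, and an air kinematic viscosity of about $1.5\times10^{-5}$ m²/s; the chord length is not given in the abstract and is only inferred here from the stated Reynolds numbers.

```python
import math

NU_AIR = 1.5e-5  # assumed kinematic viscosity of air (m^2/s), roughly room temperature

def chord_from_reynolds(re, u, nu=NU_AIR):
    """Chord length c = Re * nu / U implied by a chord Reynolds number."""
    return re * nu / u

def pitch_frequency(k, u, chord):
    """Oscillation frequency f (Hz) for reduced frequency k = omega * c / (2 U)."""
    return k * u / (math.pi * chord)

def pitch_angle_deg(t, f, amplitude_deg=6.0, mean_deg=0.0):
    """Sinusoidal pitch schedule alpha(t) = alpha_mean + amplitude * sin(2 pi f t)."""
    return mean_deg + amplitude_deg * math.sin(2 * math.pi * f * t)

for u, re in [(1.98, 2.3e4), (2.83, 3.3e4), (4.03, 4.8e4)]:
    c = chord_from_reynolds(re, u)
    print(f"U = {u} m/s: c ~ {c:.3f} m, f ~ {pitch_frequency(0.1, u, c):.3f} Hz")
```

Under these assumptions the three runs imply a chord of roughly 0.17 to 0.18 m, and the pitching frequency rises roughly in proportion to the free-stream velocity.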

Derivation of Flood Frequency Curve with Uncertainty of Rainfall and Rainfall-Runoff Model (강우 및 강우-유출 모형의 불확실성을 고려한 홍수빈도곡선 유도)

  • Kwon, Hyun-Han;Kim, Jang-Gyeong;Park, Sae-Hoon
    • Journal of Korea Water Resources Association / v.46 no.1 / pp.59-71 / 2013
  • Because sufficient flood records are lacking across Korea, reliable design-flood estimates are difficult to obtain, whereas rainfall data are relatively abundant. Several studies have therefore proposed deriving the flood frequency curve from rainfall simulation. The main issue in such a derivation is developing a rainfall simulation model that can effectively reproduce extreme rainfall; a rainfall-runoff model that conveys the uncertainties associated with its parameters is also needed. This study proposes a systematic approach that fully considers rainfall-runoff-related uncertainties by coupling a piecewise Kernel-Pareto-based multisite daily rainfall generation model with a Bayesian HEC-1 model. The proposed model was applied to generate a runoff ensemble for the Daechung Dam watershed, from which the flood frequency curve was successfully derived. A rigorous comparison with existing approaches confirmed that the proposed model is very promising for estimating design floods.
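
The last step of this workflow, turning a simulated runoff ensemble into a flood frequency curve with an uncertainty band, can be sketched as follows. The rainfall generator and Bayesian HEC-1 model are not reproduced: each ensemble member is assumed to be one long series of simulated annual maximum discharges (for example, one per posterior parameter draw), and T-year quantiles are read off with Weibull plotting positions. That plotting-position choice is illustrative; the paper may fit a parametric distribution instead.

```python
from statistics import median

def t_year_flood(annual_maxima, return_period):
    """T-year flood from Weibull plotting positions p_i = i / (n + 1),
    interpolating linearly between ranked annual maxima."""
    xs = sorted(annual_maxima)
    n = len(xs)
    p = 1.0 - 1.0 / return_period  # non-exceedance probability
    pos = p * (n + 1) - 1           # 0-based fractional rank
    if pos < 0 or pos > n - 1:
        raise ValueError("return period outside the range supported by the sample")
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def flood_frequency_band(ensemble, return_periods):
    """For each return period, (min, median, max) of the T-year flood across
    ensemble members; each member is one simulated annual-maximum series."""
    band = {}
    for T in return_periods:
        q = [t_year_flood(member, T) for member in ensemble]
        band[T] = (min(q), median(q), max(q))
    return band
```

Evaluating `flood_frequency_band` over a range of return periods traces a median flood frequency curve together with its ensemble envelope.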

A $2{\times}2$ Microstrip Patch Antenna Array for Moisture Content Measurement of Paddy Rice (산물벼 함수율 측정을 위한 $2{\times}2$ 마이크로스트립 패치 안테나 개발)

  • 김기복;김종헌;노상하
    • Journal of Biosystems Engineering / v.25 no.2 / pp.97-106 / 2000
  • To develop a grain moisture meter based on the microwave free-space transmission technique, a 10.5 GHz microwave signal with a power of 11 mW, generated by an oscillator with a dielectric resonator, is passed through an isolator and radiated from a transmitting $2{\times}2$ microstrip patch array antenna into a sample holder filled with Korean Hwawung paddy rice at 12 to 26% w.b. The microwave signal, attenuated by the moist grain, is collected by a receiving $2{\times}2$ microstrip patch array antenna and detected with a Schottky diode that has excellent high-frequency characteristics. A pair of light, simple microstrip patch array antennas for grain moisture measurement was designed with Ensemble ver. 4.02 software and implemented on a Teflon substrate with a relative dielectric constant of 2.6 and a thickness of 0.54. The aperture of each microstrip patch array is 41 mm wide and 24 mm high. The gain, return loss, and bandwidth of the microstrip patch antenna are 11.35 dBi, -38 dB, and 0.35 GHz, respectively ($50^{\circ}$ in the far-field patterns of the E and H planes). The sample holder is wide enough to cover the signal between the antennas. A calibration model is proposed to reduce the effects of fluctuations in bulk density and temperature, which cause serious measurement errors. From regression analysis, the moisture content of the grain samples, MC (%), is expressed in terms of the output voltage $v$, the temperature $t$, and the bulk density of the samples $\rho_b$ as follows: $$MC(\%) = \left(-3.9838\times10^{-8}\,v^{3} + 8.023\times10^{-6}\,v^{2} - 0.0011\,v - 0.0004\,t + 0.1706\right)\frac{1}{\rho_b}\times 100$$
The coefficient of determination, standard error of prediction (SEP), and bias between the measured and predicted moisture contents of the grain samples were 0.9855, 0.479% w.b., and -0.0.369% w.b., respectively.
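
The calibration equation can be evaluated directly, as in the sketch below. The abstract does not give units for the output voltage $v$ or the bulk density $\rho_b$, nor does it state whether $t$ is in °C, so inputs must be in whatever units the original calibration used; the function name is illustrative, and the model is only meaningful within the calibrated range of roughly 12 to 26% w.b.

```python
def moisture_content(v, t, rho_b):
    """Grain moisture content MC (% w.b.) from the reported regression model.

    v: detector output voltage, t: grain temperature, rho_b: bulk density
    (units as in the original calibration, which the abstract does not state).
    """
    poly = (-3.9838e-8 * v ** 3 + 8.023e-6 * v ** 2
            - 0.0011 * v - 0.0004 * t + 0.1706)
    return poly / rho_b * 100.0
```

Dividing by $\rho_b$ is how the model compensates for packing density: the same attenuation signal maps to a lower moisture content in a denser sample.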


Flood Inundation Analysis Using OpenMP Technique (OpenMP를 이용한 제내지 침수 병렬해석)

  • PARK, Jae Hong
    • Proceedings of the Korea Water Resources Association Conference / 2016.05a / pp.74-74 / 2016
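  • Physically based numerical simulation over complex terrain usually requires large computing resources to finish within a reasonable time. Moreover, when the simulated phenomenon is unsteady, governed by dynamics updated at every time step, computational performance can become the most important consideration. One of the most widely used strategies for reducing computation time is parallelization with an appropriate number of processors. Recently, OpenMP and MPI, which use multiple cores, have emerged as parallel techniques for accelerating computation, and parallel processing on graphics processing units has also been introduced. In this study, we examined the applicability of CPU-based parallel techniques to inundation analysis of the protected lowland (the area landward of levees) and compared the results. The source code of a diffusion-wave inundation analysis program was rewritten with OpenMP and applied to a virtual and a real watershed, and the results were compared with those of a model using MPI, a distributed-memory parallel technique. Discharge and water depth from the OpenMP and MPI models agreed within the error tolerance, but their computation speeds differed because the two techniques store data differently. For the virtual watershed, the maximum speedup with 4 cores was about 2.62 for MPI and about 2.87 for OpenMP, so OpenMP gave a slightly better speedup. This is attributed to the difference in memory models: the OpenMP model involves less data transfer and shorter transfer times than the MPI model. For application to a real watershed, the OpenMP model, which had shown the better speedup, was applied to the Malpasset dam-break area. The numbers of elements were 45,254 and 11,352, respectively; the model was applied to the downstream region, which has relatively many elements, to maximize the parallel effect. In both cases, the velocities and inundation depths from the parallel model were identical to those of the sequential model, while the speedup relative to the sequential model was 8.57 times, confirming the relatively fast computation of the parallel model. These results suggest that for two-dimensional analyses with many computational elements, conventional sequential analysis on a single core can lower work efficiency because of long computation times, whereas parallel analysis uses the available computer resources efficiently and yields results in a reasonable time, making comprehensive analyses based on repeated statistical techniques or ensemble analysis more practical.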
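
The reported speedups can be put on a common footing with a little arithmetic. The sketch below computes parallel efficiency (speedup divided by core count) and, as an added idealization not used in the study, the parallel fraction implied by Amdahl's law $S = 1/((1-f) + f/p)$. Because Amdahl's law ignores communication and memory overheads, the inferred $f$ is only an effective value that lumps those costs together; the function names are illustrative.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_1 / T_p from measured wall-clock times."""
    return t_serial / t_parallel

def efficiency(s, cores):
    """Parallel efficiency E = S / p (1.0 would be ideal linear scaling)."""
    return s / cores

def amdahl_parallel_fraction(s, cores):
    """Invert Amdahl's law S = 1 / ((1 - f) + f / p) for the parallel fraction f."""
    return (1 - 1 / s) / (1 - 1 / cores)

def amdahl_speedup(f, cores):
    """Speedup predicted by Amdahl's law for parallel fraction f on p cores."""
    return 1 / ((1 - f) + f / cores)

for name, s in [("MPI", 2.62), ("OpenMP", 2.87)]:
    print(f"{name}: E = {efficiency(s, 4):.3f}, f = {amdahl_parallel_fraction(s, 4):.3f}")
```

With p = 4, the MPI figure corresponds to an efficiency of about 0.66 and an effective parallel fraction of about 0.82, and the OpenMP figure to about 0.72 and 0.87, a compact way of expressing the gap the authors attribute to data-transfer overhead.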


Use of the Quantitatively Transformed Field Soil Structure Description of the US National Pedon Characterization Database to Improve Soil Pedotransfer Function

  • Yoon, Sung-Won;Gimenez, Daniel;Nemes, Attila;Chun, Hyen-Chung;Zhang, Yong-Seon;Sonn, Yeon-Kyu;Kang, Seong-Soo;Kim, Myung-Sook;Kim, Yoo-Hak;Ha, Sang-Keun
    • Korean Journal of Soil Science and Fertilizer / v.44 no.5 / pp.944-958 / 2011
  • Soil hydraulic properties such as hydraulic conductivity and water retention, which are costly to measure, can be estimated indirectly by soil pedotransfer functions (PTFs) from easily obtainable soil data. Field soil structure descriptions, which are routinely recorded, could also be used as PTF inputs to reduce uncertainty. The purposes of this study were to incorporate qualitative morphological soil structure descriptions and a soil structural index into PTFs and to evaluate their contribution to the prediction of soil hydraulic properties. We transformed categorical morphological descriptions of soil structure into quantitative values using categorical principal component analysis (CATPCA). This approach was tested on a large dataset from the US National Pedon Characterization database with the aid of categorical regression tree analysis. Six different PTFs were used to predict the saturated hydraulic conductivity, and their results were averaged to quantify the uncertainty. The quantified morphological descriptions were successfully used in a multiple linear regression approach to predict the ensemble-averaged saturated conductivity. The selected stepwise regression model, with only the transformed morphological variables and the structural index as predictors, predicted $K_{sat}$ with $r^2$ = 0.48 (p = 0.018), indicating the feasibility of the CATPCA approach. In the regression tree analysis, the soil structure index and soil texture were important factors in predicting the hydraulic properties, and among the structural descriptions, size class was an important grouping parameter. Bulk density, clay content, W33, and the structural index explained the clusters identified by a two-step clustering technique, implying that morphologically described soil structural features are closely related to soil physical as well as hydraulic properties.
Although this study provided a relatively new method relating soil structure descriptions to a soil structure index, the same approach should be tested on datasets containing actual measurements of hydraulic properties. Considering measured saturated hydraulic conductivity and soil water retention would give more insight into the predictive power of the soil structure index for estimating hydraulic properties.
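
Averaging the six PTF predictions into an ensemble value, with the spread across PTFs as a simple uncertainty measure, might look like the sketch below. Because $K_{sat}$ typically spans orders of magnitude, it averages in log10 space by default (a geometric mean); whether the study averaged arithmetically or logarithmically is not stated in the abstract, so the `log_space` option is an assumption. The PTF predictions are taken as given, since the six PTFs are not named in the abstract.

```python
import math
from statistics import fmean, pstdev

def ensemble_ksat(predictions, log_space=True):
    """Combine several PTF estimates of Ksat per sample.

    predictions: one list per soil sample, holding each PTF's Ksat estimate.
    Returns (ensemble Ksat, spread) per sample; with log_space=True the ensemble
    value is the geometric mean and the spread is the std. dev. in log10 units.
    """
    results = []
    for preds in predictions:
        if log_space:
            logs = [math.log10(p) for p in preds]
            results.append((10 ** fmean(logs), pstdev(logs)))
        else:
            results.append((fmean(preds), pstdev(preds)))
    return results
```

The ensemble value per sample would then serve as the response variable for the stepwise regression on the CATPCA-transformed structure variables.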

A Study on Prediction of EPB shield TBM Advance Rate using Machine Learning Technique and TBM Construction Information (머신러닝 기법과 TBM 시공정보를 활용한 토압식 쉴드TBM 굴진율 예측 연구)

  • Kang, Tae-Ho;Choi, Soon-Wook;Lee, Chulho;Chang, Soo-Ho
    • Tunnel and Underground Space / v.30 no.6 / pp.540-550 / 2020
  • Machine learning has been actively used for automation owing to the development and establishment of AI technology. An important point in applying machine learning is that the appropriate algorithm depends on the characteristics of the data, so the datasets must be analyzed before machine learning techniques are applied. In this study, the advance rate was predicted using geotechnical and machine data from a TBM tunnel section passing through soil ground beneath a stream. Although applying the linear regression model raised no statistical problems, its coefficient of determination was 0.76, whereas the ensemble model and the support vector machine achieved 0.88 or higher. For the analyzed dataset, the support vector machine was the most suitable model for predicting the advance rate of the EPB shield TBM. These results suggest that prediction models using data that include both machine data and ground information are highly suitable. Further research is needed to increase the diversity of ground conditions and the amount of data.
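
The model comparison above rests on the coefficient of determination. A minimal sketch of computing $R^2 = 1 - SS_{res}/SS_{tot}$ on held-out data and picking the best-scoring model follows; the model names and prediction lists are hypothetical placeholders, and fitting the actual linear regression, ensemble, and SVM models is outside the sketch.

```python
from statistics import fmean

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def best_model(y_true, predictions_by_model):
    """Score every model's predictions and return (best name, all scores)."""
    scores = {name: r_squared(y_true, preds) for name, preds in predictions_by_model.items()}
    return max(scores, key=scores.get), scores
```

Computing $R^2$ on data not used for fitting matters here: on training data, flexible models such as ensembles can reach high values simply by overfitting.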

A Study on Customer Review Rating Recommendation and Prediction through Online Promotional Activity Analysis - Focusing on "S" Company Wearable Products - (온라인 판매촉진활동 분석을 통한 고객 리뷰평점 추천 및 예측에 관한 연구 : S사 Wearable 상품중심으로)

  • Shin, Ho-cheol
    • The Journal of the Korea Contents Association / v.22 no.4 / pp.118-129 / 2022
  • The purpose of this study is to develop a strategic model of promotional activities through various analyses and sales forecasting, using sales data collected for wearable products sold by domestic online companies. Several algorithms were applied, and the one with the best results was selected as the optimal model. The selected gradient boosting model takes nine independent variables (promotion type, price, amount, gender, model, company, grade, sales date, and region) as inputs to predict the dependent variable through supervised learning. The review values, set as the dependent variable for each type of sales promotion, were examined in more detail using ensemble analysis, with the main aim of analyzing and predicting review grades. The model achieved an AUC of 95% and an F1 score of about 93%. Among the types of sales promotion activities, value-added benefits were found to affect the number of reviews and the review grades, and the major variables also affected reviews and review grades.
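
The reported AUC and F1 scores can be computed as below, a sketch assuming a binary target (for example, a high versus low review grade) and a decision threshold of 0.5, neither of which is specified in the abstract. AUC is computed in its Mann-Whitney form, as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(labels, scores, threshold=0.5):
    """Harmonic mean of precision and recall at a fixed decision threshold."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

AUC summarizes ranking quality across all thresholds, while F1 depends on the chosen threshold, which is why the two can differ for the same model.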