• Title/Summary/Keyword: ensemble 평균 (ensemble average)


A Correction of East Asian Summer Precipitation Simulated by PNU/CME CGCM Using Multiple Linear Regression (다중 선형 회귀를 이용한 PNU/CME CGCM의 동아시아 여름철 강수예측 보정 연구)

  • Hwang, Yoon-Jeong;Ahn, Joong-Bae
    • Journal of the Korean earth science society / v.28 no.2 / pp.214-226 / 2007
  • Because precipitation is influenced by various atmospheric variables, it is highly nonlinear. Although precipitation predicted by a dynamic model can be corrected by using a nonlinear Artificial Neural Network, this approach has limitations such as the choice of initial weights, local minima, and the number of neurons. In the present paper, we correct simulated precipitation by using a multiple linear regression (MLR) method, which is simple and widely used. First, an ensemble hindcast is conducted with the PNU/CME Coupled General Circulation Model (CGCM) (Park and Ahn, 2004) for the period from April to August in 1979-2005. MLR is applied to the precipitation simulated by the PNU/CME CGCM for the months of June (lead 2), July (lead 3), August (lead 4) and the seasonal mean JJA (from June to August) over the Northeast Asian region including the Korean Peninsula $(110^{\circ}-145^{\circ}E,\;25-55^{\circ}N)$. We build the MLR model using a linear relationship between observed precipitation and the hindcast results from the PNU/CME CGCM. The predictor variables selected from the CGCM are precipitation, 500 hPa vertical velocity, 200 hPa divergence, surface air temperature and others. After performing a leave-one-out cross validation, the results are compared with those of the PNU/CME CGCM. The results, including Heidke skill scores, demonstrate that the MLR-corrected results provide better rainfall forecasts than the direct CGCM output.
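    The leave-one-out MLR correction described in this abstract can be illustrated with a minimal sketch. It assumes NumPy is available; the function name `loo_mlr_predictions`, the synthetic predictors, and the coefficients are hypothetical and are not taken from the paper.

    ```python
    import numpy as np

    def loo_mlr_predictions(X, y):
        """Leave-one-out cross-validated predictions from a multiple linear regression.

        X: (n_years, n_predictors) model-derived predictors (hypothetical data).
        y: (n_years,) observed precipitation (hypothetical data).
        """
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n = len(y)
        A = np.column_stack([np.ones(n), X])  # prepend an intercept column
        preds = np.empty(n)
        for i in range(n):
            train = np.arange(n) != i  # hold out year i, fit on the rest
            coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
            preds[i] = A[i] @ coef     # predict the held-out year
        return preds

    # Synthetic demo: 27 "years" (as in 1979-2005) and 3 made-up predictors.
    rng = np.random.default_rng(0)
    X_demo = rng.normal(size=(27, 3))
    y_demo = 2.0 + X_demo @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.1, size=27)
    pred_demo = loo_mlr_predictions(X_demo, y_demo)
    rmse_demo = float(np.sqrt(np.mean((pred_demo - y_demo) ** 2)))
    ```

    Each held-out year is predicted by a model that never saw it, so the resulting error is a fairer estimate of forecast skill than an in-sample fit.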

Development of Stochastic Downscaling Method for Rainfall Data Using GCM (GCM Ensemble을 활용한 추계학적 강우자료 상세화 기법 개발)

  • Kim, Tae-Jeong;Kwon, Hyun-Han;Lee, Dong-Ryul;Yoon, Sun-Kwon
    • Journal of Korea Water Resources Association / v.47 no.9 / pp.825-838 / 2014
  • The stationary Markov chain model has been widely used as a daily rainfall simulation model. A main assumption of the stationary Markov model is that statistical characteristics do not change over time and do not have any trends. In other words, the stationary Markov chain model for daily rainfall simulation essentially cannot incorporate any changes in mean or variance. Here we develop a stochastic downscaling scheme based on a non-stationary hidden Markov chain model (NHMM) for simulating daily rainfall sequences, using general circulation models (GCMs) as inputs. It has been acknowledged that GCMs perform well with respect to annual and seasonal variation at large spatial scales, and they stand as one of the primary sources for obtaining forecasts. The proposed model is applied to daily rainfall series at three stations in the Nakdong watershed. The model showed better performance in reproducing most of the statistics associated with daily and seasonal rainfall. In particular, the proposed model provided a significant improvement in reproducing the extremes. It was confirmed that the proposed model could be used as a downscaling model for generating plausible daily rainfall scenarios if elaborate GCM forecasts can be used as a predictor. The proposed NHMM can also be applied to climate change studies if GCM-based climate change scenarios are used as inputs.
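    For orientation, the stationary baseline that this abstract contrasts with the NHMM can be sketched as a two-state (dry/wet) first-order Markov chain with fixed transition probabilities. This is not the paper's NHMM; the function name and the probability values are hypothetical.

    ```python
    import random

    def simulate_wet_dry(n_days, p_wet_given_dry, p_wet_given_wet, seed=0):
        """Simulate a daily wet (1) / dry (0) sequence with fixed transition probabilities."""
        rng = random.Random(seed)
        state = 0  # assume the sequence starts from a dry day
        seq = []
        for _ in range(n_days):
            # Probability of rain today depends only on yesterday's state.
            p_wet = p_wet_given_wet if state == 1 else p_wet_given_dry
            state = 1 if rng.random() < p_wet else 0
            seq.append(state)
        return seq

    # Made-up parameters: the long-run wet fraction is p01 / (1 - p11 + p01) = 1/3.
    seq_demo = simulate_wet_dry(20000, p_wet_given_dry=0.2, p_wet_given_wet=0.6)
    wet_fraction = sum(seq_demo) / len(seq_demo)
    ```

    Because the two probabilities never change, the simulated wet-day frequency settles to a constant, which is the limitation a non-stationary model driven by GCM predictors is meant to relax.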

Bayesian networks-based probabilistic forecasting of hydrological drought considering drought propagation (가뭄의 전이 현상을 고려한 수문학적 가뭄에 대한 베이지안 네트워크 기반 확률 예측)

  • Shin, Ji Yae;Kwon, Hyun-Han;Lee, Joo-Heon;Kim, Tae-Woong
    • Journal of Korea Water Resources Association / v.50 no.11 / pp.769-779 / 2017
  • As the occurrence of drought has recently been on the rise, reliable drought forecasting is required for drought mitigation and proactive management of water resources. This study developed a probabilistic hydrological drought forecasting method, named the Propagated Bayesian Networks Drought Forecasting (PBNDF) model, which uses Bayesian networks and the drought propagation relationship to estimate future drought together with the forecast uncertainty. The proposed PBNDF model consists of four nodes: past information, current information, multi-model ensemble (MME) forecast information, and the drought propagation relationship. Using the Palmer Hydrological Drought Index (PHDI), the PBNDF model was applied to forecast the hydrological drought condition at 10 gauging stations in the Nakdong River basin. Receiver operating characteristic (ROC) curve analysis was applied to measure the forecast skill of the forecast mean values. The root mean squared error (RMSE) and skill score (SS) were employed to compare the forecast performance with previously developed forecast models (persistence forecast, Bayesian network drought forecast). We found that the PBNDF model showed better performance, with low RMSE and a high SS of 0.1~0.15. Overall, the results indicate that the PBNDF model has good potential for probabilistic drought forecasting.
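    The two comparison metrics named in this abstract can be sketched as follows. The abstract does not give its skill score formula, so the common MSE-based form, SS = 1 - MSE_forecast / MSE_reference, is assumed here; the function names and sample numbers are hypothetical.

    ```python
    import math

    def rmse(forecast, observed):
        """Root mean squared error between two equal-length sequences."""
        return math.sqrt(
            sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)
        )

    def mse_skill_score(forecast, reference, observed):
        """Assumed MSE-based skill score: positive when the forecast beats the reference."""
        mse_forecast = rmse(forecast, observed) ** 2
        mse_reference = rmse(reference, observed) ** 2
        return 1.0 - mse_forecast / mse_reference

    # Made-up index values: a new forecast versus a reference (e.g. persistence) forecast.
    observed_demo = [1.0, 2.0, 3.0, 4.0]
    forecast_demo = [1.0, 2.0, 3.0, 5.0]
    reference_demo = [2.0, 3.0, 4.0, 5.0]
    rmse_forecast_demo = rmse(forecast_demo, observed_demo)
    ss_demo = mse_skill_score(forecast_demo, reference_demo, observed_demo)
    ```

    Under this definition a score of 0 means no improvement over the reference and 1 means a perfect forecast.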

A Study on the Prediction of Disc Cutter Wear Using TBM Data and Machine Learning Algorithm (TBM 데이터와 머신러닝 기법을 이용한 디스크 커터마모 예측에 관한 연구)

  • Tae-Ho, Kang;Soon-Wook, Choi;Chulho, Lee;Soo-Ho, Chang
    • Tunnel and Underground Space / v.32 no.6 / pp.502-517 / 2022
  • As the use of TBMs increases, research has recently grown on analyzing TBM data with machine learning techniques to predict the exchange cycle of disc cutters and the advance rate of the TBM. In this study, a regression prediction of disc cutter wear at a slurry shield TBM site was made by applying machine learning to the machine data and the geotechnical data obtained during excavation. The data were divided 7:3 for training and testing the prediction of disc cutter wear, and the hyper-parameters were optimized by cross-validated grid search over a parameter grid. As a result, gradient boosting, an ensemble-based model, showed good performance with a coefficient of determination of 0.852 and a root-mean-square error of 3.111, and gave especially good results in fit time as well as learning performance. Based on the results, the prediction model using both mechanical data and geotechnical information is judged to be highly suitable. In addition, further research is needed to increase the diversity of ground conditions and the amount of disc cutter data.

Corporate Bankruptcy Prediction Model using Explainable AI-based Feature Selection (설명가능 AI 기반의 변수선정을 이용한 기업부실예측모형)

  • Gundoo Moon;Kyoung-jae Kim
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.241-265 / 2023
  • A corporate insolvency prediction model serves as a vital tool for objectively monitoring the financial condition of companies. It enables timely warnings, facilitates responsive actions, and supports the formulation of effective management strategies to mitigate bankruptcy risks and enhance performance. Investors and financial institutions utilize default prediction models to minimize financial losses. As the interest in utilizing artificial intelligence (AI) technology for corporate insolvency prediction grows, extensive research has been conducted in this domain. However, there is an increasing demand for explainable AI models in corporate insolvency prediction, emphasizing interpretability and reliability. The SHAP (SHapley Additive exPlanations) technique has gained significant popularity and has demonstrated strong performance in various applications. Nonetheless, it has limitations such as computational cost, processing time, and scalability concerns based on the number of variables. This study introduces a novel approach to variable selection that reduces the number of variables by averaging SHAP values from bootstrapped data subsets instead of using the entire dataset. This technique aims to improve computational efficiency while maintaining excellent predictive performance. To obtain classification results, we aim to train random forest, XGBoost, and C5.0 models using carefully selected variables with high interpretability. The classification accuracy of the ensemble model, generated through soft voting as the goal of high-performance model design, is compared with the individual models. The study leverages data from 1,698 Korean light industrial companies and employs bootstrapping to create distinct data groups. Logistic Regression is employed to calculate SHAP values for each data group, and their averages are computed to derive the final SHAP values. The proposed model enhances interpretability and aims to achieve superior predictive performance.
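    The soft-voting step mentioned in this abstract can be sketched in a few lines: each model outputs a probability of insolvency, the probabilities are averaged, and the average is thresholded. The function name, the 0.5 threshold, and the probability values are hypothetical; the paper's actual models and settings are not reproduced here.

    ```python
    def soft_vote(prob_lists, threshold=0.5):
        """Average per-model probabilities of class 1 and threshold the result.

        prob_lists: one list of P(class = 1) per model, all of equal length.
        """
        n_models = len(prob_lists)
        averaged = [sum(ps) / n_models for ps in zip(*prob_lists)]
        labels = [1 if p >= threshold else 0 for p in averaged]
        return averaged, labels

    # Made-up outputs of three classifiers (e.g. random forest, XGBoost, C5.0)
    # for three companies.
    probs_demo = [
        [0.9, 0.2, 0.6],
        [0.8, 0.4, 0.3],
        [0.7, 0.3, 0.5],
    ]
    avg_demo, labels_demo = soft_vote(probs_demo)
    ```

    Unlike hard (majority) voting, soft voting lets a confident model outweigh two lukewarm ones, as the third company in the demo shows.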

A Study on the Application of the Price Prediction of Construction Materials through the Improvement of Data Refactor Techniques (Data Refactor 기법의 개선을 통한 건설원자재 가격 예측 적용성 연구)

  • Lee, Woo-Yang;Lee, Dong-Eun;Kim, Byung-Soo
    • Korean Journal of Construction Engineering and Management / v.24 no.6 / pp.66-73 / 2023
  • The construction industry suffers losses from demand forecasting failures caused by price fluctuations in construction raw materials, from increased user costs caused by project cost changes, and from the lack of a forecasting system. Accordingly, it is necessary to improve the accuracy of construction raw material price forecasting. This study aims to predict the price of construction raw materials and verify applicability through the improvement of the Data Refactor technique. To improve the accuracy of price prediction, the existing Data Refactor approach, which classifies data into low and high frequency and uses ARIMAX, was changed to a frequency-oriented approach using ARIMA, and short-term (3 months in the future), mid-term (6 months in the future), and long-term (12 months in the future) price forecasts were made for six construction raw material items, such as lumber and cement. As a result of the analysis, the predicted value based on the improved Data Refactor technique reduced the error and expanded the variability. Therefore, it is expected that budgets can be managed effectively by predicting the price of construction raw materials more accurately through the Data Refactor technique proposed in this study.

A Machine Learning-Based Encryption Behavior Cognitive Technique for Ransomware Detection (랜섬웨어 탐지를 위한 머신러닝 기반 암호화 행위 감지 기법)

  • Yoon-Cheol Hwang
    • Journal of Industrial Convergence / v.21 no.12 / pp.55-62 / 2023
  • Recent ransomware attacks employ various techniques and pathways, posing significant challenges for early detection and defense. Consequently, the scale of damage is continually growing. This paper introduces a machine learning-based approach for effective ransomware detection that focuses on file encryption and encryption patterns, which are pivotal functionalities used by ransomware. Ransomware is identified by analyzing encryption behavior and encryption patterns, making it possible to detect specific ransomware variants and new types of ransomware, thereby mitigating ransomware attacks effectively. The proposed machine learning-based encryption behavior detection technique extracts encryption and encryption pattern characteristics and uses them to train a machine learning classifier. The final outcome is an ensemble of the results from two classifiers. The classifier plays a key role in determining the presence or absence of ransomware, leading to enhanced accuracy. The proposed technique is implemented using Python's NumPy, pandas, and scikit-learn libraries. Evaluation indicators reveal an average accuracy of 94%, precision of 95%, recall of 93%, and an F1 score of 95%. These performance results validate the feasibility of ransomware detection through encryption behavior analysis, and further research is encouraged to enhance the technique for proactive ransomware detection.
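    The evaluation indicators listed in this abstract follow standard definitions, sketched below from confusion-matrix counts. The function name and the counts are hypothetical and do not correspond to the paper's experiments.

    ```python
    def classification_metrics(tp, fp, fn, tn):
        """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
        precision = tp / (tp + fp)          # detected ransomware that is truly ransomware
        recall = tp / (tp + fn)             # true ransomware that was detected
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
        accuracy = (tp + tn) / (tp + fp + fn + tn)
        return {
            "accuracy": accuracy,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }

    # Made-up counts: 8 true positives, 2 false positives,
    # 2 false negatives, 8 true negatives.
    metrics_demo = classification_metrics(tp=8, fp=2, fn=2, tn=8)
    ```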

Changes in Mean Temperature and Warmth Index on the Korean Peninsula under SSP-RCP Climate Change Scenarios (SSP-RCP 기후변화 시나리오 기반 한반도의 평균 기온 및 온량지수 변화)

  • Jina Hur;Yongseok Kim;Sera Jo;Eung-Sup Kim;Mingu Kang;Kyo-Moon Shim;Seung-Gil Hong
    • Atmosphere / v.34 no.2 / pp.123-138 / 2024
  • Using 18 multi-model-based Shared Socioeconomic Pathway (SSP) and Representative Concentration Pathway (RCP) climate change scenarios, future changes in temperature and warmth index on the Korean Peninsula in the 21st century (2011~2100) were analyzed. In the analysis of the current climate (1981~2010), the ensemble-averaged model results were found to reproduce the observed average values and spatial patterns of temperature and warmth index well. In the future climate projections, temperature and warmth index are expected to rise in the 21st century compared to the current climate. The further into the future and the higher the carbon scenario (SSP5-8.5), the larger the increase. In the 21st century, in the low-carbon scenario (SSP1-2.6), temperature and warmth index are expected to rise by about 2.5℃ and 24.6%, respectively, compared to the present, while in the high-carbon scenario, they are expected to rise by about 6.2℃ and 63.9%, respectively. The analysis indicates that reducing carbon emissions could help limit the increase in temperature and warmth index. The increase in the warmth index due to climate change can be interpreted positively, as indicating that the effective heat required for plant growth on the Korean Peninsula will be stably secured. However, it is necessary to comprehensively consider negative aspects such as changes in growth conditions during the plant growth period, an increase in extreme weather such as abnormally high temperatures, and a decrease in plant diversity. This study can be used as basic scientific information for adapting to climate change and preparing response measures.
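    As a rough illustration of the two quantities in this abstract, the sketch below averages monthly temperatures across models and then computes a warmth index. It assumes the commonly used Kira definition (the sum of monthly mean temperature minus 5 ℃ over months warmer than 5 ℃), which the abstract itself does not state; the function names and temperatures are hypothetical.

    ```python
    def ensemble_mean(model_series):
        """Average several models' monthly temperature series month by month."""
        n_models = len(model_series)
        return [sum(values) / n_models for values in zip(*model_series)]

    def warmth_index(monthly_temps_c, base_c=5.0):
        """Assumed Kira-style warmth index: sum of (T - base) over months with T > base."""
        return sum(t - base_c for t in monthly_temps_c if t > base_c)

    # Made-up monthly mean temperatures (degrees Celsius) for one location.
    temps_demo = [-2, 0, 5, 11, 17, 21, 25, 26, 20, 14, 7, 1]
    wi_demo = warmth_index(temps_demo)

    # Two made-up models, two months each: average first, then compute the index.
    wi_ensemble_demo = warmth_index(ensemble_mean([[0, 10], [4, 12]]))
    ```

    Averaging the models before computing the index is one possible ordering; computing the index per model and averaging afterwards can give a different value because of the 5 ℃ cutoff.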

Characteristics of Aerodynamic Damping on Helical-Shaped Super Tall Building (나선형 형상의 초고층건물의 공력감쇠의 특성)

  • Kim, Wonsul;Yi, Jin-Hak;Tamura, Yukio
    • KSCE Journal of Civil and Environmental Engineering Research / v.37 no.1 / pp.9-17 / 2017
  • The aerodynamic damping ratios of a helical $180^{\circ}$ model, which shows better aerodynamic behavior in both the along-wind and across-wind responses of a super tall building, were investigated by an aeroelastic model test. The aerodynamic damping ratio was evaluated from the wind-induced responses of the model by using the Random Decrement (RD) technique. Further, various triggering levels for evaluating aerodynamic damping ratios with the RD technique were also examined. As a result, it was found that when at least 2000 segments were used for ensemble averaging, the aerodynamic damping ratio could be obtained more consistently, with fewer irregular fluctuations. This is in good agreement with previous studies. Another notable observation was that for the square and helical $180^{\circ}$ models, the aerodynamic damping ratios in the along-wind direction showed similar linear trends with reduced wind speed regardless of building shape. On the other hand, for the helical $180^{\circ}$ model, the aerodynamic damping ratio in the across-wind direction showed quite different trends from those of the square model. In addition, the aerodynamic damping ratios of the helical $180^{\circ}$ model showed very similar trends with respect to changes in wind direction, and increased gradually with reduced wind speed, with small fluctuations. Regarding the definition of triggering levels in the RD technique, it may be possible to adopt a triggering level of "the standard deviation" or "$\sqrt{2}$ times the standard deviation" of the response time history if the RD functions have a large number of triggering points. Further, these triggering levels may result in similar values and distributions with reduced wind speed, and either may be acceptable.
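    The ensemble-averaging step of the RD technique described in this abstract can be sketched as follows, assuming NumPy. Using up-crossings of the triggering level as the triggering condition is an assumption made for this illustration, as are the function name and the test signal; the paper's exact triggering condition is not given in the abstract.

    ```python
    import numpy as np

    def random_decrement(x, trigger_level, seg_len):
        """Average all fixed-length segments that start where x up-crosses trigger_level.

        Returns the RD signature and the number of segments averaged.
        """
        x = np.asarray(x, dtype=float)
        # Sample indices at which the signal rises through the triggering level.
        starts = np.where((x[:-1] < trigger_level) & (x[1:] >= trigger_level))[0] + 1
        starts = starts[starts + seg_len <= len(x)]  # keep only full-length segments
        if len(starts) == 0:
            raise ValueError("no triggering points found")
        segments = np.stack([x[s:s + seg_len] for s in starts])
        return segments.mean(axis=0), len(starts)

    # Made-up signal: a pure sine with a period of 100 samples, 10 periods long.
    t_demo = np.arange(1000)
    x_demo = np.sin(2 * np.pi * t_demo / 100)
    signature_demo, n_segments_demo = random_decrement(x_demo, trigger_level=0.5, seg_len=50)
    ```

    With a measured random response, the random part averages out as more segments are included, leaving a free-decay-like signature from which a damping ratio can then be estimated; that estimation step is not shown here.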

A Recidivism Prediction Model Based on XGBoost Considering Asymmetric Error Costs (비대칭 오류 비용을 고려한 XGBoost 기반 재범 예측 모델)

  • Won, Ha-Ram;Shim, Jae-Seung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.127-137 / 2019
  • Recidivism prediction has been a subject of constant research by experts since the early 1970s, and it has become more important as crimes committed by recidivists steadily increase. In particular, research on recidivism prediction became more active in the 1990s, after the US and Canada adopted the 'Recidivism Risk Assessment Report' as a decisive criterion during trial and parole screening. In the same period, empirical studies on 'Recidivism Factors' also began in Korea. Even though most recidivism prediction studies have so far focused on the factors of recidivism or the accuracy of recidivism prediction, it is important to minimize the misclassification cost, because recidivism prediction has an asymmetric error cost structure. In general, the cost of misclassifying people who will not reoffend as likely to reoffend is lower than the cost of misclassifying people who will reoffend as unlikely to do so, because the former only increases additional monitoring costs, while the latter increases social and economic costs. Therefore, in this paper, we propose an XGBoost (eXtreme Gradient Boosting; XGB) based recidivism prediction model that considers asymmetric error costs. In the first step of the model, XGB, which is recognized as a high-performance ensemble method in the field of data mining, was applied, and the results of XGB were compared with various prediction models such as LOGIT (logistic regression analysis), DT (decision trees), ANN (artificial neural networks), and SVM (support vector machines). In the next step, the threshold is optimized to minimize the total misclassification cost, which is the weighted average of FNE (False Negative Error) and FPE (False Positive Error). To verify the usefulness of the model, it was applied to a real recidivism prediction dataset. As a result, it was confirmed that the XGB model not only showed better prediction accuracy than the other prediction models but also reduced the cost of misclassification most effectively.
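    The threshold-optimization step in this abstract can be sketched as a simple search over candidate cut-offs. The sketch treats the total cost as a weighted sum of false negatives and false positives, which is one reading of the abstract's "weighted average"; the function name, cost weights, and scores are hypothetical.

    ```python
    def best_threshold(probs, labels, cost_fn, cost_fp, candidates=None):
        """Return (threshold, total_cost) minimizing cost_fn * FN + cost_fp * FP.

        probs: predicted probabilities of reoffending; labels: 1 = reoffended, 0 = did not.
        A case is predicted positive when its probability is >= the threshold.
        """
        if candidates is None:
            candidates = [i / 100 for i in range(1, 100)]
        best = None
        for t in candidates:
            fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < t)
            fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= t)
            cost = cost_fn * fn + cost_fp * fp
            if best is None or cost < best[1]:  # ties keep the lowest threshold
                best = (t, cost)
        return best

    # Made-up scores and outcomes; a missed recidivist is costed 5x a false alarm.
    probs_demo = [0.1, 0.4, 0.35, 0.8]
    labels_demo = [0, 0, 1, 1]
    costly_miss_demo = best_threshold(probs_demo, labels_demo, cost_fn=5, cost_fp=1,
                                      candidates=[0.3, 0.5, 0.7])
    costly_alarm_demo = best_threshold(probs_demo, labels_demo, cost_fn=1, cost_fp=5,
                                       candidates=[0.3, 0.5, 0.7])
    ```

    When false negatives are the more expensive error, the selected threshold moves down so that more borderline cases are flagged; reversing the weights moves it up.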