• Title/Summary/Keyword: Model Ensemble

Optimizing Hydrological Quantitative Precipitation Forecast (HQPF) based on Machine Learning for Rainfall Impact Forecasting (호우 영향예보를 위한 머신러닝 기반의 수문학적 정량강우예측(HQPF) 최적화 방안)

  • Lee, Han-Su;Jee, Yongkeun;Lee, Young-Mi;Kim, Byung-Sik
    • Journal of Environmental Science International / v.30 no.12 / pp.1053-1065 / 2021
  • In this study, the prediction technology of the Hydrological Quantitative Precipitation Forecast (HQPF) was improved by optimizing the weather predictors used as input data for machine learning. Results were compared using bias and Root Mean Square Error (RMSE), which are indicators for verifying predictive accuracy, for the heavy rain case of August 21, 2021. Comparing the rainfall simulated by the improved HQPF with the observed accumulated rainfall showed that all HQPFs (the conventional HQPF and the improved HQPF 1 and HQPF 2) simulated less rainfall as the lead time increased over the entire grid region, so the difference from the observed rainfall increased. In the accumulated rainfall evaluation after the input factors were reduced, the improved HQPF 1 and 2 predicted larger accumulated rainfall than the existing HQPF. Furthermore, HQPF 2 used the fewest input factors and simulated more accumulated rainfall than the conventional HQPF and HQPF 1. Improving on the performance of the conventional machine learning model while using fewer variables reduces the preprocessing period and model execution time, thereby contributing to model optimization. As an additional advanced method beyond HQPF 1 and 2, the simulations of the Local ENsemble prediction System (LENS) ensemble members were analyzed together with low pressure, one of the observed meteorological factors. Based on the results of this study, prediction accuracy is expected to increase if the well-performing ensemble members are selected according to the heavy rain characteristics of Korea or if different weights are applied to each ensemble member.
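
The two verification indicators named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes bias is defined as the mean forecast-minus-observation error (other conventions, such as a ratio, also exist), and the rainfall values are hypothetical.

```python
import math

def bias(forecast, observed):
    """Mean error (assumed convention): negative means the forecast is too low."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(observed)

def rmse(forecast, observed):
    """Root Mean Square Error between forecast and observation."""
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return math.sqrt(squared / len(observed))

# Hypothetical accumulated rainfall at four grid points (mm); not data from the paper.
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [8.0, 18.0, 33.0, 37.0]

print("bias:", bias(forecast, observed))
print("RMSE:", rmse(forecast, observed))
```

A negative bias here corresponds to the under-forecasting that the abstract reports at longer lead times; RMSE additionally penalizes large individual errors.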

The Development of Ensemble Statistical Prediction Model for Changma Precipitation (장마 강수를 위한 앙상블 통계 예측 모델 개발)

  • Kim, Jin-Yong;Seo, Kyong-Hwan
    • Atmosphere / v.24 no.4 / pp.533-540 / 2014
  • Statistical forecast models for the prediction of the summertime Changma precipitation have been developed in this study. As effective predictors for the Changma precipitation, the springtime sea surface temperature (SST) anomalies over the North Atlantic (NA1), the North Pacific (NPC), and the tropical Pacific Ocean (CNINO) were suggested by Lee and Seo (2013). To further improve the performance of the statistical prediction scheme, we select other potential predictors and construct two additional statistical models. The selected predictors are the Northern Indian Ocean (NIO) and Bering Sea (BS) SST anomalies and the spring Eurasian snow cover anomaly (EUSC). Then, using all three statistical prediction models, a simple ensemble-mean prediction is performed. The resulting correlation skill score reaches as high as ~0.90 for the last 21 years, which is a ~16% increase in skill compared to the prediction model of Lee and Seo (2013). The EUSC and BS predictors are related to a strengthening of the Okhotsk high, leading to an enhancement of the Changma front. The NIO predictor induces cyclonic anomalies to the southwest of the Korean peninsula and southeasterly flows toward the peninsula, giving rise to an increase in the Changma precipitation.
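
A simple ensemble-mean prediction and its correlation skill, as described in this abstract, might look like the sketch below. The three member series and the observed series are hypothetical standardized anomalies, and the function names are illustrative rather than taken from the paper.

```python
import math

def ensemble_mean(member_predictions):
    """Average the members' predictions year by year (equal weights)."""
    n_members = len(member_predictions)
    return [sum(values) / n_members for values in zip(*member_predictions)]

def pearson(x, y):
    """Pearson correlation coefficient, used here as the skill score."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical precipitation anomalies for five years; not data from the paper.
observed = [1.0, -0.5, 0.3, -1.2, 0.8]
model_a = [0.8, -0.2, 0.1, -0.9, 0.5]
model_b = [1.1, -0.9, 0.6, -0.7, 0.4]
model_c = [0.5, -0.4, -0.1, -1.5, 1.2]

combined = ensemble_mean([model_a, model_b, model_c])
print("ensemble-mean skill:", pearson(combined, observed))
```

Equal weighting is the simplest choice; it tends to help when the members' errors are not strongly correlated, but it does not guarantee a higher skill than the best single member.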

Enhancing Autonomous Vehicle RADAR Performance Prediction Model Using Stacking Ensemble (머신러닝 스태킹 앙상블을 이용한 자율주행 자동차 RADAR 성능 향상)

  • Si-yeon Jang;Hye-lim Choi;Yun-ju Oh
    • Journal of Internet Computing and Services / v.25 no.2 / pp.21-28 / 2024
  • Radar is an essential sensor component in autonomous vehicles, and the market for radar applications in this context is steadily expanding with a growing variety of products. In this study, we aimed to enhance the stability and performance of radar systems by developing and evaluating a radar performance prediction model that can predict radar defects. We selected seven machine learning and deep learning algorithms and trained the model with a total of 49 input data types. Ultimately, when we employed an ensemble of 17 models, it exhibited the highest performance. We anticipate that these research findings will assist in predicting product defects at the production stage, thereby maximizing production yield and minimizing the costs associated with defective products.

Evaluation of conceptual rainfall-runoff models for different flow regimes and development of ensemble model (개념적 강우유출 모형의 유량구간별 적합성 평가 및 앙상블 모델 구축)

  • Yu, Jae-Ung;Park, Moon-Hyung;Kim, Jin-Guk;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association / v.54 no.2 / pp.105-119 / 2021
  • An increase in the frequency and intensity of both floods and droughts has recently been observed due to an increase in climate variability. In particular, land-use change associated with industrial structure and urbanization has led to an imbalance between water supply and demand, acting as a constraint in water resource management. Accurate rainfall-runoff analysis plays a critical role in evaluating water availability in the water budget analysis. This study aimed to explore various continuous rainfall-runoff models over the Soyanggang dam watershed. Moreover, an ensemble modeling framework combining multiple models was introduced to present streamflow scenarios that consider uncertainties. In the ensemble modeling framework, rainfall-runoff models with fewer parameters are generally preferred for effective regionalization. In this study, more than 40 continuous rainfall-runoff models were applied to the Soyanggang dam watershed, and nine rainfall-runoff models were initially selected using different goodness-of-fit measures. This study confirmed that the ensemble model performed better than the individual models over different flow regimes.

Proper Noun Embedding Model for the Korean Dependency Parsing

  • Nam, Gyu-Hyeon;Lee, Hyun-Young;Kang, Seung-Shik
    • Journal of Multimedia Information System / v.9 no.2 / pp.93-102 / 2022
  • Dependency parsing is a decision problem of the syntactic relation between words in a sentence. Deep learning models have recently been used for dependency parsing based on word representations in a continuous vector space. However, this causes mislabeling of proper nouns that rarely appear in the training corpus, because it is difficult to express out-of-vocabulary (OOV) words in a continuous vector space. To solve the OOV problem in dependency parsing, we explored proper noun embedding methods according to the embedding unit. Before representing words in a continuous vector space, we replace the proper nouns with a special token and train on their contextual features using a multi-layer bidirectional LSTM. Two models, one syllable-based and one morpheme-based, are proposed for proper noun embedding, and the ensemble model improves dependency parsing performance more than either the syllable or the morpheme embedding model alone. The experimental results showed that our ensemble model improved UAS by 1.69%p and LAS by 2.17%p over the Malt parser, which uses the same arc-eager approach.

Korean Dependency Parsing Using Various Ensemble Models (다양한 앙상블 알고리즘을 이용한 한국어 의존 구문 분석)

  • Jo, Gyeong-Cheol;Kim, Ju-Wan;Kim, Gyun-Yeop;Park, Seong-Jin;Gang, Sang-U
    • Annual Conference on Human and Language Technology / 2019.10a / pp.543-545 / 2019
  • This paper combines state-of-the-art Korean dependency parsing models with various ensemble models and analyzes their performance. For word representations, it uses a pre-trained word embedding model, ELMo (Embedding from Language Model), BERT (Bidirectional Encoder Representations from Transformer), and various additional features. The dependency parsing models used are the Stack Pointer Network Model, the Deep Biaffine Attention Parser, and the Left to Right Pointer Parser. Finally, the paper proposes an optimal model by applying the ensemble methods Bagging and XGBoost (Extreme Gradient Boosting) to the parsing results of each model.

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.105-129 / 2020
  • This study uses corporate data from 2012 to 2018, when K-IFRS was applied in earnest, to predict default risks. The data used in the analysis totaled 10,545 rows and 160 columns, including 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial ratio indices. Unlike most prior studies, which used the default event as the basis for learning about default risk, this study calculated default risk from the market capitalization and stock price volatility of each company based on the Merton model. This resolved the data imbalance caused by the scarcity of default events, which had been pointed out as a limitation of the existing methodology, and made it possible to reflect differences in default risk among ordinary companies. Because learning used only corporate information that is also available for unlisted companies, default risks of unlisted companies without stock price information can be appropriately derived. The approach can therefore provide stable default risk assessment services to unlisted companies, such as small and medium-sized companies and startups, whose default risk is difficult to determine with traditional credit rating models. Although corporate default risk prediction using machine learning has been studied actively in recent years, model bias remains an issue because most studies make predictions with a single model. A stable and reliable valuation methodology is required for calculating default risk, given that an entity's default risk information is very widely used in the market and sensitivity to differences in default risk is high. Strict standards are also required for the calculation methods.
The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience with credit ratings and of changes in future market conditions. This study reduced the bias of individual models by using a stacking ensemble technique that synthesizes various machine learning models. This makes it possible to capture complex nonlinear relationships between default risk and various corporate information and to maximize the advantages of machine learning-based default risk prediction models, which take less time to calculate. To produce the sub-model forecasts used as input data for the Stacking Ensemble model, the training data were divided into seven pieces, and the sub-models were trained on the divided sets to produce forecasts. To compare the predictive power of the Stacking Ensemble model, Random Forest, MLP, and CNN models were trained on the full training data, and the predictive power of each model was then verified on the test set. The analysis showed that the Stacking Ensemble model exceeded the predictive power of the Random Forest model, which performed best among the single models. Next, to check for statistically significant differences between the forecasts of the Stacking Ensemble model and those of each individual model, pairs were constructed between the Stacking Ensemble model and each individual model. Because the Shapiro-Wilk normality test showed that the pairs did not follow a normal distribution, the nonparametric Wilcoxon rank-sum test was used to check whether the two model forecasts in each pair differed significantly. The analysis showed that the forecasts of the Stacking Ensemble model differed significantly from those of the MLP and CNN models.
In addition, this study provides a methodology that allows existing credit rating agencies to apply machine learning-based bankruptcy risk prediction, given that traditional credit rating models can also be included as sub-models when calculating the final default probability. The Stacking Ensemble technique proposed in this study can also help in designing models that meet the requirements of the Financial Investment Business Regulations by combining various sub-models. We hope that this research will serve as a resource for increasing practical use by overcoming the limitations of existing machine learning-based models.
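
One common reading of "training data were divided into seven pieces" is out-of-fold stacking: each sub-model predicts every held-out piece after training on the other six, and a meta-learner is then fitted on those predictions. The sketch below illustrates that idea only. The two sub-models (a mean predictor and a straight-line fit) are simple stand-ins for the Random Forest, MLP, and CNN used in the paper, the meta-learner is an assumed two-weight least-squares fit, and the data are synthetic.

```python
def fit_mean(xs, ys):
    """Stand-in sub-model 1: always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Stand-in sub-model 2: simple least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)

def out_of_fold(fitters, xs, ys, k=7):
    """For each of k folds, train on the rest and predict the held-out rows."""
    n = len(xs)
    oof = [[0.0] * len(fitters) for _ in range(n)]
    for fold in range(k):
        held = set(range(fold, n, k))  # interleaved fold assignment
        train_x = [xs[i] for i in range(n) if i not in held]
        train_y = [ys[i] for i in range(n) if i not in held]
        for j, fit in enumerate(fitters):
            model = fit(train_x, train_y)
            for i in held:
                oof[i][j] = model(xs[i])
    return oof

def fit_meta(oof, ys):
    """Meta-learner: least-squares weights for two sub-models, no intercept."""
    a11 = sum(r[0] * r[0] for r in oof)
    a12 = sum(r[0] * r[1] for r in oof)
    a22 = sum(r[1] * r[1] for r in oof)
    b1 = sum(r[0] * y for r, y in zip(oof, ys))
    b2 = sum(r[1] * y for r, y in zip(oof, ys))
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def mse(pred, ys):
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)

# Synthetic data (21 rows so that each of the 7 folds holds 3 rows).
xs = [float(i) for i in range(21)]
ys = [2.0 * x + 1.0 + 0.5 * ((i * 7) % 5 - 2) for i, x in enumerate(xs)]

oof = out_of_fold([fit_mean, fit_line], xs, ys, k=7)
weights = fit_meta(oof, ys)
stacked = [weights[0] * r[0] + weights[1] * r[1] for r in oof]
print("meta weights:", weights)
print("stacked MSE on out-of-fold features:", mse(stacked, ys))
```

In a full pipeline the sub-models would typically be refitted on all training data before scoring a separate test set; that step is omitted here to keep the sketch short.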

Comparative characteristic of ensemble machine learning and deep learning models for turbidity prediction in a river (딥러닝과 앙상블 머신러닝 모형의 하천 탁도 예측 특성 비교 연구)

  • Park, Jungsu
    • Journal of Korean Society of Water and Wastewater / v.35 no.1 / pp.83-91 / 2021
  • Increased turbidity in rivers during flood events has various effects on water environmental management, including drinking water supply systems. Thus, prediction of turbid water is essential for water environmental management. Recently, various advanced machine learning algorithms have been increasingly used in water environmental management. Ensemble machine learning algorithms such as random forest (RF) and gradient boosting decision tree (GBDT) are some of the most popular machine learning algorithms used for water environmental management, along with deep learning algorithms such as recurrent neural networks. In this study, GBDT, an ensemble machine learning algorithm, and the gated recurrent unit (GRU), a recurrent neural network algorithm, are used to develop models that predict turbidity in a river. The observation frequencies of the input data used for the models were 2, 4, 8, 24, 48, 120, and 168 h. The root-mean-square error-observations standard deviation ratio (RSR) ranges from 0.182 to 0.766 for GRU and from 0.400 to 0.683 for GBDT. Both models show similar prediction accuracy, with an RSR of 0.682 for GRU and 0.683 for GBDT. GRU shows better prediction accuracy when the observation frequency is relatively short (i.e., 2, 4, and 8 h), whereas GBDT shows better prediction accuracy when the observation frequency is relatively long (i.e., 48, 120, and 168 h). The results suggest that the characteristics of the input data should be considered when developing an appropriate model to predict turbidity.
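
The RSR reported in this abstract is the RMSE divided by the standard deviation of the observations, so 0 is a perfect fit and values near 1 mean the model does no better than predicting the observed mean. A minimal sketch with hypothetical turbidity values:

```python
import math

def rsr(simulated, observed):
    """RMSE-observations standard deviation ratio (lower is better)."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    squared_deviation = sum((o - mean_obs) ** 2 for o in observed)
    # The 1/n factors of RMSE and standard deviation cancel in the ratio.
    return math.sqrt(squared_error) / math.sqrt(squared_deviation)

# Hypothetical turbidity values (NTU); not data from the paper.
observed = [2.0, 4.0, 6.0, 8.0]
simulated = [3.0, 4.0, 5.0, 8.0]

print("RSR:", rsr(simulated, observed))
```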

A Study on Traffic Vulnerable Detection Using Object Detection-Based Ensemble and YOLOv5

  • Hyun-Do Lee;Sun-Gu Kim;Seung-Chae Na;Ji-Yul Ham;Chanhee Kwak
    • Journal of the Korea Society of Computer and Information / v.29 no.1 / pp.61-68 / 2024
  • Despite continuous efforts to mitigate pedestrian accidents at crosswalks, the problem persists. Vulnerable groups, including the elderly and disabled individuals, are at risk of being involved in traffic incidents. This paper proposes the implementation of an object detection algorithm using the YOLO v5 model specifically for pedestrians using assistive devices such as wheelchairs and crutches. For this research, data were collected through image crawling, Roboflow, and the Mobility Aids dataset, and comprise wheelchair users, crutch users, and pedestrians. Data augmentation techniques were applied to improve the model's generalization performance. Additionally, ensemble techniques were used to mitigate type 2 errors, resulting in a 96% recall rate. This demonstrates that employing ensemble methods with a single YOLO model to target transportation-disadvantaged individuals can yield accurate detection performance without overlooking crucial objects.

Stochastic Continuous Storage Function Model with Ensemble Kalman Filtering (I) : Model Development (앙상블 칼만필터를 연계한 추계학적 연속형 저류함수모형 (I) : - 모형 개발 -)

  • Bae, Deg-Hyo;Lee, Byong-Ju;Georgakakos, Konstantine P.
    • Journal of Korea Water Resources Association / v.42 no.11 / pp.953-961 / 2009
  • The objective of this study is to develop a stochastic continuous storage function model that enhances the event-oriented watershed and channel storage function models, which have been used as the official flood forecast models in Korea. To this end, a soil moisture accounting component is added to the original storage function model, and each hydrologic component, such as surface flow, subsurface flow, groundwater flow, and actual evapotranspiration, is simulated as a function of soil water content. In addition, the ensemble Kalman filtering technique is used for real-time assimilation of streamflow measured at various stream locations in the watershed. The enhanced model will therefore be able to simulate hydrologic components over long periods without additional estimation of model parameters and, owing to the assimilation of measured streamflow data, to give more accurate and reliable results than the existing deterministic model.
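
The assimilation step mentioned in this abstract can be illustrated with a toy scalar ensemble Kalman filter update. This is a sketch under strong simplifying assumptions, not the paper's model: the state is taken to be streamflow itself and is observed directly, the perturbed-observation variant of the filter is assumed, and all numbers are hypothetical.

```python
import random
import statistics

def enkf_update(ensemble, observation, obs_var, rng):
    """Scalar EnKF analysis step with perturbed observations.

    Returns the updated ensemble and the Kalman gain.
    """
    forecast_var = statistics.variance(ensemble)   # sample variance of the members
    gain = forecast_var / (forecast_var + obs_var)  # between 0 and 1
    obs_sd = obs_var ** 0.5
    analysis = []
    for member in ensemble:
        # Each member sees the observation plus its own random measurement error.
        perturbed_obs = observation + rng.gauss(0.0, obs_sd)
        analysis.append(member + gain * (perturbed_obs - member))
    return analysis, gain

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical forecast streamflow ensemble (m^3/s) and one gauge measurement.
forecast = [95.0, 100.0, 105.0, 110.0, 90.0]
analysis, gain = enkf_update(forecast, observation=120.0, obs_var=25.0, rng=rng)

print("Kalman gain:", gain)
print("forecast mean:", statistics.mean(forecast))
print("analysis mean:", statistics.mean(analysis))
```

The gain weighs the ensemble spread against the measurement error: a wide ensemble and an accurate gauge pull the members strongly toward the measured streamflow, while a tight ensemble or a noisy gauge leaves them close to the forecast.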