• Title/Summary/Keyword: Model Ensemble

A Jittering-based Neural Network Ensemble Approach for Regionalized Low-flow Frequency Analysis

  • Ahn, Kuk-Hyun
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2020.06a
    • /
    • pp.382-382
    • /
    • 2020
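  • Many previous studies have argued that ensemble methods, which combine the results of multiple models, improve the predictive ability of artificial neural network models. This study proposes a jittering-based neural network model for predicting low flow in ungauged basins. The basic approach is to enlarge the training data by adding white noise to the explanatory variables. To verify the effect of jittering, the model was built on a multi-output neural network model. The jittering-based ensemble model was then combined with a variable importance measuring algorithm to infer the relationships between basin characteristics and the predicted low-flow characteristics. To evaluate the usefulness of these methods, a total of 207 basins in the northeastern United States were used. The proposed jittering-based neural network ensemble model was found to be more accurate than the single modeling approach, and it remained more accurate than the single model even with a small number of ensemble members. Finally, the study found that, depending on the low-flow characteristic examined, the effect of a basin characteristic is either consistent or varies in importance.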
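The jittering idea in the abstract above (adding white noise to the explanatory variables to enlarge the training set, then averaging several models) can be sketched roughly as follows. This is only an illustration under stated assumptions: a least-squares linear fit stands in for the multi-output neural network, and the function names, noise level, and ensemble size are hypothetical rather than taken from the study.

```python
import numpy as np

def jitter_augment(X, y, n_copies, noise_std, rng):
    """Stack the original data with n_copies noisy copies of X; targets are repeated unchanged."""
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        # White (Gaussian) noise is added to the explanatory variables only.
        X_parts.append(X + rng.normal(0.0, noise_std, size=X.shape))
        y_parts.append(y)
    return np.concatenate(X_parts, axis=0), np.concatenate(y_parts, axis=0)

def fit_linear(X, y):
    """Least-squares fit with an intercept (assumed stand-in for the neural network)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def jitter_ensemble_predict(X_train, y_train, X_new,
                            n_members=10, n_copies=5, noise_std=0.1, seed=0):
    """Train each member on a differently jittered data set and average the predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_members):
        Xa, ya = jitter_augment(X_train, y_train, n_copies, noise_std, rng)
        preds.append(predict_linear(fit_linear(Xa, ya), X_new))
    return np.mean(preds, axis=0)
```

In practice the noise scale would normally be chosen relative to the spread of each (standardized) explanatory variable; the fixed `noise_std` here is a simplification.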

Length-of-Stay Prediction Model of Appendicitis using Artificial Neural Networks and Decision Tree (신경망과 의사결정 나무를 이용한 충수돌기염 환자의 재원일수 예측모형 개발)

  • Chung, Suk-Hoon;Han, Woo-Sok;Suh, Yong-Moo;Rhee, Hyun-SiIl
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.10 no.6
    • /
    • pp.1424-1432
    • /
    • 2009
  • For the efficient management of hospital beds, it is important to predict the length of stay (LoS) of appendicitis patients. This study analyzed patient data to find factors that show a high positive correlation with LoS, built LoS prediction models using neural network and decision tree models, and compared their performance. To increase prediction accuracy, we applied ensemble techniques such as bagging and boosting. Experimental results show that the decision tree model, which was built with fewer variables, achieves prediction accuracy almost equal to that of the neural network model, and that bagging performs better than boosting. In conclusion, since the decision tree model, which offers better explanation than the neural network model, can predict the LoS of appendicitis patients well and can also be used to select input variables, it is recommended that hospitals make more active use of decision tree techniques.

Estimating Korean Pine(Pinus koraiensis) Habitat Distribution Considering Climate Change Uncertainty - Using Species Distribution Models and RCP Scenarios - (불확실성을 고려한 미래 잣나무의 서식 적지 분포 예측 - 종 분포 모형과 RCP시나리오를 중심으로 -)

  • Ahn, Yoonjung;Lee, Dong-Kun;Kim, Ho Gul;Park, Chan;Kim, Jiyeon;Kim, Jae-uk
    • Journal of the Korean Society of Environmental Restoration Technology
    • /
    • v.18 no.3
    • /
    • pp.51-64
    • /
    • 2015
  • Climate change will have a significant impact on the distribution of forest species. Pinus koraiensis, commonly known as Korean pine, is normally distributed in frigid zones, so the severe heat caused by climate change could affect its distribution. This study therefore predicted the distribution of Korean pine and its suitable habitat area, taking uncertainty into account, by applying climate change scenarios to an ensemble model. First, presence and absence points were selected with a stratified method, taking a site index into account. Second, environmental and climate variables were chosen through a literature review and then confirmed with experts; these variables were used as input data for BIOMOD2. Third, a model of the present distribution was built and validated with ROC. Finally, RCP scenarios were applied to the models to create the future distribution model. The individual models differed considerably in their results, but most individual models and the ensemble models estimated that the suitable habitat area would decrease in both the mid-term future (2040s) and the long-term future (2090s).

Uncertainty Analysis based on LENS-GRM

  • Lee, Sang Hyup;Seong, Yeon Jeong;Park, KiDoo;Jung, Young Hun
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2022.05a
    • /
    • pp.208-208
    • /
    • 2022
  • Recently, abnormal weather has become more frequent due to complex factors such as global warming. Past rainfall records show that climate change is making rainfall patterns irregular. This makes rainfall harder to predict and natural disasters harder to prevent and cope with, causing human casualties and property damage. Accurate estimation of rainfall and prediction of its timing could therefore be one way to prevent and mitigate damage from floods and droughts. However, rainfall prediction carries considerable uncertainty, which needs to be understood and reduced. In addition, applying accurate rainfall predictions to a rainfall-runoff model can improve the accuracy of runoff prediction. This study therefore aims to increase the reliability of rainfall prediction by analyzing the uncertainty of Korean rainfall ensemble prediction data and of the runoff analysis model, using the Limited Area ENsemble (LENS) and the Grid based Rainfall-runoff Model (GRM). First, the potential for improving rainfall prediction is reviewed using Quantile Mapping (QM), one of the bias correction techniques. Then, GRM parameter calibration was performed twice, and the likelihood-parameter applicability evaluation and uncertainty analysis were performed using R2, NSE, PBIAS, and Log-normal. The rainfall prediction data were applied to the rainfall-runoff model and evaluated before and after calibration. Reducing the uncertainty in the rainfall ensemble data applied to the runoff model is expected to make flood prediction more reliable when selecting behavioral models for user uncertainty analysis. The results can also serve as a basis for flood prediction research that integrates other parameters such as geological characteristics and rainfall events.
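As a rough illustration of the quantile mapping and skill scores named in the abstract above, the sketch below implements one common empirical form of QM together with NSE and PBIAS. It is not the study's implementation: the function names are hypothetical, the empirical (sorted-sample) mapping is only one of several QM variants, and the PBIAS sign convention used here (negative when the simulation overestimates) is an assumption, since conventions differ between sources.

```python
import numpy as np

def quantile_map(forecast, model_hist, obs_hist):
    """Empirical quantile mapping: replace each forecast value with the observed
    value that has the same non-exceedance probability in the historical record."""
    model_sorted = np.sort(np.asarray(model_hist, dtype=float))
    obs_sorted = np.sort(np.asarray(obs_hist, dtype=float))
    # Probability of each forecast value within the model climatology.
    # Note: many tied values (e.g. dry days) make this step ambiguous.
    p_model = np.linspace(0.0, 1.0, len(model_sorted))
    p = np.interp(forecast, model_sorted, p_model)
    # Observed value at the same probability.
    p_obs = np.linspace(0.0, 1.0, len(obs_sorted))
    return np.interp(p, p_obs, obs_sorted)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pbias(obs, sim):
    """Percent bias, assumed convention: 100 * sum(obs - sim) / sum(obs)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)
```

Forecast values outside the historical model range are clamped to the most extreme observed values by `np.interp`, so extremes would need separate treatment in a real application.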

Development of a Deep Learning-based Midterm PM2.5 Prediction Model Adapting to Trend Changes (경향성 변화에 대응하는 딥러닝 기반 초미세먼지 중기 예측 모델 개발)

  • Dong Jun Min;Hyerim Kim;Sangkyun Lee
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.6
    • /
    • pp.251-259
    • /
    • 2024
  • Fine particulate matter, especially PM2.5 with a diameter of less than 2.5 micrometers, poses significant health and economic risks. This study focuses on the Seoul region of South Korea, aiming to analyze PM2.5 data and trends from 2017 to 2022 and develop a mid-term prediction model for PM2.5 concentrations. Utilizing collected and produced air quality and weather data, reanalysis data, and numerical model prediction data, this research proposes an ensemble evaluation method capable of adapting to trend changes. The ensemble method proposed in this study demonstrated superior performance in predicting PM2.5 concentrations, outperforming existing models by an average F1 Score of approximately 42.16% in 2019, 58.92% in 2021, and 34.79% in 2022 for future 3 to 6-day predictions. The model maintains performance under changing environmental conditions, offering stable predictions and presenting a mid-term prediction model that extends beyond the capabilities of existing deep learning-based short-term PM2.5 forecasts.

Improving the Performance of Deep-Learning-Based Ground-Penetrating Radar Cavity Detection Model using Data Augmentation and Ensemble Techniques (데이터 증강 및 앙상블 기법을 이용한 딥러닝 기반 GPR 공동 탐지 모델 성능 향상 연구)

  • Yonguk Choi;Sangjin Seo;Hangilro Jang;Daeung Yoon
    • Geophysics and Geophysical Exploration
    • /
    • v.26 no.4
    • /
    • pp.211-228
    • /
    • 2023
  • Ground-penetrating radar (GPR) surveying, a nondestructive geophysical method, is commonly used to monitor embankments. GPR survey results can be complex depending on the situation, and data processing and interpretation depend on expert experience, which can lead to false detections. The process is also time-intensive. Consequently, various studies have sought to detect cavities in GPR survey data using deep learning methods. Deep-learning-based approaches require abundant training data, but GPR field survey data are often scarce because of cost and other factors constraining field studies. Therefore, in this study, a deep-learning-based model was developed for cavity detection in embankment GPR surveys using data augmentation strategies. A dataset was constructed by collecting survey data over several years from the same embankment. A You Only Look Once (YOLO) model, commonly used in computer vision for object detection, was employed for this purpose. The optimal data augmentation approach was determined by comparing and analyzing various strategies. After initial model development, a stepwise process including box clustering, transfer learning, self-ensemble, and model ensemble techniques was used to enhance the final model performance. The evaluation results demonstrate the model's effectiveness in detecting cavities in embankment GPR survey data.

Ensemble Daily Streamflow Forecast Using Two-step Daily Precipitation Interpolation (일강우 내삽을 이용한 일유량 시뮬레이션 및 앙상블 유량 발생)

  • Hwang, Yeon-Sang;Heo, Jun-Haeng;Jung, Young-Hun
    • Journal of Korea Water Resources Association
    • /
    • v.44 no.3
    • /
    • pp.209-220
    • /
    • 2011
  • Input uncertainty is one of the major sources of uncertainty in hydrologic modeling. In this paper, three alternative rainfall inputs generated by different interpolation schemes were first used to examine their impact on a distributed watershed model. The residuals of the precipitation interpolations were then tested as a source for ensemble streamflow generation in two river basins in the U.S. Using a Monte Carlo parameter search, the relationship between input and parameter uncertainty was also categorized to assess the sensitivity of the parameters to input differences. This analysis is useful not only for finding the parameters that need more attention but also for transferring parameters calibrated for station measurements to simulations that use different inputs, such as downscaled data from weather generator outputs. Input ensembles that preserve local statistical characteristics were used to generate streamflow ensemble hindcasts, and the ensemble sets were shown to capture the observed streamflow properly. This procedure is especially important for considering input uncertainties in the simulation of streamflow forecasts.

Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts (소셜 텍스트의 주요 정보 추출을 위한 로지스틱 회귀 앙상블 기법)

  • Kim, So Hyeon;Kim, Han Joon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.5
    • /
    • pp.279-284
    • /
    • 2017
  • Currently, in the era of big data, text mining and opinion mining are used in many domains, and one of their most important research issues is extracting significant information from social media. In this paper, we therefore propose a logistic regression ensemble method for finding the main body text in blog HTML. First, we extract structural features and text features from blog HTML tags. Then we construct a classification model with logistic regression and ensemble techniques that decides whether a given tag contains main body text. One of our important findings is that the main body text can be found through 'depth' features extracted from HTML tags. In our experiment using blog data on diverse topics collected from the web, our tag classification model achieved 99% accuracy, and it recalled 80.5% of the documents that have tags containing the main body text.

Comparison between Uncertainties of Cultivar Parameter Estimates Obtained Using Error Calculation Methods for Forage Rice Cultivars (오차 계산 방식에 따른 사료용 벼 품종의 품종모수 추정치 불확도 비교)

  • Young Sang Joh;Shinwoo Hyun;Kwang Soo Kim
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.25 no.3
    • /
    • pp.129-141
    • /
    • 2023
  • Crop models have been used to predict yield under diverse environmental and cultivation conditions, which can support decisions on the management of forage crops. Cultivar parameters are one of the required inputs to crop models, representing the genetic properties of a given forage cultivar. The objective of this study was to compare calibration and ensemble approaches in order to minimize the uncertainty of crop yield estimates using the SIMPLE crop model. Cultivar parameters were calibrated using Log-likelihood (LL) and the Generic Composite Similarity Measure (GCSM) as objective functions for the Metropolis-Hastings (MH) algorithm. In total, 20 sets of cultivar parameters were generated for each method. Two types of ensemble approach were compared. The first (Eem) was the average of the model outputs obtained with the individual parameter sets. The second (Epm) was the model output obtained with the cultivar parameters averaged over the 20 sets. The comparison was made for each cultivar and for each error calculation method. 'Jowoo' and 'Yeongwoo', forage rice cultivars used in Korea, were subject to the parameter calibration. Yield data were obtained from experimental fields at Suwon, Jeonju, Naju and Iksan. Data for 2013, 2014 and 2016 were used for parameter calibration. For validation, yield data reported from 2016 to 2018 at Suwon were used. Initial calibration indicated that genetic coefficients obtained by LL were distributed in a narrower range than coefficients obtained by GCSM. A two-sample t-test comparing the ensemble approaches found no significant difference between them. The uncertainty of GCSM can be neutralized by adjusting the acceptance probability. The results for the second ensemble method (Epm) indicate that uncertainty can be reduced with less computation using an ensemble approach.
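The difference between the two ensemble types described in the abstract above (averaging model outputs versus running the model once on averaged parameters) can be illustrated with a small sketch. The SIMPLE crop model is not reproduced here: `toy_yield_model` is a hypothetical nonlinear stand-in, and the function names and parameter values are assumptions chosen only to show that the two averages generally differ for a nonlinear model.

```python
import numpy as np

def toy_yield_model(params, weather):
    """Hypothetical stand-in for a crop model: yield = a * weather ** b."""
    a, b = params
    return a * weather ** b

def ensemble_of_outputs(param_sets, weather, model):
    """Eem-style: run the model once per parameter set, then average the outputs."""
    return np.mean([model(p, weather) for p in param_sets], axis=0)

def output_of_mean_params(param_sets, weather, model):
    """Epm-style: average the parameter sets first, then run the model once."""
    return model(np.mean(param_sets, axis=0), weather)

# Two illustrative parameter sets (the study used 20 per method).
param_sets = np.array([[1.0, 1.0], [3.0, 2.0]])
weather = np.array([2.0])
eem = ensemble_of_outputs(param_sets, weather, toy_yield_model)
epm = output_of_mean_params(param_sets, weather, toy_yield_model)
```

The Epm-style estimate needs a single model run regardless of the number of parameter sets, which is the computational saving the abstract refers to.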

A Prediction of Precipitation Over East Asia for June Using Simultaneous and Lagged Teleconnection (원격상관을 이용한 동아시아 6월 강수의 예측)

  • Lee, Kang-Jin;Kwon, MinHo
    • Atmosphere
    • /
    • v.26 no.4
    • /
    • pp.711-716
    • /
    • 2016
  • Dynamical model forecasts using state-of-the-art general circulation models (GCMs) have some limitations in simulating the real climate system since they do not depend on the past history. One alternative method for correcting model errors is the canonical correlation analysis (CCA) correction method. At present, CCA forecasts show better skill than dynamical model forecasts, especially over the midlatitudes. Model outputs are adjusted based on the CCA modes between the model forecasts and the observations. This study builds a canonical correlation prediction model for subseasonal (June) precipitation. The predictors are circulation fields over the western North Pacific from the Global Seasonal Forecasting System version 5 (GloSea5) and observed snow cover extent over the Eurasian continent from the Climate Data Record (CDR). The former is based on the simultaneous teleconnection between the western North Pacific and East Asia, and the latter on the lagged teleconnection between the Eurasian continent and East Asia. In addition, we suggest a technique for improving forecast skill by applying the ensemble canonical correlation (ECC) to individual canonical correlation predictions.