• Title/Summary/Keyword: Ensemble Model

The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction (입력자료 군집화에 따른 앙상블 머신러닝 모형의 수질예측 특성 연구)

  • Park, Jungsu
    • Journal of Korean Society on Water Environment / v.37 no.5 / pp.335-343 / 2021
  • Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) affects water supply systems in various ways, such as increased treatment cost, and consequently there have been various efforts to develop models for predicting SSC. However, SSC is affected by both the natural and anthropogenic environment, which makes it challenging to predict. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The observed discharge (Q) and SSC at two field monitoring stations were used to develop the model. The input variables were clustered into two groups with low and high ranges of Q using the k-means clustering algorithm. Each group of data was then used separately to optimize XGB (Model 1). The model performance was compared with that of the XGB model using the entire data (Model 2). The models were evaluated by the root mean squared error-observation standard deviation ratio (RSR) and the root mean squared error. The RSR values were 0.51 and 0.57 at the two monitoring stations for Model 2, while the model performance improved to RSR values of 0.46 and 0.55, respectively, for Model 1.
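The RSR metric mentioned in this abstract is commonly defined as the RMSE divided by the standard deviation of the observations. The sketch below assumes that definition; the function name `rsr` and the sample values are hypothetical and are not taken from the paper.

```python
import math

def rsr(observed, simulated):
    """RMSE-observations standard deviation ratio (0 means a perfect fit)."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least two values")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared errors between observations and predictions.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    sso = sum((o - mean_obs) ** 2 for o in observed)
    if sso == 0:
        raise ValueError("observations are constant; RSR is undefined")
    # The 1/n factors of RMSE and the standard deviation cancel out.
    return math.sqrt(sse) / math.sqrt(sso)

# Hypothetical SSC values, for illustration only.
obs = [12.0, 30.0, 55.0, 80.0, 140.0]
sim = [15.0, 28.0, 50.0, 90.0, 120.0]
print(round(rsr(obs, sim), 3))
```

Under this definition, a model that always predicts the observed mean scores exactly 1, so values below 1 indicate some skill beyond the mean.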

Wood Species Classification Utilizing Ensembles of Convolutional Neural Networks Established by Near-Infrared Spectra and Images Acquired from Korean Softwood Lumber

  • Yang, Sang-Yun;Lee, Hyung Gu;Park, Yonggun;Chung, Hyunwoo;Kim, Hyunbin;Park, Se-Yeong;Choi, In-Gyu;Kwon, Ohkyung;Yeo, Hwanmyeong
    • Journal of the Korean Wood Science and Technology / v.47 no.4 / pp.385-392 / 2019
  • In our previous study, we investigated the use of ensemble models based on LeNet and MiniVGGNet to classify images of the transverse and longitudinal surfaces of five Korean softwoods (cedar, cypress, Korean pine, Korean red pine, and larch). That approach achieved an average F1 score of more than 98%, but the classification performance for longitudinal surface images was still lower than that for transverse surface images. In this study, ensemble methods combining two different convolutional neural network models (LeNet3 for smartphone camera images and NIRNet for NIR spectra) were applied to lumber species classification. Experimentally, the best classification performance was obtained by the averaging ensemble method of LeNet3 and NIRNet. The average F1 scores of the individual LeNet3 model and the individual NIRNet model were 91.98% and 85.94%, respectively. With the averaging ensemble method of LeNet3 and NIRNet, the average F1 score increased to 95.31%.
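An averaging ensemble of two classifiers can be sketched as follows, assuming each model outputs one probability per class and the two vectors are averaged with equal weight. The probability values and the function name `average_ensemble` are hypothetical; the abstract does not specify the weighting the authors used.

```python
def average_ensemble(prob_a, prob_b):
    """Average two class-probability vectors; return (best class index, averaged vector)."""
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same set of classes")
    avg = [(a + b) / 2 for a, b in zip(prob_a, prob_b)]
    # The predicted class is the one with the highest averaged probability.
    best = max(range(len(avg)), key=avg.__getitem__)
    return best, avg

SPECIES = ["cedar", "cypress", "Korean pine", "Korean red pine", "larch"]

# Hypothetical outputs for one lumber sample (not real model outputs).
image_probs = [0.10, 0.05, 0.45, 0.35, 0.05]  # image-based model
nir_probs = [0.05, 0.05, 0.20, 0.65, 0.05]    # NIR-spectrum model

idx, avg = average_ensemble(image_probs, nir_probs)
print(SPECIES[idx], avg)
```

In this made-up sample the image model alone would pick Korean pine, while the averaged vote follows the more confident NIR model.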

Estimation of lightweight aggregate concrete characteristics using a novel stacking ensemble approach

  • Kaloop, Mosbeh R.;Bardhan, Abidhan;Hu, Jong Wan;Abd-Elrahman, Mohamed
    • Advances in nano research / v.13 no.5 / pp.499-512 / 2022
  • This study investigates the efficiency of ensemble machine learning for predicting the characteristics of lightweight-aggregate concrete (LWC). A stacking ensemble (STEN) approach was proposed to estimate the dry density (DD) and 28-day compressive strength (Fc-28) of LWC using two meta-models, random forest regressor (RFR) and extra tree regressor (ETR), and two novel ensemble models, STEN-RFR and STEN-ETR, were constructed. Four standalone machine learning models (artificial neural network, gradient boosting regression, K neighbor regression, and support vector regression) were used to compare the performance of the proposed models. For this purpose, a total of 140 LWC mixtures with 21 influencing parameters for producing LWC with a density of less than 1000 kg/m³ were used. Based on the experimental results with multiple performance criteria, it can be concluded that the proposed STEN-ETR model can be used to estimate the DD and Fc-28 of LWC. Moreover, the STEN-ETR approach was found to be an effective technique for predicting the DD and Fc-28 of LWC with minimal prediction error. In the validation phase, the accuracy of the proposed STEN-ETR model in predicting DD and Fc-28 was found to be 96.79% and 81.50%, respectively. In addition, the cement, water-cement ratio, silica fume, and aggregate with expanded glass variables were found to be significant in modeling the DD and Fc-28 of LWC.

Accuracy Assessment of Land-Use Land-Cover Classification Using Semantic Segmentation-Based Deep Learning Model and RapidEye Imagery (RapidEye 위성영상과 Semantic Segmentation 기반 딥러닝 모델을 이용한 토지피복분류의 정확도 평가)

  • Woodam Sim;Jong Su Yim;Jung-Soo Lee
    • Korean Journal of Remote Sensing / v.39 no.3 / pp.269-282 / 2023
  • The purpose of this study was to construct land cover maps using deep learning models and to select the optimal deep learning model for land cover classification by adjusting dataset settings such as input image size and Stride application. Two types of deep learning models with an Encoder-Decoder network, the U-net model and the DeeplabV3+ model, were utilized. An Ensemble model, which combines the two deep learning models, was also used in this study. The dataset used RapidEye satellite images as input images, and the label images were Raster images based on the six land-use categories of the Intergovernmental Panel on Climate Change, used as the true values. This study focused on improving the quality of the dataset to enhance the accuracy of the deep learning models and constructed twelve land cover maps using the combination of three deep learning models (U-net, DeeplabV3+, and Ensemble), two input image sizes (64 × 64 pixels and 256 × 256 pixels), and two Stride application rates (50% and 100%). The accuracy evaluation of the deep learning-based land cover maps against the label images showed that the U-net and DeeplabV3+ models had high accuracy, with overall accuracy values of approximately 87.9% and 89.8%, respectively, and kappa coefficients of over 72%. In addition, applying the Ensemble and Stride to the deep learning models resulted in a maximum increase of approximately 3% in accuracy and reduced the boundary inconsistency problem associated with Semantic Segmentation-based deep learning models.
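The effect of input image size and Stride rate on the number of training tiles can be illustrated with a small sketch. It assumes that a Stride rate of 100% means non-overlapping tiles and 50% means each tile is shifted by half its size; that reading, the function name `tile_origins`, and the 256-pixel scene size are assumptions, since the abstract does not define them.

```python
def tile_origins(image_size, tile_size, stride_ratio):
    """Return the start offsets along one axis for tiles cut from a square scene."""
    if not 0 < stride_ratio <= 1:
        raise ValueError("stride_ratio must be in (0, 1]")
    if tile_size > image_size:
        raise ValueError("tile is larger than the image")
    # Step between consecutive tiles, in pixels.
    step = max(1, int(tile_size * stride_ratio))
    return list(range(0, image_size - tile_size + 1, step))

# Hypothetical 256-pixel-wide scene cut into 64-pixel tiles.
no_overlap = tile_origins(256, 64, 1.0)
half_overlap = tile_origins(256, 64, 0.5)

# Tiles per 2D scene: the offsets apply along both axes.
print(len(no_overlap) ** 2, len(half_overlap) ** 2)
```

Under these assumptions, halving the stride roughly triples the number of tiles per scene, and every interior pixel is covered by more than one tile, which is one plausible reason overlapping tiles help at tile boundaries.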

Uncertainty assessment of ensemble streamflow prediction method (앙상블 유량예측기법의 불확실성 평가)

  • Kim, Seon-Ho;Kang, Shin-Uk;Bae, Deg-Hyo
    • Journal of Korea Water Resources Association / v.51 no.6 / pp.523-533 / 2018
  • The objective of this study is to analyze the uncertainties of ensemble-based streamflow prediction methods with respect to model parameters and input data. ESP (Ensemble Streamflow Prediction) and BAYES-ESP (Bayesian-ESP) based on the ABCD rainfall-runoff model were selected as the streamflow prediction methods. GLUE (Generalized Likelihood Uncertainty Estimation) was applied for the analysis of parameter uncertainty. The analysis of input uncertainty was performed according to the duration of the meteorological scenarios for ESP. The results showed that parameter uncertainty was much more significant than input uncertainty for ensemble-based streamflow prediction. They also indicated that using more than 20 years of observed meteorological data is appropriate. BAYES-ESP was also effective in reducing the uncertainty of the ESP method. It is concluded that this analysis is meaningful for clarifying the characteristics of the ESP method and the error factors of ensemble-based streamflow prediction methods.

A Study on Predicting Lung Cancer Using RNA-Sequencing Data with Ensemble Learning (앙상블 기법을 활용한 RNA-Sequencing 데이터의 폐암 예측 연구)

  • Geon AN;JooYong PARK
    • Journal of Korea Artificial Intelligence Association / v.2 no.1 / pp.7-14 / 2024
  • In this paper, we explore the application of RNA-sequencing data and ensemble machine learning to predict lung cancer and treatment strategies for lung cancer, a leading cause of cancer mortality worldwide. The research utilizes Random Forest, XGBoost, and LightGBM models to analyze gene expression profiles from extensive datasets, aiming to enhance predictive accuracy for lung cancer prognosis. The methodology focuses on preprocessing RNA-seq data to standardize expression levels across samples and applying ensemble algorithms to maximize prediction stability and reduce model overfitting. Key findings indicate that ensemble models, especially XGBoost, substantially outperform traditional predictive models. Significant genetic markers such as ADGRF5 are identified as crucial for predicting lung cancer outcomes. In conclusion, ensemble learning using RNA-seq data proves highly effective in predicting lung cancer, suggesting a potential shift towards more precise and personalized treatment approaches. The results advocate for further integration of molecular and clinical data to refine diagnostic models and improve clinical outcomes, underscoring the critical role of advanced molecular diagnostics in enhancing patient survival rates and quality of life. This study lays the groundwork for future research in the application of RNA-sequencing data and ensemble machine learning techniques in clinical settings.

Stochastic Simple Hydrologic Partitioning Model Associated with Markov Chain Monte Carlo and Ensemble Kalman Filter (마코프 체인 몬테카를로 및 앙상블 칼만필터와 연계된 추계학적 단순 수문분할모형)

  • Choi, Jeonghyeon;Lee, Okjeong;Won, Jeongeun;Kim, Sangdan
    • Journal of Korean Society on Water Environment / v.36 no.5 / pp.353-363 / 2020
  • Hydrologic models can be classified into two types: those for understanding physical processes and those for predicting hydrologic quantities. This study deals with how to use a model to predict today's stream flow based on knowledge of the system's state yesterday and the model parameters. For the model to generate accurate predictions, the uncertainty of the parameters must be quantified and the state variables must be estimated appropriately. In this study, a relatively simple hydrologic partitioning model is proposed that can explicitly implement the hydrologic partitioning process, and the posterior distribution of the parameters of the proposed model is estimated using the Markov chain Monte Carlo approach. Further, a method of applying the ensemble Kalman filter is proposed for updating the normalized soil moisture, which is the state variable of the model, by linking the information on the posterior distribution of the parameters and by assimilating the observed stream flow data. The stream flows estimated stochastically and recursively using the data assimilation technique represented the observed data better than the stream flows predicted using the deterministic model. Therefore, the ensemble Kalman filter in conjunction with the Markov chain Monte Carlo approach could be a reliable and effective method for forecasting daily stream flow, and it could also be a suitable method for routinely updating and monitoring the watershed-averaged soil moisture.
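The state update described in this abstract can be illustrated with a minimal scalar ensemble Kalman filter analysis step using perturbed observations. This is a generic textbook form, not the authors' implementation; the linear relation between soil moisture and flow, the function name `enkf_update`, and all numbers are hypothetical.

```python
import random
import statistics

def enkf_update(states, predicted_obs, observation, obs_var, rng):
    """One EnKF analysis step for a scalar state and a scalar observation."""
    n = len(states)
    mean_x = statistics.fmean(states)
    mean_y = statistics.fmean(predicted_obs)
    # Sample covariance between the state and the predicted observation.
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(states, predicted_obs)) / (n - 1)
    var_y = statistics.variance(predicted_obs)
    # Kalman gain estimated from the ensemble statistics.
    gain = cov_xy / (var_y + obs_var)
    obs_sd = obs_var ** 0.5
    # Each member assimilates its own noisy copy of the observation.
    return [x + gain * (observation + rng.gauss(0.0, obs_sd) - y)
            for x, y in zip(states, predicted_obs)]

rng = random.Random(0)
# Hypothetical prior ensemble of normalized soil moisture.
prior = [rng.uniform(0.2, 0.6) for _ in range(50)]
# Toy observation operator (assumption): flow = 10 * soil moisture.
predicted_flow = [10.0 * s for s in prior]
# An observed flow of 5.0 corresponds to soil moisture 0.5 in this toy setup.
posterior = enkf_update(prior, predicted_flow, 5.0, 0.01, rng)

print(statistics.fmean(prior), statistics.fmean(posterior))
```

With a small observation error variance, the posterior ensemble is pulled toward the state implied by the observed flow and its spread shrinks.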

Development of 12-month Ensemble Prediction System Using PNU CGCM V1.1 (PNU CGCM V1.1을 이용한 12개월 앙상블 예측 시스템의 개발)

  • Ahn, Joong-Bae;Lee, Su-Bong;Ryoo, Sang-Boom
    • Atmosphere / v.22 no.4 / pp.455-464 / 2012
  • This study investigates the 12-month-lead predictability of the PNU Coupled General Circulation Model (CGCM) V1.1 hindcast, for which an oceanic data-assimilated initialization is used to generate the ocean initial condition. The CGCM, a participant model of the APEC Climate Center (APCC) long-lead multi-model ensemble system, was initialized every month and performed a 12-month-lead hindcast for each month from 1980 to 2011. The 12-month-lead hindcast consisted of 2-5 ensemble members, and this study verified the ensemble-averaged hindcast. For sea-surface temperature, the hindcast retained a high level of confidence, especially over the tropical Pacific and the mid-latitude central Pacific, with a slight decline in temporal correlation coefficients (TCC) as the lead month increased. In particular, the CGCM showed trustworthy ENSO prediction skill in most hindcasts. For atmospheric variables such as air temperature, precipitation, and geopotential height at 500 hPa, reliable prediction results were shown throughout the entire lead time over most of the domain, particularly over the equatorial region. Although the TCCs of hindcast precipitation are lower than those of other variables, skillful precipitation forecasts are also shown over highly variable regions such as the ITCZ. This study also revealed that predictability has seasonal and regional dependencies for each variable and lead.
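Verifying an ensemble-averaged hindcast with a temporal correlation coefficient can be sketched as follows, assuming TCC is the Pearson correlation over time between the ensemble mean and the observations at one grid point. The function names and the anomaly values are hypothetical.

```python
import math

def ensemble_mean(members):
    """Average the ensemble members time step by time step."""
    return [sum(vals) / len(vals) for vals in zip(*members)]

def tcc(forecast, observed):
    """Pearson correlation over time between a forecast series and observations."""
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_f * var_o)

# Hypothetical SST anomalies from three members over five hindcast years.
members = [
    [0.4, -0.2, 1.1, -0.8, 0.3],
    [0.6, -0.1, 0.9, -1.0, 0.1],
    [0.5, -0.3, 1.3, -0.6, 0.2],
]
observed = [0.7, -0.4, 1.5, -0.9, 0.0]

mean_series = ensemble_mean(members)
print(round(tcc(mean_series, observed), 3))
```

Repeating this calculation at every grid point and lead month would give the kind of TCC map that shows where skill declines with lead time.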

Prediction Skill of Intraseasonal Monthly Temperature and Precipitation Variations for APCC Multi-Models (APCC 다중 모형 자료 기반 계절 내 월 기온 및 강수 변동 예측성)

  • Song, Chan-Yeong;Ahn, Joong-Bae
    • Atmosphere / v.30 no.4 / pp.405-420 / 2020
  • In this study, we investigate the predictability of intraseasonal monthly temperature and precipitation variations using hindcast datasets from eight global circulation models participating in the operational multi-model ensemble (MME) seasonal prediction system of the Asia-Pacific Economic Cooperation Climate Center for the 1983-2010 period. These intraseasonal monthly variations are defined by categorical deterministic analysis. The monthly temperature and precipitation are categorized into above normal (AN), near normal (NN), and below normal (BN) based on a σ-value of ± 0.43 after standardization. Nine patterns of intraseasonal monthly variation are defined by considering how the monthly categories change over three consecutive months. A deterministic and a probabilistic analysis are used to define intraseasonal monthly variation for the multi-model consisting of numerous ensemble members. The results show that the pattern with the same monthly category in three consecutive months (pattern 7) is the most frequently occurring pattern in observation regardless of season and variable. Meanwhile, the patterns with consistently increasing or decreasing trends in three consecutive months (e.g., patterns 8 and 9), such as BN-NN-AN or AN-NN-BN, occur rarely in observation. The MME and eight individual models generally capture pattern 7 well but rarely capture patterns 8 and 9.
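The categorization step can be sketched as follows: values are standardized and then labeled with the ± 0.43 threshold, which splits a normal distribution into roughly equal thirds. The use of the population standard deviation, the function name `categorize`, and the sample values are assumptions, not details from the paper.

```python
import statistics

THRESHOLD = 0.43  # standardized threshold quoted in the abstract

def categorize(values):
    """Label each value as AN, NN, or BN after standardizing the series."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption
    if sd == 0:
        raise ValueError("series is constant; cannot standardize")
    labels = []
    for v in values:
        z = (v - mean) / sd
        if z > THRESHOLD:
            labels.append("AN")  # above normal
        elif z < -THRESHOLD:
            labels.append("BN")  # below normal
        else:
            labels.append("NN")  # near normal
    return labels

# Hypothetical monthly temperatures for one calendar month across five years.
print(categorize([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Joining the labels of three consecutive months, for example "BN-NN-AN", gives the kind of sequence from which the intraseasonal patterns are defined.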

Ensemble approach for improving prediction in kernel regression and classification

  • Han, Sunwoo;Hwang, Seongyun;Lee, Seokho
    • Communications for Statistical Applications and Methods / v.23 no.4 / pp.355-362 / 2016
  • Ensemble methods often help increase prediction ability in various predictive models by combining multiple weak learners and reducing the variability of the final predictive model. In this work, we demonstrate that ensemble methods also enhance the accuracy of prediction under kernel ridge regression and kernel logistic regression classification. We apply bagging and random forests to two kernel-based predictive models and present the procedure by which bagging and random forests can be embedded in kernel-based predictive models. Our proposals are tested on numerous synthetic and real datasets and are then compared with plain kernel-based predictive models and their subsampling approach. Numerical studies demonstrate that the ensemble approach outperforms plain kernel-based predictive models.