• Title/Abstract/Keywords: Model Ensemble

The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction

  • 박정수
    • 한국물환경학회지 / Vol. 37, No. 5 / pp. 335-343 / 2021
  • Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) has various effects on water supply systems, such as increased treatment cost; consequently, there have been various efforts to develop a model for predicting SSC. However, SSC is affected by both the natural and anthropogenic environment, making it challenging to predict. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The observed discharge (Q) and SSC at two field monitoring stations were used to develop the model. The input variables were clustered into two groups with low and high ranges of Q using the k-means clustering algorithm. Each group of data was then used separately to optimize XGB (Model 1). The model performance was compared with that of the XGB model using the entire data (Model 2). The models were evaluated by the root mean squared error-observation standard deviation ratio (RSR) and the root mean squared error. The RSR values were 0.51 and 0.57 at the two monitoring stations for Model 2, while the model performance improved to RSR values of 0.46 and 0.55, respectively, for Model 1.
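
The RSR metric named in this abstract can be sketched as follows. This is a minimal illustration, assuming the common definition of RSR as the RMSE divided by the standard deviation of the observations; the function names and the sample SSC values are hypothetical and are not taken from the paper.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rsr(observed, predicted):
    """RMSE divided by the standard deviation of the observations."""
    mean_obs = sum(observed) / len(observed)
    # Population form (divide by n) so that n cancels against the RMSE,
    # which matches the sum-of-squares ratio form of the definition.
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / len(observed))
    if sd_obs == 0:
        raise ValueError("observations have zero variance")
    return rmse(observed, predicted) / sd_obs


# Hypothetical SSC values (mg/L), for illustration only.
ssc_observed = [12.0, 30.0, 55.0, 80.0, 140.0]
ssc_predicted = [15.0, 28.0, 60.0, 70.0, 150.0]
print(round(rsr(ssc_observed, ssc_predicted), 3))
```

Under this definition a perfect prediction gives an RSR of 0, and always predicting the observed mean gives an RSR of 1, so lower values indicate a better fit.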

Wood Species Classification Utilizing Ensembles of Convolutional Neural Networks Established by Near-Infrared Spectra and Images Acquired from Korean Softwood Lumber

  • Yang, Sang-Yun;Lee, Hyung Gu;Park, Yonggun;Chung, Hyunwoo;Kim, Hyunbin;Park, Se-Yeong;Choi, In-Gyu;Kwon, Ohkyung;Yeo, Hwanmyeong
    • Journal of the Korean Wood Science and Technology / Vol. 47, No. 4 / pp. 385-392 / 2019
  • In our previous study, we investigated the use of ensemble models based on LeNet and MiniVGGNet to classify images of the transverse and longitudinal surfaces of five Korean softwoods (cedar, cypress, Korean pine, Korean red pine, and larch). Those models achieved an average F1 score of more than 98%, but the classification performance for longitudinal surface images was still lower than that for transverse surface images. In this study, ensemble methods combining two different convolutional neural network models (LeNet3 for smartphone camera images and NIRNet for NIR spectra) were applied to lumber species classification. Experimentally, the best classification performance was obtained by the averaging ensemble method of LeNet3 and NIRNet. The average F1 scores of the individual LeNet3 and NIRNet models were 91.98% and 85.94%, respectively. With the averaging ensemble method of LeNet3 and NIRNet, the average F1 score increased to 95.31%.
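
The averaging ensemble described in this abstract can be illustrated with a small sketch. It assumes each model outputs one probability per species and that the ensemble takes the unweighted mean of the two vectors; the probability values and function names below are hypothetical stand-ins, not outputs of LeNet3 or NIRNet.

```python
def average_ensemble(prob_a, prob_b):
    """Average two class-probability vectors element-wise."""
    if len(prob_a) != len(prob_b):
        raise ValueError("probability vectors must have the same length")
    return [(a + b) / 2.0 for a, b in zip(prob_a, prob_b)]


def predict_label(probabilities, labels):
    """Return the label whose probability is highest."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best]


SPECIES = ["cedar", "cypress", "Korean pine", "Korean red pine", "larch"]

# Hypothetical softmax outputs for one lumber sample.
image_probs = [0.10, 0.45, 0.05, 0.35, 0.05]     # stand-in for the image model
spectrum_probs = [0.05, 0.20, 0.05, 0.65, 0.05]  # stand-in for the NIR model

combined = average_ensemble(image_probs, spectrum_probs)
print(predict_label(image_probs, SPECIES))  # image model alone
print(predict_label(combined, SPECIES))     # averaged ensemble
```

In this made-up sample the image model alone favors cypress, while the averaged probabilities favor Korean red pine, which shows how a confident second model can overturn a marginal decision by the first.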

Estimation of lightweight aggregate concrete characteristics using a novel stacking ensemble approach

  • Kaloop, Mosbeh R.;Bardhan, Abidhan;Hu, Jong Wan;Abd-Elrahman, Mohamed
    • Advances in nano research / Vol. 13, No. 5 / pp. 499-512 / 2022
  • This study investigates the efficiency of ensemble machine learning for predicting the characteristics of lightweight-aggregate concrete (LWC). A stacking ensemble (STEN) approach was proposed to estimate the dry density (DD) and 28-day compressive strength (Fc-28) of LWC using two meta-models, random forest regressor (RFR) and extra tree regressor (ETR), and two novel ensemble models, STEN-RFR and STEN-ETR, were constructed. Four standalone machine learning models (artificial neural network, gradient boosting regression, K neighbor regression, and support vector regression) were used to compare the performance of the proposed models. For this purpose, a total of 140 LWC mixtures with 21 influencing parameters for producing LWC with a density of less than 1000 kg/m³ were used. Based on the experimental results with multiple performance criteria, it can be concluded that the proposed STEN-ETR model can be used to estimate the DD and Fc-28 of LWC. Moreover, the STEN-ETR approach was found to be an effective technique for predicting the DD and Fc-28 of LWC with minimal prediction error. In the validation phase, the accuracy of the proposed STEN-ETR model in predicting DD and Fc-28 was found to be 96.79% and 81.50%, respectively. In addition, the cement, water-cement ratio, silica fume, and expanded-glass aggregate variables were found to be significant in modeling the DD and Fc-28 of LWC.

Accuracy Assessment of Land-Use Land-Cover Classification Using Semantic Segmentation-Based Deep Learning Model and RapidEye Imagery

  • 심우담;임종수;이정수
    • 대한원격탐사학회지 / Vol. 39, No. 3 / pp. 269-282 / 2023
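  • This study performed land-cover classification using deep learning models, with the aim of selecting the optimal deep learning model for land-cover classification by adjusting the dataset, including the input image size and the application of stride. Three deep learning models were applied: U-net and DeeplabV3+, which have an encoder-decoder structure, and an ensemble model combining the two. For the dataset, RapidEye satellite imagery was used as the input, and raster images constructed according to the six land-use categories of the Intergovernmental Panel on Climate Change were used as the ground-truth label images. To improve the accuracy of the deep learning models, the study focused on improving dataset quality, and 12 land-cover maps were produced from combinations of deep learning model (U-net, DeeplabV3+, Ensemble), input image size (64 × 64 pixel, 256 × 256 pixel), and stride (50%, 100%). In the agreement assessment between the label images and the deep-learning-based land-cover maps, the overall accuracy of the U-net and DeeplabV3+ models reached up to about 87.9% and 89.8%, respectively, and the kappa coefficients were all about 72% or higher, indicating high accuracy; the U-net model using the 64 × 64 pixel dataset had the highest accuracy. In addition, applying the ensemble and stride to the deep learning models increased accuracy by up to about 3% and improved the boundary inconsistency that is a weakness of semantic-segmentation-based deep learning models.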
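
The two agreement measures reported in this abstract, overall accuracy and the kappa coefficient, can be computed from a confusion matrix as sketched below. This assumes the standard Cohen's kappa formula; the two-class matrix is hypothetical and far smaller than the six-category setting in the study.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    # Chance agreement from the marginal totals of labels and predictions.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    # Undefined (division by zero) when chance agreement is exactly 1.
    return (observed - expected) / (1.0 - expected)


# Hypothetical confusion matrix: rows are label classes, columns are predictions.
confusion = [
    [20, 5],
    [10, 15],
]
print(overall_accuracy(confusion), cohen_kappa(confusion))
```

For this matrix the overall accuracy is 35/50 and the chance agreement is 0.5, so kappa comes out noticeably lower than the raw accuracy, which is why both figures are usually reported together.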

Uncertainty assessment of ensemble streamflow prediction method

  • 김선호;강신욱;배덕효
    • 한국수자원학회논문집 / Vol. 51, No. 6 / pp. 523-533 / 2018
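  • In this study, the uncertainty of the ensemble streamflow prediction method due to rainfall-runoff model parameters and input data was analyzed for the Chungju Dam basin. The ESP (Ensemble Streamflow Prediction) and BAYES-ESP (Bayesian-ESP) methods were used as the ensemble streamflow prediction methods, and ABCD was used as the rainfall-runoff model. The GLUE (Generalized Likelihood Uncertainty Estimation) method was applied to analyze the uncertainty due to model parameters, and the uncertainty due to input data was analyzed according to the period of the meteorological scenarios used in the streamflow prediction ensemble. The results showed that the ensemble streamflow prediction method was affected more by the model parameters than by the input data, and that it is appropriate to use the method when 20 or more years of observed meteorological data are available. In addition, BAYES-ESP was found to reduce uncertainty compared with ESP. This study is considered valuable in that it characterized the ensemble streamflow prediction method and analyzed the sources of error through uncertainty analysis.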

A Study on Predicting Lung Cancer Using RNA-Sequencing Data with Ensemble Learning

  • Geon AN;JooYong PARK
    • Journal of Korea Artificial Intelligence Association / Vol. 2, No. 1 / pp. 7-14 / 2024
  • In this paper, we explore the application of RNA-sequencing data and ensemble machine learning to predict lung cancer and treatment strategies for lung cancer, a leading cause of cancer mortality worldwide. The research utilizes Random Forest, XGBoost, and LightGBM models to analyze gene expression profiles from extensive datasets, aiming to enhance predictive accuracy for lung cancer prognosis. The methodology focuses on preprocessing RNA-seq data to standardize expression levels across samples and applying ensemble algorithms to maximize prediction stability and reduce model overfitting. Key findings indicate that ensemble models, especially XGBoost, substantially outperform traditional predictive models. Significant genetic markers such as ADGRF5 are identified as crucial for predicting lung cancer outcomes. In conclusion, ensemble learning using RNA-seq data proves highly effective in predicting lung cancer, suggesting a potential shift towards more precise and personalized treatment approaches. The results advocate for further integration of molecular and clinical data to refine diagnostic models and improve clinical outcomes, underscoring the critical role of advanced molecular diagnostics in enhancing patient survival rates and quality of life. This study lays the groundwork for future research in the application of RNA-sequencing data and ensemble machine learning techniques in clinical settings.

Stochastic Simple Hydrologic Partitioning Model Associated with Markov Chain Monte Carlo and Ensemble Kalman Filter

  • 최정현;이옥정;원정은;김상단
    • 한국물환경학회지 / Vol. 36, No. 5 / pp. 353-363 / 2020
  • Hydrologic models can be classified into two types: those for understanding physical processes and those for predicting hydrologic quantities. This study deals with how to use the model to predict today's stream flow based on the system's knowledge of yesterday's state and the model parameters. In this regard, for the model to generate accurate predictions, the uncertainty of the parameters and appropriate estimates of the state variables are required. In this study, a relatively simple hydrologic partitioning model is proposed that can explicitly implement the hydrologic partitioning process, and the posterior distribution of the parameters of the proposed model is estimated using the Markov chain Monte Carlo approach. Further, the application method of the ensemble Kalman filter is proposed for updating the normalized soil moisture, which is the state variable of the model, by linking the information on the posterior distribution of the parameters and by assimilating the observed stream flow data. The stochastically and recursively estimated stream flows using the data assimilation technique revealed better representation of the observed data than the stream flows predicted using the deterministic model. Therefore, the ensemble Kalman filter in conjunction with the Markov chain Monte Carlo approach could be a reliable and effective method for forecasting daily stream flow, and it could also be a suitable method for routinely updating and monitoring the watershed-averaged soil moisture.
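
The ensemble Kalman filter update mentioned in this abstract can be sketched in its simplest form. This is a heavily simplified illustration, not the paper's method: it assumes a scalar state that is observed directly (identity observation operator), whereas the study assimilates stream flow to update soil moisture through the hydrologic model. The function name, the ensemble values, and the observation are all hypothetical.

```python
import random
import statistics


def enkf_update(ensemble, observation, obs_variance, rng):
    """One stochastic EnKF analysis step for a directly observed scalar state."""
    # Forecast error variance is estimated from the ensemble spread.
    forecast_variance = statistics.variance(ensemble)
    gain = forecast_variance / (forecast_variance + obs_variance)
    obs_sd = obs_variance ** 0.5
    updated = []
    for member in ensemble:
        # Each member assimilates its own randomly perturbed observation.
        perturbed_obs = observation + rng.gauss(0.0, obs_sd)
        updated.append(member + gain * (perturbed_obs - member))
    return updated


rng = random.Random(0)  # fixed seed so the sketch is reproducible
prior = [0.20, 0.30, 0.40, 0.50, 0.60]  # hypothetical normalized soil moisture
posterior = enkf_update(prior, observation=0.80, obs_variance=0.01, rng=rng)
print(statistics.fmean(prior), statistics.fmean(posterior))
```

Because the assumed observation variance is small relative to the ensemble spread, the gain is close to 1 and the updated ensemble mean moves most of the way toward the observation.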

Development of 12-month Ensemble Prediction System Using PNU CGCM V1.1

  • 안중배;이수봉;류상범
    • 대기 / Vol. 22, No. 4 / pp. 455-464 / 2012
  • This study investigates the 12-month-lead predictability of the PNU Coupled General Circulation Model (CGCM) V1.1 hindcast, for which an initialization with oceanic data assimilation is used to generate the ocean initial condition. The CGCM, a participant model of the APEC Climate Center (APCC) long-lead multi-model ensemble system, was initialized every month and produced a 12-month-lead hindcast for each month from 1980 to 2011. The 12-month-lead hindcast consisted of 2-5 ensembles, and this study verified the ensemble-averaged hindcast. For sea-surface temperature, the hindcast retained a high level of confidence, especially over the tropical Pacific and the mid-latitude central Pacific, with temporal correlation coefficients (TCC) declining slightly as the lead month increased. In particular, the CGCM showed trustworthy ENSO prediction skill in most hindcasts. For atmospheric variables such as air temperature, precipitation, and geopotential height at 500 hPa, reliable prediction results were shown throughout the entire lead time over most of the domain, particularly over the equatorial region. Though the TCCs of hindcasted precipitation are lower than those of other variables, skillful precipitation forecasts are also shown over highly variable regions such as the ITCZ. This study also revealed that predictability depends on season and region for each variable and lead.
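
The verification step described in this abstract, averaging the ensemble and then computing a temporal correlation coefficient against observations, can be sketched as follows. It assumes the TCC is the Pearson correlation over time at a single grid point; the anomaly values and function names are hypothetical.

```python
import math


def ensemble_mean(members):
    """Average the ensemble members time step by time step."""
    return [sum(values) / len(values) for values in zip(*members)]


def temporal_correlation(forecast, observed):
    """Pearson correlation between two equal-length time series."""
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_f * var_o)


# Hypothetical sea-surface temperature anomalies at one grid point:
# three ensemble members, five verification times.
members = [
    [0.2, -0.1, 0.9, -0.6, 0.3],
    [0.4, -0.3, 0.7, -0.4, 0.1],
    [0.3, -0.2, 1.1, -0.8, 0.2],
]
observed = [0.5, -0.4, 1.2, -0.9, 0.0]

mean_forecast = ensemble_mean(members)
print(round(temporal_correlation(mean_forecast, observed), 3))
```

Repeating this calculation for every grid point and lead month would give maps of TCC by lead time, which is the kind of summary the abstract refers to.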

Prediction Skill of Intraseasonal Monthly Temperature and Precipitation Variations for APCC Multi-Models

  • 송찬영;안중배
    • 대기 / Vol. 30, No. 4 / pp. 405-420 / 2020
  • In this study, we investigate the predictability of intraseasonal monthly temperature and precipitation variations using hindcast datasets from eight global circulation models participating in the operational multi-model ensemble (MME) seasonal prediction system of the Asia-Pacific Economic Cooperation Climate Center for the 1983-2010 period. These intraseasonal monthly variations are defined by categorical deterministic analysis. The monthly temperature and precipitation are categorized into above normal (AN), near normal (NN), and below normal (BN) based on the σ-value ± 0.43 after standardization. The nine patterns of intraseasonal monthly variation are defined by considering the changing pattern of the monthly categories for the three consecutive months. A deterministic and a probabilistic analysis are used to define intraseasonal monthly variation for the multi-model consisting of numerous ensemble members. The results show that a pattern (pattern 7), which has the same monthly categories in three consecutive months, is the most frequently occurring pattern in observation regardless of the seasons and variables. Meanwhile, the patterns (e.g., patterns 8 and 9) that have consistently increasing or decreasing trends in three consecutive months, such as BN-NN-AN or AN-NN-BN, occur rarely in observation. The MME and eight individual models generally capture pattern 7 well but rarely capture patterns 8 and 9.
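
The categorization step in this abstract can be sketched as below. The ±0.43 threshold comes from the abstract; the use of the population standard deviation, the treatment of values exactly on the threshold as near normal, and the function names are assumptions made for illustration. The mapping from category sequences to the nine numbered patterns is not given in the abstract and is not reproduced here.

```python
import statistics

THRESHOLD = 0.43  # standardized threshold stated in the abstract


def standardize(values):
    """Convert values to standardized anomalies (population standard deviation)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def categorize(z):
    """Map a standardized anomaly to AN, NN, or BN."""
    if z > THRESHOLD:
        return "AN"
    if z < -THRESHOLD:
        return "BN"
    return "NN"


def three_month_sequence(z_values):
    """Join the categories of consecutive months, e.g. 'BN-NN-AN'."""
    return "-".join(categorize(z) for z in z_values)


# Hypothetical standardized anomalies for three consecutive months.
print(three_month_sequence([-1.0, 0.1, 0.9]))
print(three_month_sequence([0.2, -0.3, 0.4]))
```

The first made-up sequence rises steadily through the categories, while the second stays near normal in all three months, which corresponds to the kind of unchanged-category case the abstract reports as most frequent.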

Ensemble approach for improving prediction in kernel regression and classification

  • Han, Sunwoo;Hwang, Seongyun;Lee, Seokho
    • Communications for Statistical Applications and Methods / Vol. 23, No. 4 / pp. 355-362 / 2016
  • Ensemble methods often help increase prediction ability in various predictive models by combining multiple weak learners and reducing the variability of the final predictive model. In this work, we demonstrate that ensemble methods also enhance the accuracy of prediction under kernel ridge regression and kernel logistic regression classification. Here we apply bagging and random forests to two kernel-based predictive models and present the procedure by which bagging and random forests can be embedded in kernel-based predictive models. Our proposals are tested on numerous synthetic and real datasets; subsequently, they are compared with plain kernel-based predictive models and their subsampling approach. Numerical studies demonstrate that the ensemble approach outperforms plain kernel-based predictive models.