Optimal Selection of Classifier Ensemble Using Genetic Algorithms (유전자 알고리즘을 이용한 분류자 앙상블의 최적 선택)

  • Kim, Myung-Jong
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.99-112 / 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms. It finds a highly accurate classifier on the training set by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set. Ensemble learning has received considerable attention in the machine learning and artificial intelligence fields because of its remarkable performance improvement and its flexible integration with traditional learning algorithms such as decision trees (DT), neural networks (NN), and SVM. In this line of research, all DT ensemble studies have demonstrated impressive improvements in the generalization behavior of DT, while NN and SVM ensemble studies have not shown improvements as remarkable as those of DT ensembles. Recently, several works have reported that the performance of an ensemble can be degraded when its classifiers are highly correlated with one another, which results in a multicollinearity problem. They have also proposed differentiated learning strategies to cope with this performance degradation. Hansen and Salamon (1990) argued that it is necessary and sufficient for the performance enhancement of an ensemble that the ensemble contain diverse classifiers. Breiman (1996) showed that ensemble learning can increase the performance of unstable learning algorithms, but does not yield remarkable performance improvement for stable learning algorithms. Unstable learning algorithms such as decision tree learners are sensitive to changes in the training data, so small changes in the training data can yield large changes in the generated classifiers. Therefore, an ensemble of unstable learning algorithms can guarantee some diversity among the classifiers.
In contrast, stable learning algorithms such as NN and SVM generate similar classifiers despite small changes in the training data, and thus the correlation among the resulting classifiers is very high. This high correlation results in a multicollinearity problem, which leads to performance degradation of the ensemble. Kim's work (2009) compared the performance of traditional prediction algorithms such as NN, DT, and SVM in bankruptcy prediction for Korean firms. It reports that stable learning algorithms such as NN and SVM have higher predictability than the unstable DT. With respect to ensemble learning, however, the DT ensemble shows a greater performance improvement than the NN and SVM ensembles. Further analysis with the variance inflation factor (VIF) shows empirically that the performance degradation of the ensemble is due to the multicollinearity problem. It also proposes that optimization of the ensemble is needed to cope with this problem. This paper proposes a hybrid system for coverage optimization of NN ensembles (CO-NN) in order to improve the performance of NN ensembles. Coverage optimization is a technique for choosing a sub-ensemble from an original ensemble so as to guarantee the diversity of its classifiers. CO-NN uses a genetic algorithm (GA), which has been widely used for various optimization problems, to deal with the coverage optimization problem. The GA chromosomes for coverage optimization are encoded as binary strings, each bit of which indicates an individual classifier. The fitness function is defined as the maximization of error reduction, and a constraint on the variance inflation factor (VIF), one of the commonly used measures of multicollinearity, is added to ensure the diversity of classifiers by removing high correlation among them. We use Microsoft Excel and the GA software package Evolver.
Experiments on company failure prediction have shown that CO-NN stably enhances the performance of NN ensembles by choosing classifiers with the correlations within the ensemble taken into account. Classifiers with a potential multicollinearity problem are removed by the coverage optimization process of CO-NN, and CO-NN thereby shows higher performance than a single NN classifier and the NN ensemble at the 1% significance level, and than the DT ensemble at the 5% significance level. However, further research issues remain. First, a decision optimization process for finding the optimal combination function should be considered. Second, various learning strategies for dealing with data noise should be introduced.
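
The coverage optimization described in this abstract (binary chromosomes, an error-based fitness, and a VIF constraint) can be sketched roughly as follows. This is a minimal illustration and not the authors' Excel/Evolver setup: the synthetic member outputs, the VIF cutoff of 10, the averaging combiner, and all GA settings (`pop_size`, `generations`, `p_mut`) are assumptions made for the example.

```python
import numpy as np

VIF_LIMIT = 10.0  # assumed cutoff (a common rule of thumb), not taken from the paper

def max_vif(outputs):
    """Largest variance inflation factor over the columns (classifiers)."""
    if outputs.shape[1] < 2:
        return 1.0
    corr = np.corrcoef(outputs, rowvar=False)
    try:
        # VIF_j equals the j-th diagonal entry of the inverse correlation matrix.
        vif = np.diag(np.linalg.inv(corr))
    except np.linalg.LinAlgError:
        return float("inf")
    if not np.all(np.isfinite(vif)) or np.any(vif <= 0):
        return float("inf")
    return float(vif.max())

def fitness(mask, outputs, y):
    """Accuracy of the averaged sub-ensemble, or -1 if the VIF constraint fails."""
    if not mask.any():
        return -1.0
    chosen = outputs[:, mask]
    if max_vif(chosen) > VIF_LIMIT:
        return -1.0
    pred = chosen.mean(axis=1) >= 0.5
    return float((pred == y).mean())

def select_sub_ensemble(outputs, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = outputs.shape[1]
    # Seed with every single-classifier mask so a feasible chromosome always exists.
    pop = np.vstack([np.eye(n, dtype=bool), rng.random((pop_size, n)) < 0.5])
    for _ in range(generations):
        scores = np.array([fitness(m, outputs, y) for m in pop])
        children = [pop[scores.argmax()].copy()]  # elitism: keep the best chromosome
        while len(children) < len(pop):
            # Binary tournament selection of two parents.
            a, b = (max(rng.choice(len(pop), 2), key=lambda i: scores[i]) for _ in range(2))
            cross = rng.random(n) < 0.5        # uniform crossover
            child = np.where(cross, pop[a], pop[b])
            child ^= rng.random(n) < p_mut     # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(m, outputs, y) for m in pop])
    return pop[scores.argmax()], float(scores.max())

# Synthetic stand-in for NN member outputs: 4 moderately correlated members
# plus 4 near-copies of member 0 that create severe multicollinearity.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 200).astype(bool)
base = np.clip(y[:, None] * 0.6 + 0.2 + rng.normal(0, 0.25, (200, 4)), 0, 1)
copies = np.clip(base[:, [0]] + rng.normal(0, 0.01, (200, 4)), 0, 1)
outputs = np.hstack([base, copies])

mask, acc = select_sub_ensemble(outputs, y)
print("selected members:", np.flatnonzero(mask), "accuracy:", acc)
```

Treating the VIF limit as a hard constraint (infeasible chromosomes receive the worst fitness) is one simple way to encode it; a penalty term would be an alternative.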

Estimation of lightweight aggregate concrete characteristics using a novel stacking ensemble approach

  • Kaloop, Mosbeh R.;Bardhan, Abidhan;Hu, Jong Wan;Abd-Elrahman, Mohamed
    • Advances in nano research / v.13 no.5 / pp.499-512 / 2022
  • This study investigates the efficiency of ensemble machine learning for predicting the characteristics of lightweight-aggregate concrete (LWC). A stacking ensemble (STEN) approach is proposed to estimate the dry density (DD) and 28-day compressive strength (Fc-28) of LWC using two meta-models, random forest regressor (RFR) and extra tree regressor (ETR), yielding two novel ensemble models called STEN-RFR and STEN-ETR. Four standalone machine learning models, namely artificial neural network, gradient boosting regression, K-neighbor regression, and support vector regression, were used to compare the performance of the proposed models. For this purpose, a total of 140 LWC mixtures with 21 influencing parameters, designed to produce LWC with a density of less than 1000 kg/m³, were used. Based on the experimental results with multiple performance criteria, it can be concluded that the proposed STEN-ETR model can be used to estimate the DD and Fc-28 of LWC. Moreover, the STEN-ETR approach was found to be a significant technique for predicting the DD and Fc-28 of LWC with minimal prediction error. In the validation phase, the accuracy of the proposed STEN-ETR model in predicting DD and Fc-28 was found to be 96.79% and 81.50%, respectively. In addition, the variables cement, water-cement ratio, silica fume, and aggregate with expanded glass were found to be significant in modeling the DD and Fc-28 of LWC.

Appraisal of spatial characteristics and applicability of the predicted ensemble rainfall data (강우앙상블 예측자료의 공간적 특성 및 적용성 평가)

  • Lee, Sang-Hyeop;Seong, Yeon-Jeong;Kim, Gyeong-Tak;Jeong, Yeong-Hun
    • Journal of Korea Water Resources Association / v.53 no.11 / pp.1025-1037 / 2020
  • This study evaluates the spatial characteristics and applicability of the predicted ensemble rainfall data used for heavy rain warnings. The Limited area ENsemble prediction System (LENS) has 13 rainfall ensemble members, which makes a probabilistic approach to issuing heavy rain warnings possible. However, LENS data are difficult to access, so studies on the applicability of its rainfall prediction data are insufficient. In this study, evaluation indices were calculated by comparing both the single-point value and the area-averaged value with the observed value, in line with the heavy rain warning system applied to each administrative district. In addition, the accuracy of each ensemble member was evaluated according to the LENS issuance time. LENS showed uncertainty in the form of over- or under-prediction depending on the member. Area-based prediction showed higher predictability than point-based prediction. In addition, the LENS data predicting the upcoming 72-hour rainfall showed good predictive performance for rainfall events that may lead to water disasters. In the future, the predicted rainfall data from LENS are expected to be used as basic data for flood preparation in administrative districts or watersheds.
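
As a rough illustration of how 13 ensemble members can support a probabilistic warning, and of the point-based versus area-based comparison mentioned above, the sketch below computes the fraction of members exceeding a rainfall threshold. The forecast values, the 60 mm threshold, and the three-cell district are hypothetical and are not taken from LENS or from any official warning criterion.

```python
from statistics import mean

def exceedance_probability(members, threshold):
    """Fraction of ensemble members forecasting at least `threshold` mm."""
    return sum(value >= threshold for value in members) / len(members)

def area_averages(member_grids):
    """Average each member's forecast over the grid cells of one district."""
    return [mean(cells) for cells in member_grids]

# Hypothetical 13-member rainfall forecasts (mm) at a single point.
point_forecasts = [12, 35, 61, 70, 8, 64, 90, 55, 59, 62, 20, 75, 66]
THRESHOLD_MM = 60  # illustrative value, not an official warning criterion

p_point = exceedance_probability(point_forecasts, THRESHOLD_MM)

# Hypothetical forecasts of the same members over three grid cells of a district.
district_grids = [[v, v + 6, v - 3] for v in point_forecasts]
p_area = exceedance_probability(area_averages(district_grids), THRESHOLD_MM)

print(f"point-based probability: {p_point:.2f}, area-based probability: {p_area:.2f}")
```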

Development of 12-month Ensemble Prediction System Using PNU CGCM V1.1 (PNU CGCM V1.1을 이용한 12개월 앙상블 예측 시스템의 개발)

  • Ahn, Joong-Bae;Lee, Su-Bong;Ryoo, Sang-Boom
    • Atmosphere / v.22 no.4 / pp.455-464 / 2012
  • This study investigates the 12-month-lead predictability of the PNU Coupled General Circulation Model (CGCM) V1.1 hindcast, in which an oceanic data-assimilated initialization is used to generate the ocean initial condition. The CGCM, a participant model of the APEC Climate Center (APCC) long-lead multi-model ensemble system, was initialized every month and produced a 12-month-lead hindcast for each month from 1980 to 2011. Each 12-month-lead hindcast consisted of 2-5 ensemble members, and this study verified the ensemble-averaged hindcast. For sea-surface temperature, the hindcast retained a high level of confidence, especially over the tropical Pacific and the mid-latitude central Pacific, with a slight decline in temporal correlation coefficients (TCC) as the lead month increased. In particular, the CGCM showed trustworthy ENSO prediction skill in most hindcasts. For atmospheric variables such as air temperature, precipitation, and geopotential height at 500 hPa, reliable predictions were obtained throughout the entire lead time over most of the domain, particularly over the equatorial region. Although the TCCs of hindcast precipitation are lower than those of other variables, skillful precipitation forecasts are also shown over highly variable regions such as the ITCZ. This study also revealed that predictability depends on season and region for each variable and lead.

Development of Deep Learning Ensemble Modeling for Cryptocurrency Price Prediction : Deep 4-LSTM Ensemble Model (암호화폐 가격 예측을 위한 딥러닝 앙상블 모델링 : Deep 4-LSTM Ensemble Model)

  • Choi, Soo-bin;Shin, Dong-hoon;Yoon, Sang-Hyeak;Kim, Hee-Woong
    • Journal of Information Technology Services / v.19 no.6 / pp.131-144 / 2020
  • As blockchain technology attracts attention, interest in the cryptocurrency received as a reward is also increasing. Investment and trading continue, driven by expectations and the rising value of cryptocurrency. Accordingly, cryptocurrency price prediction has been attempted using artificial intelligence technology and social sentiment analysis. The purpose of this paper is to develop a deep learning ensemble model for predicting the price fluctuations and the one-day-lag price of cryptocurrency, based on the design science research method. The paper performs predictive modeling on Ethereum, aiming to make predictions more efficiently and accurately than existing models. To this end, it collects five years of data related to the Ethereum price and pre-processes them with customized functions. In the model development stage, four LSTM models, which are efficient for processing time series data, are used to build an ensemble model with the optimal combination of hyperparameters found in the experiments. Then, based on the performance evaluation measures, the superiority of the model is assessed through comparison with other deep learning models. The practical contribution of this paper is a model that shows high performance and a high prediction rate for cryptocurrency prices and price fluctuations. Its academic contribution is that it improves the quality of research by following design science research procedures, which solve scientific problems and create and evaluate new and innovative products in the field of information systems.

Performance Assessment of Weekly Ensemble Prediction Data at Seasonal Forecast System with High Resolution (고해상도 장기예측시스템의 주별 앙상블 예측자료 성능 평가)

  • Ham, Hyunjun;Won, Dukjin;Lee, Yei-sook
    • Atmosphere / v.27 no.3 / pp.261-276 / 2017
  • The main objectives of this study are to introduce the Global Seasonal forecasting system version 5 (GloSea5) of the KMA and to evaluate the performance of its ensemble prediction. Since 2014, the KMA has run an operational seasonal forecast system jointly with the UK Met Office. GloSea5 is a fully coupled global climate model consisting of atmosphere (UM), ocean (NEMO), land surface (JULES), and sea ice (CICE) components linked through the OASIS coupler. The model resolution used in GloSea5 is N216L85 (~60 km in mid-latitudes) in the atmosphere and ORCA0.25L75 (0.25° on a tri-polar grid) in the ocean. In this research, we evaluate the performance of this system using RMSE, correlation, and MSSS for ensemble mean values. The forecast (FCST) and hindcast (HCST) are verified separately, using the operational GloSea5 data from 2014 to 2015. The performance skill is similar to that reported in a past study. For example, the RMSE of h500 increases from 22.30 gpm for the 1-week forecast to 53.82 gpm for the 7-week forecast, but the error stays at a similar level of about 50~53 gpm after the 3-week forecast. The Nino index of SST shows a high correlation (above 0.9) up to the 7-week forecast in the Nino 3.4 area. It can be concluded that GloSea5 performs well for seasonal prediction.
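
The three verification scores named in this abstract can be sketched as below. The abstract does not spell out its formulas, so this assumes the usual definitions, with MSSS taken as 1 minus the ratio of the forecast mean squared error to that of a climatological reference (here the mean of the observations). The sample values are made up for illustration.

```python
import math

def rmse(forecast, observed):
    """Root mean square error of the ensemble-mean forecast."""
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed))

def correlation(forecast, observed):
    """Pearson correlation between forecast and observation."""
    mf = sum(forecast) / len(forecast)
    mo = sum(observed) / len(observed)
    cov = sum((f - mf) * (o - mo) for f, o in zip(forecast, observed))
    var_f = sum((f - mf) ** 2 for f in forecast)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov / math.sqrt(var_f * var_o)

def msss(forecast, observed):
    """Mean square skill score against a climatology (mean of observations) reference."""
    clim = sum(observed) / len(observed)
    mse_f = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    mse_c = sum((clim - o) ** 2 for o in observed)
    return 1 - mse_f / mse_c

# Made-up ensemble-mean forecasts and matching observations.
observed = [1.0, 2.0, 3.0, 4.0]
ens_mean = [1.5, 2.5, 2.5, 3.5]

print(rmse(ens_mean, observed), correlation(ens_mean, observed), msss(ens_mean, observed))
```

An MSSS of 1 indicates a perfect forecast, 0 indicates no improvement over climatology, and negative values indicate a forecast worse than climatology.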

Uncertainty investigation and mitigation in flood forecasting

  • Nguyen, Hoang-Minh;Bae, Deg-Hyo
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / pp.155-155 / 2018
  • Uncertainty in flood forecasting with a coupled meteorological and hydrological model arises from various sources, especially from the inaccuracy of Quantitative Precipitation Forecasts (QPFs). To improve flood forecasting capability, this uncertainty must be estimated and mitigated. This study investigates and reduces such uncertainty. First, ensemble QPFs are generated by Monte Carlo simulation, and each ensemble member is then used as input to a hydrological model to obtain an ensemble streamflow prediction. Likelihood measures are evaluated to identify feasible members. These members are retained to define the upper and lower limits of the uncertainty interval and to assess the uncertainty. To mitigate the uncertainty at very short lead times, a blending method is proposed that merges the ensemble QPFs with radar-based rainfall prediction, considering both qualitative and quantitative skill. Finally, blending bias ratios estimated from the previous time step are used to update the members over the total lead time. The proposed method is verified for two flood events, in 2013 and 2016, in the Yeonguol and Soyang watersheds located in the Han River basin, South Korea. The uncertainty in flood forecasting using the coupled Local Data Assimilation and Prediction System (LDAPS) and Sejong University Rainfall-Runoff (SURR) model is investigated and then mitigated by blending the generated ensemble LDAPS members with radar-based rainfall prediction from the McGill algorithm for precipitation nowcasting by Lagrangian extrapolation (MAPLE). The results show that the uncertainty of flood forecasting using the coupled model increases as the lead time becomes longer.
The mitigation method proves effective in reducing the uncertainty, as shown by increases in the percentage of feasible members (POFM) and in the ratio of the number of observations that fall within the uncertainty interval (p-factor).
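
The two indicators named in this abstract, POFM and the p-factor, can be computed along the following lines. The abstract does not say which likelihood measure is used, so this sketch assumes the Nash-Sutcliffe efficiency with a cutoff of 0.5 purely for illustration; the streamflow values are made up.

```python
def nse(simulated, observed):
    """Nash-Sutcliffe efficiency, used here as an assumed likelihood measure."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - sse / sst

def uncertainty_summary(members, observed, min_likelihood=0.5):
    """Return POFM (%), p-factor, and the lower/upper limits of the interval."""
    feasible = [m for m in members if nse(m, observed) >= min_likelihood]
    pofm = 100 * len(feasible) / len(members)
    # The interval at each time step spans the feasible members only.
    lower = [min(values) for values in zip(*feasible)]
    upper = [max(values) for values in zip(*feasible)]
    inside = sum(lo <= o <= hi for lo, o, hi in zip(lower, observed, upper))
    return pofm, inside / len(observed), lower, upper

# Made-up observed streamflow and four ensemble streamflow predictions.
observed = [10, 20, 45, 30, 15]
members = [
    [11, 19, 38, 31, 16],
    [9, 22, 43, 28, 14],
    [30, 30, 30, 30, 30],  # poor member, expected to be rejected
    [12, 21, 36, 29, 17],
]

pofm, p_factor, lower, upper = uncertainty_summary(members, observed)
print(f"POFM = {pofm:.0f}%, p-factor = {p_factor:.2f}")
```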

Path Loss Prediction Using an Ensemble Learning Approach

  • Beom Kwon;Eonsu Noh
    • Journal of the Korea Society of Computer and Information / v.29 no.2 / pp.1-12 / 2024
  • Predicting path loss is an important part of wireless network design, for example when selecting the installation location of base stations in cellular networks. In the past, path loss values were measured through numerous field tests to determine the optimal installation location of a base station, which has the disadvantage of taking a long time. To solve this problem, this study proposes a path loss prediction method based on machine learning (ML). In particular, an ensemble learning approach is applied to improve path loss prediction performance. Bootstrap datasets were used to obtain models with different hyperparameter configurations, and the final model was built by ensembling these models. We evaluated the proposed ensemble-based path loss prediction method against various ML-based methods using publicly available path loss datasets. The experimental results show that the proposed method outperforms the existing methods and can predict path loss values accurately.
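
The idea of training differently configured models on bootstrap datasets and averaging them can be sketched as follows. The abstract does not name the base learner or its hyperparameters, so this example substitutes a one-dimensional k-nearest-neighbour regressor with different values of `k`; the distance and path loss pairs are hypothetical.

```python
import random

def knn_predict(train, x, k):
    """Average the path loss of the k training points closest in distance to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(loss for _, loss in nearest) / k

def bagged_predict(data, x, ks=(1, 3, 5), seed=0):
    """Average k-NN models that differ in k, each fit on its own bootstrap sample."""
    rng = random.Random(seed)
    predictions = []
    for k in ks:
        sample = [rng.choice(data) for _ in data]  # bootstrap: draw with replacement
        predictions.append(knn_predict(sample, x, k))
    return sum(predictions) / len(predictions)

# Hypothetical (distance in m, path loss in dB) measurements.
data = [(50, 91), (100, 100), (150, 105), (200, 109), (250, 112),
        (300, 114), (350, 116), (400, 118), (450, 120), (500, 121)]

estimate = bagged_predict(data, x=275)
print(f"predicted path loss at 275 m: {estimate:.1f} dB")
```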

Effects of Resolution, Cumulus Parameterization Scheme, and Probability Forecasting on Precipitation Forecasts in a High-Resolution Limited-Area Ensemble Prediction System

  • On, Nuri;Kim, Hyun Mee;Kim, SeHyun
    • Asia-Pacific Journal of Atmospheric Sciences / v.54 no.4 / pp.623-637 / 2018
  • This study investigates the effects of horizontal resolution, cumulus parameterization scheme (CPS), and probability forecasting on precipitation forecasts over the Korean Peninsula from 00 UTC 15 August to 12 UTC 14 September 2013, using the limited-area ensemble prediction system (LEPS) of the Korea Meteorological Administration. To investigate the effect of resolution, the control members of the LEPS with 1.5- and 3-km resolution were compared. Two 3-km experiments, with and without the CPS, were conducted for the control member, because a 3-km resolution lies within the gray zone. For probability forecasting, 12 ensemble members with 3-km resolution were run using the LEPS. The forecast performance was evaluated both for the whole study period and for precipitation cases categorized by synoptic forcing. Precipitation forecasts at 1.5-km resolution performed better than those at 3-km resolution for both the total period and individual cases. The result of the 3-km experiment with the CPS did not differ significantly from that without it. The 3-km ensemble mean and probability matching (PM) performed better than the 3-km control member, regardless of the use of the CPS. The PM compensated for the weakness of the ensemble mean, which predicts precipitation regions better than the control member but underestimates precipitation amounts because it averages the members. Further, both the 3-km ensemble mean and PM outperformed the 1.5-km control member, which implies that the lower performance of the 3-km control member relative to the 1.5-km control member was compensated for by probability forecasting.
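
The abstract does not describe how its PM product is computed. One common formulation of probability matching keeps the spatial pattern of the ensemble mean but replaces its amounts with values drawn from the pooled distribution of all members, which restores the peaks that averaging smooths out. The sketch below follows that formulation; the choice of the middle value of each block and the toy rainfall fields are assumptions made for the example.

```python
def ensemble_mean(members):
    """Grid-point average over all members."""
    n_points = len(members[0])
    return [sum(m[i] for m in members) / len(members) for i in range(n_points)]

def probability_matched_mean(members):
    """Assign pooled member amounts to grid points ranked by the ensemble mean."""
    n_members = len(members)
    mean_field = ensemble_mean(members)
    # Pool every value of every member and sort from wettest to driest.
    pooled = sorted((v for m in members for v in m), reverse=True)
    # Keep one amount per block of n_members values (the middle one, by assumption).
    amounts = pooled[n_members // 2 :: n_members]
    # Rank grid points by ensemble mean and hand out the amounts in that order.
    order = sorted(range(len(mean_field)), key=lambda i: mean_field[i], reverse=True)
    matched = [0.0] * len(mean_field)
    for rank, point in enumerate(order):
        matched[point] = amounts[rank]
    return matched

# Toy rainfall fields (mm) from three members on a four-point grid.
members = [
    [0, 2, 10, 4],
    [0, 6, 2, 23],
    [3, 1, 12, 0],
]

mean_field = ensemble_mean(members)
pm_field = probability_matched_mean(members)
print("ensemble mean:", mean_field)
print("probability matched:", pm_field)
```

In this toy case the ensemble mean peaks at 9 mm while the probability-matched field peaks at 12 mm, at the same grid point.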

The Development of Ensemble Statistical Prediction Model for Changma Precipitation (장마 강수를 위한 앙상블 통계 예측 모델 개발)

  • Kim, Jin-Yong;Seo, Kyong-Hwan
    • Atmosphere / v.24 no.4 / pp.533-540 / 2014
  • Statistical forecast models for predicting summertime Changma precipitation are developed in this study. Lee and Seo (2013) suggested the springtime sea surface temperature (SST) anomalies over the North Atlantic (NA1), the North Pacific (NPC), and the tropical Pacific Ocean (CNINO) as effective predictors of Changma precipitation. To further improve the performance of the statistical prediction scheme, we select other potential predictors and construct two additional statistical models. The selected predictors are the Northern Indian Ocean (NIO) and Bering Sea (BS) SST anomalies and the spring Eurasian snow cover anomaly (EUSC). Then, using all three statistical prediction models, a simple ensemble-mean prediction is performed. The resulting correlation skill score reaches as high as ~0.90 for the last 21 years, a ~16% increase in skill compared to the prediction model of Lee and Seo (2013). The EUSC and BS predictors are related to a strengthening of the Okhotsk high, leading to an enhancement of the Changma front. The NIO predictor induces cyclonic anomalies to the southwest of the Korean Peninsula and southeasterly flows toward the peninsula, giving rise to an increase in Changma precipitation.