• Title/Summary/Keyword: Ensemble models

A Study on the Application of Modeling to predict the Distribution of Legally Protected Species Under Climate Change - A Case Study of Rodgersia podophylla - (기후변화에 따른 법정보호종 분포 예측을 위한 종분포모델 적용 방법 검토 - Rodgersia podophylla를 중심으로 -)

  • Yoo, Youngjae;Hwang, Jinhoo;Jeon, Seong-woo
    • Journal of the Korean Society of Environmental Restoration Technology / v.27 no.3 / pp.29-43 / 2024
  • Legally protected species are a crucial consideration in the natural-ecology component of environmental impact assessments (EIAs). The occurrence of legally protected species, especially 'Endangered Wildlife' designated by the Ministry of Environment, significantly influences the progression of projects subject to EIA, so their habitats must be clearly investigated and presented. From a statistical perspective, a minimum of 30 occurrence coordinates is required for population prediction, but most endangered wildlife has too few recorded coordinates, which makes it difficult to predict distributions through modeling. This study therefore proposes modeling methodologies applicable when coordinate data are limited, focusing on Rodgersia podophylla, which represents the characteristics of endangered wildlife and northern plant species. Thirty randomly sampled coordinates were used as input data, simulating a situation with little survey data, and modeling was performed with the individual models included in BIOMOD2. The modeling results were then evaluated in terms of discrimination capacity and the ability to reflect reality. The optimal technique proposed is an ensemble of the remaining models after excluding the MaxEnt model, whose results were found to be less reliable. Alongside a discussion of the discrimination capacity metrics (e.g., TSS and AUC) reported in modeling results, this study provides insights and suggestions for improvement; however, because it was not conducted on a variety of species, its findings are difficult to generalize. By supporting survey site selection in EIA processes, this research is expected to help minimize situations in which protected species are overlooked in survey results.
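The two discrimination metrics named in this abstract can be sketched in a few lines. The following is a minimal illustration, not the BIOMOD2 implementation: the function names, the 0.5 presence cutoff, and the evaluation data are all hypothetical. TSS is computed as sensitivity + specificity - 1, and AUC as the probability that a presence point receives a higher suitability score than an absence point.

```python
def true_skill_statistic(observed, predicted):
    """TSS = sensitivity + specificity - 1, from binary presence/absence labels."""
    tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    fn = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)
    tn = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    fp = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of presences predicted as present
    specificity = tn / (tn + fp)  # share of absences predicted as absent
    return sensitivity + specificity - 1.0


def auc(observed, scores):
    """AUC as the probability that a presence outscores an absence (ties count 0.5)."""
    presences = [s for o, s in zip(observed, scores) if o == 1]
    absences = [s for o, s in zip(observed, scores) if o == 0]
    wins = 0.0
    for p in presences:
        for a in absences:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presences) * len(absences))


# Hypothetical evaluation data: 1 = presence, 0 = (pseudo-)absence.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
suitability = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
predicted = [1 if s >= 0.5 else 0 for s in suitability]  # assumed 0.5 cutoff

print("TSS:", true_skill_statistic(observed, predicted))
print("AUC:", auc(observed, suitability))
```

TSS depends on the cutoff used to turn suitability scores into presence/absence, while AUC is threshold-free, which is one reason the two metrics can rank models differently.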

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.39-54 / 2013
  • The recent explosive growth of electronic commerce offers customers many advantageous purchase opportunities. In this situation, customers who lack sufficient knowledge about a purchase may accept product recommendations. Product recommender systems automatically reflect a user's preferences and provide the user with a recommendation list. For this reason, the product recommender system in an online store is known as one of the most popular tools for one-to-one marketing. However, recommender systems that do not properly reflect the user's preferences cause disappointment and waste the user's time. In this study, we propose a novel recommender system that uses data mining and multi-model ensemble techniques to improve recommendation performance by reflecting the user's preferences precisely. The research data were collected from a real-world online shopping store that sells products from famous art galleries and museums in Korea. The data initially contained 5,759 transactions, of which 3,167 remained after records with null values were deleted. We transformed the categorical variables into dummy variables and excluded outliers. The proposed model consists of two steps. The first step predicts customers who have a high likelihood of purchasing products in the online shopping store. In this step, we use logistic regression, decision trees, and artificial neural networks to predict customers with a high likelihood of purchasing products in each product group. We ran these data mining techniques in SAS E-Miner. We partitioned the dataset into modeling and validation sets for logistic regression and decision trees, and into training, test, and validation sets for the artificial neural network model. The validation set is the same for all experiments.
We then combine the results of each predictor using multi-model ensemble techniques such as bagging and bumping. Bagging is short for "Bootstrap Aggregation"; it combines the outputs of several machine learning techniques to raise the performance and stability of prediction or classification, and is a special form of the averaging method. Bumping is short for "Bootstrap Umbrella of Model Parameters"; it considers only the model with the lowest error value. The results show that bumping outperforms bagging and the other predictors except in the "Poster" product group, where the artificial neural network model performs better than the other models. In the second step, we use market basket analysis to extract association rules for co-purchased products. We extract thirty-one association rules according to the Lift, Support, and Confidence measures. We set the minimum transaction frequency to support an association to 5%, the maximum number of items in an association to 4, and the minimum confidence for rule generation to 10%. We also exclude extracted rules with a lift value below 1. After removing duplicate rules, fifteen association rules remain. Of these, eleven rules contain associations between products in the "Office Supplies" product group, one rule contains an association between the "Office Supplies" and "Fashion" product groups, and the other three rules contain associations between the "Office Supplies" and "Home Decoration" product groups. Finally, the proposed product recommender system provides a list of recommendations to the appropriate customers. We test the usability of the proposed system using a prototype and real-world transaction and profile data. To this end, we built the prototype system using ASP, JavaScript, and Microsoft Access.
In addition, we surveyed user satisfaction with the product list recommended by the proposed system and with randomly selected product lists. The survey participants were 173 users of MSN Messenger, Daum Café, and P2P services. We measured user satisfaction on a five-point Likert scale and applied a paired-sample t-test to the survey results. The results show that the proposed model outperforms the random selection model at the 1% significance level, meaning that users were significantly more satisfied with the recommended product list. The results also suggest that the proposed system may be useful in real-world online shopping stores.
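The contrast between bagging (average the bootstrap models) and bumping (keep only the lowest-error bootstrap model) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' SAS E-Miner workflow: the base learner is a simple least-squares line standing in for the logistic regression, decision tree, and neural network predictors, and the data, function names, and model count are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares line y = a*x + b; falls back to a constant if x has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return (0.0, my)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return (a, my - a * mx)


def predict(model, x):
    a, b = model
    return a * x + b


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def bootstrap_fits(xs, ys, n_models, seed=0):
    """Fit one model per bootstrap resample (sampling rows with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    fits = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        fits.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return fits


def bagging_predict(fits, x):
    """Bagging: average the predictions of all bootstrap models."""
    return sum(predict(m, x) for m in fits) / len(fits)


def bumping_select(fits, xs, ys):
    """Bumping: keep the single bootstrap model with the lowest error on the full data."""
    return min(fits, key=lambda m: mse(m, xs, ys))


# Hypothetical training data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]

fits = bootstrap_fits(xs, ys, n_models=25)
bagged = bagging_predict(fits, 7.0)
best = bumping_select(fits, xs, ys)
print("bagged prediction at x=7:", bagged)
print("bumped prediction at x=7:", predict(best, 7.0))
```

Bagging reduces variance by averaging, whereas bumping returns one interpretable model, which may explain why the two can behave differently across product groups.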

Preliminary Result of Uncertainty on Variation of Flowering Date of Kiwifruit: Case Study of Kiwifruit Growing Area of Jeonlanam-do (기후변화에 따른 국내 키위 품종 '해금'의 개화시기 변동과 전망에 대한 불확실성: 전남 키위 주산지역을 중심으로)

  • Kim, Kwang-Hyung;Jeong, Yeo Min;Cho, Youn-Sup;Chung, Uran
    • Korean Journal of Agricultural and Forest Meteorology / v.18 no.1 / pp.42-54 / 2016
  • Warming resulting from global climate change is widely expected to affect the phenological pattern of kiwifruit, which has been grown commercially in Korea since the early 1980s. Here, we present the potential impacts of climate change on the variation in flowering day of a gold kiwifruit cultivar, Haegeum, in Jeonnam Province, Korea. By using six global climate models (GCMs), this study emphasizes the uncertainty in climate change scenarios. To predict the flowering day of kiwifruit, we obtained three parameters of the 'Chill-day' model for simulating Haegeum: 6.3°C for the base temperature (Tb), 102.5 for the chill requirement (Rc), and 575 for the heat requirement (Rh). Two separate validations of the resulting 'Chill-day' model were conducted. First, the flowering days observed in 25 kiwifruit orchards over two years (2014-15) were compared directly with the flowering days simulated by the 'Chill-day' model using weather data from four weather stations near the 25 orchards. The estimation error between the observed and simulated flowering days was 5.2 days. Second, the model was run using temperature data extracted for the 25 orchards from a high-resolution digital temperature map, resulting in an error of 3.4 days. Future flowering days were then simulated with the 'Chill-day' model using the RCP 4.5 and 8.5 climate change scenarios from six GCMs for the period 2021-40. According to the multi-model ensemble, the predicted flowering days of Haegeum in Jeonnam were more than 10 days earlier than at present, while some individual models produced quite different magnitudes of impact, indicating that the multi-model ensemble accounts for uncertainty better than individual climate models. In addition, the current flowering period of Haegeum in Jeonnam Province was predicted to expand northward, reaching Jeonbuk and Chungnam Provinces.
This preliminary result will provide a basis for local impact assessments of climate change as more phenology models are developed for other fruit trees.
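How the three parameters Tb, Rc, and Rh interact can be illustrated with a deliberately simplified two-phase thermal-time accumulation. This is not the published 'Chill-day' formulation, which works from daily maximum and minimum temperatures; the sketch below assumes daily mean temperatures only, treats Rc and Rh as degree-day-like totals, and uses a synthetic temperature series. The function name and the series are hypothetical.

```python
import math


def predict_flowering_day(daily_mean_temps, tb=6.3, rc=102.5, rh=575.0):
    """Return the index of the first day on which the heat requirement is met.

    Phase 1 accumulates chill (degrees below tb) until it reaches rc.
    Phase 2 then accumulates heat (degrees above tb) until it reaches rh.
    Returns None if the series ends before flowering is reached.
    """
    chill = 0.0
    heat = 0.0
    chill_met = False
    for day, temp in enumerate(daily_mean_temps):
        if not chill_met:
            chill += max(tb - temp, 0.0)
            if chill >= rc:
                chill_met = True  # heat accumulation starts on the next day
        else:
            heat += max(temp - tb, 0.0)
            if heat >= rh:
                return day
    return None


# Hypothetical synthetic series: 300 days starting in autumn, coldest at day 110.
temps = [12.0 - 14.0 * math.cos(2.0 * math.pi * (day - 110) / 365.0) for day in range(300)]

baseline = predict_flowering_day(temps)
warmer = predict_flowering_day([t + 2.0 for t in temps])  # assumed uniform +2 degC warming
print("baseline flowering day index:", baseline)
print("warmer-scenario flowering day index:", warmer)
```

Running the same function on temperature series from several climate models and comparing the resulting days is the kind of multi-model comparison the abstract describes.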

Corporate Bankruptcy Prediction Model using Explainable AI-based Feature Selection (설명가능 AI 기반의 변수선정을 이용한 기업부실예측모형)

  • Gundoo Moon;Kyoung-jae Kim
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.241-265 / 2023
  • A corporate insolvency prediction model serves as a vital tool for objectively monitoring the financial condition of companies. It enables timely warnings, facilitates responsive actions, and supports the formulation of effective management strategies to mitigate bankruptcy risks and enhance performance. Investors and financial institutions utilize default prediction models to minimize financial losses. As the interest in utilizing artificial intelligence (AI) technology for corporate insolvency prediction grows, extensive research has been conducted in this domain. However, there is an increasing demand for explainable AI models in corporate insolvency prediction, emphasizing interpretability and reliability. The SHAP (SHapley Additive exPlanations) technique has gained significant popularity and has demonstrated strong performance in various applications. Nonetheless, it has limitations such as computational cost, processing time, and scalability concerns that depend on the number of variables. This study introduces a novel approach to variable selection that reduces the number of variables by averaging SHAP values from bootstrapped data subsets instead of using the entire dataset. This technique aims to improve computational efficiency while maintaining excellent predictive performance. To obtain classification results, we train random forest, XGBoost, and C5.0 models on the selected, highly interpretable variables. The classification accuracy of an ensemble model built by soft voting, intended as a high-performance design, is compared with that of the individual models. The study leverages data from 1,698 Korean light industrial companies and employs bootstrapping to create distinct data groups. Logistic regression is employed to calculate SHAP values for each data group, and their averages are computed to derive the final SHAP values. The proposed model enhances interpretability and aims to achieve superior predictive performance.
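Two steps in this abstract lend themselves to a short sketch: averaging per-variable importance over bootstrap data groups, and combining classifiers by soft voting. The code below is a minimal illustration only. It does not compute SHAP values or train any model; the importance scores, predicted probabilities, variable names, and function names are hypothetical placeholders for what the real models would produce.

```python
def average_importance(importance_by_group):
    """Average per-variable importance scores over bootstrap data groups."""
    variables = importance_by_group[0].keys()
    n_groups = len(importance_by_group)
    return {v: sum(g[v] for g in importance_by_group) / n_groups for v in variables}


def soft_vote(prob_by_model, threshold=0.5):
    """Average each sample's positive-class probability across models, then threshold."""
    n_models = len(prob_by_model)
    n_samples = len(next(iter(prob_by_model.values())))
    mean_probs = [
        sum(probs[i] for probs in prob_by_model.values()) / n_models
        for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in mean_probs]
    return mean_probs, labels


# Hypothetical mean |SHAP| per variable from three bootstrap groups.
groups = [
    {"debt_ratio": 0.30, "roa": 0.22, "asset_turnover": 0.05},
    {"debt_ratio": 0.26, "roa": 0.25, "asset_turnover": 0.07},
    {"debt_ratio": 0.34, "roa": 0.19, "asset_turnover": 0.03},
]
averaged = average_importance(groups)
top_two = sorted(averaged, key=averaged.get, reverse=True)[:2]
print("selected variables:", top_two)

# Hypothetical insolvency probabilities for three companies from three models.
probs = {
    "random_forest": [0.90, 0.45, 0.20],
    "xgboost": [0.80, 0.95, 0.30],
    "c5.0": [0.70, 0.45, 0.10],
}
mean_probs, labels = soft_vote(probs)
print("soft-vote labels:", labels)
```

For the second company, two of the three models fall below 0.5, so a hard majority vote would predict 0, while soft voting predicts 1 because one model is very confident. This sensitivity to confidence is the main design difference between the two voting rules.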

Predicting change of suitable plantation of Schisandra chinensis with ensemble of climate change scenario (기후변화 시나리오 앙상블을 통한 오미자의 재배적지 변화 예측)

  • Lee, Sol Ae;Lee, Sang-Hyuk;Ji, Seung-Yong;Choi, Jaeyong
    • Journal of Environmental Impact Assessment / v.25 no.1 / pp.77-87 / 2016
  • To cope with climate change, it is necessary to predict the potential distribution of Schisandra chinensis, a non-timber forest product with a long cultivation period. The distribution of Schisandra chinensis in the 2050s and 2070s was therefore predicted under two scenarios, RCP 4.5 and RCP 8.5, with an ensemble of 5 climate models used in IPCC AR5. Under RCP 4.5, the distribution of Schisandra chinensis was estimated to decrease by 43% of the current area in the 2050s and by 57% in the 2070s. Under RCP 8.5, it was estimated to decrease by 55% of the current area in the 2050s and by 85% in the 2070s. A comparison of the current and future distribution areas indicated that Schisandra chinensis would disappear in the future everywhere except Gangwon-do and Gyeongsangbuk-do. As a result, those areas were classified as areas vulnerable to climate change. Therefore, Gangwon-do and Gyeongsangbuk-do were thought to be ideal for growing Schisandra chinensis. The results of this study can provide basic information for selecting suitable areas for Schisandra chinensis in light of climate change effects.

Development of daily spatio-temporal downscaling model with conditional Copula based bias-correction of GloSea5 monthly ensemble forecasts (조건부 Copula 함수 기반의 월단위 GloSea5 앙상블 예측정보 편의보정 기법과 연계한 일단위 시공간적 상세화 모델 개발)

  • Kim, Yong-Tak;Kim, Min Ji;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association / v.54 no.12 / pp.1317-1328 / 2021
  • This study aims to provide a climate-model-based predictive model for simulating continuous daily rainfall sequences by combining bias-correction and spatio-temporal downscaling approaches. To this end, it proposes a combined modeling system that applies a conditional Copula and a Multisite Non-stationary Hidden Markov Model (MNHMM). The GloSea5 system releases its monthly rainfall prediction on the same day every week; however, there are noticeable differences between the updated predictions. It was confirmed that the monthly rainfall forecasts are effectively updated with the use of the Copula-based bias-correction approach. More specifically, the proposed bias-correction approach was validated for the period from 1991 to 2010 under a leave-one-out cross-validation (LOOCV) scheme. Several rainfall statistics, such as rainfall amounts, consecutive rainfall frequency, consecutive zero rainfall frequency, and wet days, are well reproduced, so the simulated rainfall is expected to be highly effective as input data for hydrological models. The difference in spatial coherence between the observed and simulated rainfall sequences across all weather stations was estimated to be in the range of -0.02 to 0.10, and the interdependence between rainfall stations in the watershed was effectively reproduced. It is therefore expected that the hydrological response of the watershed will be simulated more realistically when these sequences are used as input data for a hydrological model.
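The LOOCV scheme mentioned above can be sketched independently of the Copula model. In the illustration below, a simple multiplicative scaling factor stands in for the conditional Copula bias correction, which is a substitution made purely for brevity; the function name and the rainfall values are hypothetical. Each year is corrected with a factor estimated from all other years, so no year is used to correct itself.

```python
def loocv_scaling_correction(forecast, observed):
    """Leave-one-out cross-validation of a multiplicative bias correction.

    For each year k, the scaling factor is estimated from every year except k
    and then applied to the held-out forecast for year k.
    """
    corrected = []
    for k in range(len(forecast)):
        train_forecast = forecast[:k] + forecast[k + 1:]
        train_observed = observed[:k] + observed[k + 1:]
        factor = sum(train_observed) / sum(train_forecast)
        corrected.append(forecast[k] * factor)
    return corrected


def mean_absolute_error(predicted, observed):
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical monthly rainfall totals (mm) for the same month in five years.
forecast = [180.0, 240.0, 150.0, 300.0, 210.0]
observed = [150.0, 210.0, 120.0, 270.0, 160.0]

corrected = loocv_scaling_correction(forecast, observed)
print("MAE before correction:", mean_absolute_error(forecast, observed))
print("MAE after LOOCV correction:", mean_absolute_error(corrected, observed))
```

Because the held-out year never contributes to its own correction factor, the resulting error is an out-of-sample estimate rather than a fit to the training data.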

Study on Predicting the Designation of Administrative Issue in the KOSDAQ Market Based on Machine Learning Based on Financial Data (머신러닝 기반 KOSDAQ 시장의 관리종목 지정 예측 연구: 재무적 데이터를 중심으로)

  • Yoon, Yanghyun;Kim, Taekyung;Kim, Suyeong
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.17 no.1 / pp.229-249 / 2022
  • This paper investigates machine learning models for predicting the designation of administrative issues in the KOSDAQ market using various techniques. When a company in the Korean stock market is designated as an administrative issue, the market treats the event itself as negative information, causing losses to the company and its investors. The purpose of this study is to evaluate alternative methods for developing an artificial intelligence service that uses companies' financial ratios to detect the possibility of designation as an administrative issue early and to help investors manage portfolio risk. The independent variables are 21 financial ratios representing profitability, stability, activity, and growth. Financial data were sampled for administrative-issue and non-administrative-issue stocks from 2011 to 2020, the period in which K-IFRS was applied. Logistic regression, decision tree, support vector machine, random forest, and LightGBM models are used to predict the designation of administrative issues. According to the results, LightGBM is the best prediction model with a classification accuracy of 82.73%, and the decision tree has the lowest classification accuracy at 71.94%. An examination of the top three variables by importance in the decision-tree-based learning models shows that the financial variables common to each model are ROE (net profit) and capital stock turnover ratio, which are relatively important variables in the designation of administrative issues. In general, the learning models that use ensembles showed higher predictive performance than the single learning models.

Uncertainty of Hydro-meteorological Predictions Due to Climate Change in the Republic of Korea (기후변화에 따른 우리나라 수문 기상학적 예측의 불확실성)

  • Nkomozepi, Temba;Chung, Sang-Ok
    • Journal of Korea Water Resources Association / v.47 no.3 / pp.257-267 / 2014
  • The combined impact of changes in temperature and rainfall due to climate change on surface water resources is an important topic in hydro-meteorological research. In this study, 4 hydro-meteorological (HM) models from the Rainfall Runoff Library in the Catchment Modeling Toolkit were used to model the impact of climate change on stream runoff in 5 river basins in the Republic of Korea. Future projections for 2021 to 2040 (2030s), 2051 to 2070 (2060s), and 2081 to 2099 (2090s) were derived from 12 General Circulation Models (GCMs) and 3 representative concentration pathways (RCPs). GCM outputs were statistically adjusted and downscaled using the Long-Ashton Research Station Weather Generator (LARS-WG), and the HM models were well calibrated and verified for the period from 1999 to 2009. The study showed that there is substantial spatial, temporal, and HM-model uncertainty in future runoff, as shown by the interquartile range, range, and coefficient of variation. In summary, aggregated runoff is projected to increase by 10~24%, 7~30%, and 11~30% of the baseline runoff under RCP2.6, RCP4.5, and RCP8.5, respectively. This study presents a method for modeling future streamflow that takes into account both HM-model and climate-based uncertainty.
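The three spread measures named in this abstract (interquartile range, range, and coefficient of variation) can be computed for an ensemble of projections as sketched below. The runoff-change values and the function name are hypothetical, and the quartiles use linear interpolation between data points, which is one of several common conventions and not necessarily the one used in the study.

```python
import statistics


def spread_summary(values):
    """Summarize ensemble spread: interquartile range, range, coefficient of variation."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    return {
        "iqr": q3 - q1,
        "range": max(values) - min(values),
        "cv": statistics.stdev(values) / mean,  # sample standard deviation over mean
    }


# Hypothetical projected runoff changes (% of baseline) from 12 GCMs for one basin.
runoff_change = [8.0, 11.5, 14.0, 9.5, 22.0, 17.5, 12.0, 25.0, 15.5, 10.0, 19.0, 13.0]

summary = spread_summary(runoff_change)
for name, value in summary.items():
    print(name, round(value, 3))
```

The interquartile range is less sensitive to a single outlying model than the full range, so reporting both gives a fuller picture of ensemble disagreement.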

A Study on the Effect of Cumulus Parameterization and Microphysics on Ozone Simulations during Long-range Transport Process over Northeast Asia (동북아 장거리 수송 과정에서 적운 모수화 및 미세물리과정이 오존 모사농도에 미치는 영향 연구)

  • Kang, Jeong-Eon;Kim, Cheol-Hee
    • Journal of Korean Society for Atmospheric Environment / v.29 no.2 / pp.135-151 / 2013
  • This study analyzes the sensitivity of simulated ozone concentrations to different options for cumulus parameterization schemes (CPSs) and microphysics schemes in the MM5 model. The sensitivity tests were applied to cases of long-range transport of high ozone over Northeast Asia. The CPSs employed are Betts-Miller (BM), Grell (GR), Kain-Fritsch2 (KF2), Anthes-Kuo (AK), and None (grid-scale physics only), and the four microphysics schemes are Simple ice, Reisner1, Reisner2, and Schultz. We selected two cases of long-range transport of high ozone based on both ozone concentration levels and a backward trajectory model. The results showed that modeled ozone concentrations differed by about 10% among CPSs. Of all the options, GR and KF2 (for CPS) and Reisner1 and Reisner2 (for microphysics) showed relatively good and stable variations against the ensemble mean values. For both CPS and microphysics schemes, the differences in precipitation arising from different parameterization schemes were significant in themselves, but the resulting ozone variations were only marginal. However, the cloud fraction differences arising from different parameterization schemes correlated better with ozone variations than the precipitation differences did, indicating that, for cases of high ozone transported over long ranges across Northeast Asia, variations in photochemical ozone generation are governed more by cloud fraction than by the wet removal process.

Changes in the Low Latitude Atmospheric Circulation at the End of the 21st Century Simulated by CMIP5 Models under Global Warming (CMIP5 모델에서 모의되는 지구온난화에 따른 21세기 말 저위도 대기 순환의 변화)

  • Jung, Yoo-Rim;Choi, Da-Hee;Baek, Hee-Jeong;Cho, Chunho
    • Atmosphere / v.23 no.4 / pp.377-387 / 2013
  • Projected changes in the low-latitude atmospheric circulation under global warming are investigated using the CMIP5 ensemble mean. For this purpose, 30-yr periods for the present day (1971~2000) and the end of the 21st century (2071~2100) under the RCP emission scenarios are compared. Except in RCP2.6, the wintertime subtropical jet is projected to strengthen on its upper side because of an increase in the meridional temperature gradient induced by warming in the tropical upper troposphere and cooling in the stratosphere. In RCP2.6, a strengthening of the upper side of the wintertime subtropical jet is also found, due to tropical upper-tropospheric warming. The model-based projections show a weakening of the mean intensity of the Hadley cell, an upward shift of the cell, and a poleward shift of the Hadley circulation for the winter cell in both hemispheres. A weakening of the Walker circulation, which is one of the most robust atmospheric responses to global warming, is also projected. These results are consistent with the findings of previous studies based on CMIP3 data sets. The weakening of the Walker circulation is accompanied by a decrease (increase) in precipitation over the Indo-Pacific warm pool region (the equatorial central and eastern Pacific). In addition, the model simulations show a decrease in precipitation over subtropical regions where the descending branch of the winter Hadley cell is strengthened in both hemispheres.