• Title/Summary/Keyword: Weighted ensemble

Prediction of Potential Species Richness of Plants Adaptable to Climate Change in the Korean Peninsula (한반도 기후변화 적응 대상 식물 종풍부도 변화 예측 연구)

  • Shin, Man-Seok;Seo, Changwan;Lee, Myungwoo;Kim, Jin-Yong;Jeon, Ja-Young;Adhikari, Pradeep;Hong, Seung-Bum
    • Journal of Environmental Impact Assessment / v.27 no.6 / pp.562-581 / 2018
  • This study was designed to predict changes in the species richness of plants under climate change in South Korea. The target species were selected from the Plants Adaptable to Climate Change in the Korean Peninsula: 89 species in total, comprising 23 native, 30 northern, and 36 southern plants. Species distribution models were used to predict the potential habitat of each species under climate change, applying ten single-model algorithms and a pre-evaluation weighted ensemble method, and species richness was then derived from the individual species results. Two representative concentration pathways (RCP 4.5 and RCP 8.5) were used to simulate plant species richness in 2050 and 2070. Current species richness was predicted to be high in the national parks along the Baekdudaegan mountain range in Gangwon Province and on the islands of the South Sea. Future species richness was predicted to be lower in the national parks and the Baekdudaegan range in Gangwon Province and higher in the southern coastal regions. The average current species richness was higher in the national park areas than over South Korea as a whole, but the predicted future species richness showed little difference between the two. The difference between current and future species richness can be attributed to the disappearance of a large number of native and northern plants from South Korea and to the expansion of the potential habitat of southern plants under climate change. However, if species cannot disperse to suitable habitat, species richness will be reduced drastically, and the results differed markedly depending on whether dispersal was assumed. This study will be useful for conservation planning, designation of protected areas, restoration of species, and strategies for adaptation to climate change.
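The abstract describes a pre-evaluation weighted ensemble, in which single-model habitat predictions are combined using weights derived from each model's evaluation score before thresholding into presence/absence maps. The sketch below is a minimal illustration of that idea, not the authors' implementation; the weighting by TSS/AUC-style scores and the 0.5 cutoff are assumptions.

```python
import numpy as np

def weighted_ensemble(prob_maps, eval_scores, threshold=0.5):
    """Pre-evaluation weighted ensemble of single-model suitability maps.

    prob_maps   : (n_models, n_cells) habitat suitability in [0, 1]
    eval_scores : (n_models,) evaluation score per model (e.g. TSS or AUC)
    threshold   : cutoff for converting the ensemble map to presence/absence
    """
    prob_maps = np.asarray(prob_maps, dtype=float)
    weights = np.asarray(eval_scores, dtype=float)
    weights = weights / weights.sum()          # normalize scores into weights
    ensemble = weights @ prob_maps             # weighted average per grid cell
    presence = (ensemble >= threshold).astype(int)
    return ensemble, presence

def species_richness(presence_maps):
    """Sum binary presence maps across species to get richness per grid cell."""
    return np.asarray(presence_maps).sum(axis=0)
```

Species richness for the 89 target species would then be the cell-wise sum of the per-species presence maps, as in the second helper.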

Research on ITB Contract Terms Classification Model for Risk Management in EPC Projects: Deep Learning-Based PLM Ensemble Techniques (EPC 프로젝트의 위험 관리를 위한 ITB 문서 조항 분류 모델 연구: 딥러닝 기반 PLM 앙상블 기법 활용)

  • Hyunsang Lee;Wonseok Lee;Bogeun Jo;Heejun Lee;Sangjin Oh;Sangwoo You;Maru Nam;Hyunsik Lee
    • KIPS Transactions on Software and Data Engineering / v.12 no.11 / pp.471-480 / 2023
  • Construction orders in South Korea grew significantly, from 91.3 trillion won in public orders in 2013 to a total of 212 trillion won in 2021, with particularly strong growth in the private sector. As the domestic and overseas markets grew, the scale and complexity of EPC (Engineering, Procurement, Construction) projects increased, and risk management of project execution and of ITB (Invitation to Bid) documents became a critical issue. The time granted to construction companies in the EPC bidding process is limited, and it is extremely challenging to review all the risk terms in the ITB document due to manpower and cost constraints. Previous research attempted to categorize the risk terms in EPC contract documents and detect them with AI, but limitations in the data, such as the scarcity of labeled examples and class imbalance, restricted practical use. This study therefore aims to develop an AI model that classifies contract terms in detail according to the FIDIC Yellow 2017 (Fédération Internationale des Ingénieurs-Conseils contract terms) standard, rather than defining and classifying risk terms as in previous research. A text classification model covering many contract-term categories is necessary because the terms that require detailed review vary with the scale and type of the project. To enhance classification performance, we developed an ELECTRA PLM (Pre-trained Language Model) capable of efficiently learning the context of text data from the pre-training stage, and conducted a four-step experiment to validate its performance. As a result, an ensemble of the self-developed ITB-ELECTRA model and Legal-BERT achieved the best performance, with a weighted-average F1 score of 76% in the classification of 57 contract terms.
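The reported result is an ensemble of two PLM classifiers (ITB-ELECTRA and Legal-BERT) evaluated with a weighted-average F1 score. The sketch below shows one plausible way to combine the two models' class probabilities and compute that metric; the equal ensemble weight and the variable names are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.metrics import f1_score

def ensemble_predict(proba_itb_electra, proba_legal_bert, w=0.5):
    """Soft-voting ensemble of two PLM classifiers over contract-term classes.

    proba_* : (n_clauses, n_classes) softmax outputs of each fine-tuned model
    w       : weight on the first model (0.5 = simple averaging; assumed)
    """
    proba = w * np.asarray(proba_itb_electra) + (1 - w) * np.asarray(proba_legal_bert)
    return proba.argmax(axis=1)

# Evaluation with the metric reported in the abstract (weighted-average F1):
# y_pred = ensemble_predict(p_electra, p_legalbert)
# score  = f1_score(y_true, y_pred, average="weighted")
```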

Doubly-robust Q-estimation in observational studies with high-dimensional covariates (고차원 관측자료에서의 Q-학습 모형에 대한 이중강건성 연구)

  • Lee, Hyobeen;Kim, Yeji;Cho, Hyungjun;Choi, Sangbum
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.309-327 / 2021
  • Dynamic treatment regimes (DTRs) are decision-making rules designed to provide personalized treatment to individuals in multi-stage randomized trials. Unlike classical methods, in which all individuals are prescribed the same type of treatment, DTRs prescribe patient-tailored treatments that take into account individual characteristics that may change over time. The Q-learning method, one of the regression-based algorithms for finding optimal treatment rules, has become popular because it is easy to implement. However, the performance of the Q-learning algorithm relies heavily on the correct specification of the Q-function for the response, especially in observational studies. In this article, we examine several doubly robust weighted least-squares estimating methods for Q-learning in high-dimensional settings, in which treatment models for the propensity score and penalization for sparse estimation are also investigated. We further consider flexible ensemble machine learning methods for the treatment model to achieve double robustness, so that the optimal decision rule can be estimated correctly as long as at least one of the outcome model or the treatment model is correct. Extensive simulation studies show that the proposed methods work well with practical sample sizes. The practical utility of the proposed methods is demonstrated with a real-data example.
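One common form of doubly robust weighted least-squares Q-estimation uses balancing weights |A - pi_hat(X)| built from an estimated propensity score, in the spirit of dynamic weighted OLS; when the propensity is fit with a flexible ensemble learner, the decision rule remains valid if either the outcome model or the treatment model is correct. The single-stage sketch below illustrates that construction under those assumptions; it omits the penalized, high-dimensional estimation studied in the paper and is not the authors' exact estimator.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LinearRegression

def dr_wls_q_stage(X, A, Y):
    """Single-stage doubly robust weighted least-squares Q-estimation (sketch).

    X : (n, p) covariates, A : (n,) binary treatment, Y : (n,) outcome.
    """
    X, A, Y = np.asarray(X, float), np.asarray(A, float), np.asarray(Y, float)

    # 1. Flexible (ensemble) treatment model for the propensity score
    pi_hat = GradientBoostingClassifier().fit(X, A).predict_proba(X)[:, 1]
    w = np.abs(A - pi_hat)                      # balancing weights

    # 2. Weighted least squares for the Q-function with treatment interactions
    design = np.column_stack([X, A, A[:, None] * X])
    q_model = LinearRegression().fit(design, Y, sample_weight=w)

    # 3. Estimated optimal rule: treat when the treatment effect ("blip") > 0
    p = X.shape[1]
    psi = q_model.coef_[p:]                     # coefficients of A and A*X
    blip = psi[0] + X @ psi[1:]
    return (blip > 0).astype(int), q_model
```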

Ordinary Kriging of Daily Mean SST (Sea Surface Temperature) around South Korea and the Analysis of Interpolation Accuracy (정규크리깅을 이용한 우리나라 주변해역 일평균 해수면온도 격자지도화 및 내삽정확도 분석)

  • Ahn, Jihye;Lee, Yangwon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.40 no.1 / pp.51-66 / 2022
  • SST (Sea Surface Temperature) is governed by the atmosphere-ocean interaction, one of the most important mechanisms of the Earth system. Because it is a crucial oceanic and meteorological factor for understanding climate change, gap-free grid data at a specific spatial and temporal resolution are valuable for SST studies. This paper examined the production of daily SST grid maps from 137 stations in 2020 through ordinary kriging with variogram optimization, together with an assessment of their accuracy. The variogram optimization was achieved by the WLS (Weighted Least Squares) method, and blind tests for the interpolation accuracy assessment were conducted with an objective and spatially unbiased sampling scheme. The four rounds of blind tests showed fairly high accuracy: a root mean square error between 0.995 and 1.035℃ and a correlation coefficient between 0.981 and 0.982. By season, the accuracy in summer was somewhat lower, presumably because of abrupt changes in SST caused by typhoons. The accuracy was better in the far seas than in the near seas, and the West Sea showed better accuracy than the East or South Sea, because the semi-enclosed seas near the coast can have different physical characteristics. Seasonal and regional factors should be considered for accuracy improvement in future work, and the improved SST could serve as a member of an SST ensemble around South Korea.
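As an illustration of the workflow described in the abstract, the sketch below interpolates daily mean SST observations onto a regular grid with ordinary kriging using the PyKrige library. The station coordinates and values are placeholders, and PyKrige's built-in weighted variogram fitting is used only as a rough stand-in for the paper's WLS variogram optimization.

```python
import numpy as np
from pykrige.ok import OrdinaryKriging

# Hypothetical daily mean SST observations at a few stations
# (placeholders for the 137-station data set used in the paper)
lon = np.array([125.4, 126.1, 127.8, 129.3])
lat = np.array([34.2, 36.5, 33.4, 35.9])
sst = np.array([14.1, 12.8, 16.0, 13.5])   # degrees Celsius

# Target grid around the Korean Peninsula
grid_lon = np.arange(124.0, 132.0, 0.1)
grid_lat = np.arange(32.0, 39.0, 0.1)

# Ordinary kriging; weight=True biases the automatic variogram fit toward
# short lags, used here as an approximation of a WLS variogram fit
ok = OrdinaryKriging(lon, lat, sst, variogram_model="spherical",
                     weight=True, coordinates_type="geographic")
sst_grid, kriging_variance = ok.execute("grid", grid_lon, grid_lat)
```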

A Recidivism Prediction Model Based on XGBoost Considering Asymmetric Error Costs (비대칭 오류 비용을 고려한 XGBoost 기반 재범 예측 모델)

  • Won, Ha-Ram;Shim, Jae-Seung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.127-137 / 2019
  • Recidivism prediction has been a subject of continuous research since the early 1970s, and it has become more important as crimes committed by recidivists have steadily increased. In particular, after the US and Canada adopted the 'Recidivism Risk Assessment Report' as a decisive criterion in trials and parole screening in the 1990s, research on recidivism prediction became more active, and empirical studies on recidivism factors began in Korea during the same period. Although most recidivism prediction studies have so far focused on the factors of recidivism or the accuracy of prediction, it is also important to minimize the misclassification cost, because recidivism prediction has an asymmetric error-cost structure. In general, the cost of misclassifying a person who will not reoffend as a recidivist is lower than the cost of misclassifying a person who will reoffend as a non-recidivist: the former adds only extra monitoring costs, whereas the latter incurs substantial social and economic costs. Therefore, in this paper we propose an XGBoost (eXtreme Gradient Boosting; XGB) based recidivism prediction model that takes the asymmetric error cost into account. In the first step of the model, XGB, recognized as a high-performance ensemble method in the field of data mining, was applied, and its results were compared with various prediction models such as LOGIT (logistic regression), DT (decision trees), ANN (artificial neural networks), and SVM (support vector machines). In the next step, the classification threshold is optimized to minimize the total misclassification cost, the weighted average of FNE (False Negative Error) and FPE (False Positive Error). To verify the usefulness of the model, it was applied to a real recidivism prediction dataset. The results confirmed that the XGB model not only showed better prediction accuracy than the other models but also reduced the misclassification cost most effectively.
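The second step of the proposed model, choosing a classification threshold that minimizes the weighted sum of false-negative and false-positive errors, can be sketched as a simple grid search over cutoffs. The cost ratio, hyperparameters, and variable names below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from xgboost import XGBClassifier

def optimal_threshold(y_true, proba, cost_fn=5.0, cost_fp=1.0):
    """Choose the cutoff minimizing the total asymmetric misclassification cost.

    cost_fn / cost_fp are illustrative: missing a future recidivist (FN) is
    assumed costlier than extra monitoring of a non-recidivist (FP).
    """
    y_true, proba = np.asarray(y_true), np.asarray(proba)
    thresholds = np.linspace(0.01, 0.99, 99)
    costs = []
    for t in thresholds:
        pred = (proba >= t).astype(int)
        fn = np.sum((y_true == 1) & (pred == 0))
        fp = np.sum((y_true == 0) & (pred == 1))
        costs.append(cost_fn * fn + cost_fp * fp)
    return thresholds[int(np.argmin(costs))]

# Illustrative usage (hyperparameters are placeholders, not the paper's):
# model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
# model.fit(X_train, y_train)
# t_star = optimal_threshold(y_valid, model.predict_proba(X_valid)[:, 1])
```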