• Title/Summary/Keyword: Ensemble models

Comparing climate projections for Asia, East Asia and South Korea (아시아 대륙, 동아시아, 대한민국을 대상으로 다른 공간적 규모의 기후변화시나리오 예측 비교)

  • Choe, Hyeyeong;Thorne, James H.;Lee, Dongkun
    • Journal of Environmental Impact Assessment / v.26 no.2 / pp.114-126 / 2017
  • Many studies on climate change and its impacts use a single climate scenario. However, one climate scenario may not accurately predict the potential impacts of climate change. We estimated temperature and precipitation changes by 2070 using 17 of the CMIP5 Global Climate Models (GCMs) and two emission scenarios for three spatial domains: the Asian continent, six East Asian countries, and South Korea. For South Korea, the range of increased minimum temperature was lower than the ranges for the larger regions, but the range of projected future precipitation was higher. For South Korea, the increase in minimum temperature ranged from $1.3^{\circ}C$ to $5.2^{\circ}C$, and the change in precipitation ranged from -42.4 mm (-3.2%) to +389.8 mm (+29.6%). The increase in minimum temperature ranged from $2.3^{\circ}C$ to $8.5^{\circ}C$ for the East Asian countries and from $2.1^{\circ}C$ to $7.4^{\circ}C$ for the Asian continent, and the change in precipitation ranged from +28.8 mm (+6.3%) to +156.8 mm (+34.3%) for the East Asian countries and from +32.4 mm (+5.5%) to +126.2 mm (+21.3%) for the Asian continent. We suggest that climate change studies in South Korea should not use a single GCM or only an ensemble climate model's output, and we recommend using the GFDL-CM3 and INMCM4 GCMs in other national climate change studies to bracket the range of projected future climate conditions.
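The idea of bracketing an ensemble by its extreme members can be illustrated with a minimal sketch. The model names, the change values, and the baseline below are hypothetical placeholders, not figures from the study; the function names are likewise assumptions made for illustration.

```python
def bracket_projections(changes):
    """Return the (model, change) pairs with the lowest and highest projected change."""
    low = min(changes, key=changes.get)
    high = max(changes, key=changes.get)
    return (low, changes[low]), (high, changes[high])


def percent_change(delta, baseline):
    """Express an absolute change as a percentage of a baseline value."""
    return 100.0 * delta / baseline


# Hypothetical precipitation changes (mm) for three made-up ensemble members.
changes_mm = {"model_a": -40.0, "model_b": 120.0, "model_c": 390.0}
baseline_mm = 1300.0  # hypothetical baseline annual precipitation

lowest, highest = bracket_projections(changes_mm)
print(lowest, percent_change(lowest[1], baseline_mm))
print(highest, percent_change(highest[1], baseline_mm))
```

Reporting both bracketing members, rather than only the ensemble mean, keeps the spread between models visible.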

A Prediction Model for the Development of Cataract Using Random Forests (Random Forests 기법을 이용한 백내장 예측모형 - 일개 대학병원 건강검진 수검자료에서 -)

  • Han, Eun-Jeong;Song, Ki-Jun;Kim, Dong-Geon
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.771-780 / 2009
  • Cataract is the main cause of blindness and visual impairment; age-related cataract in particular accounts for about half of the 32 million cases of blindness worldwide. As life expectancy and the size of the elderly population increase, cases of cataract increase as well, which causes a serious economic and social problem throughout the country. However, the incidence of cataract can be reduced dramatically through early diagnosis and prevention. In this study, we developed a prediction model of cataract for early diagnosis using hospital data of 3,237 subjects who first received the screening test and later visited the medical center for cataract check-ups between 1994 and 2005. To develop the prediction model, we used random forests and compared the predictive performance of this model with other common discriminant models such as logistic regression, discriminant model, decision tree, naive Bayes, and two popular ensemble models, bagging and arcing. The accuracy of random forests was 67.16%, the sensitivity was 72.28%, and the main factors included in the model were age, diabetes, WBC, platelet, triglyceride, BMI and so on. The results showed that the model could predict about 70% of cataract cases from the screening test alone, without any information from a direct eye examination by an ophthalmologist. We expect that our model may contribute to diagnosing cataract and help prevent cataract in its early stages.
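For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how accuracy and sensitivity are derived from confusion-matrix counts. The counts are hypothetical and are not taken from the study.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all subjects that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Share of true cataract cases that the model flagged (true positive rate)."""
    return tp / (tp + fn)


# Hypothetical counts: true positives, true negatives, false positives, false negatives.
tp, tn, fp, fn = 70, 60, 40, 30
print(accuracy(tp, tn, fp, fn))
print(sensitivity(tp, fn))
```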

Use of Climate Information for Improving Extended Streamflow Prediction in Korea (중장기 유량예측 향상을 위한 국내 기후정보의 이용)

  • Lee Jae-Kyoung;Kim Young-Oh;Jeong Dae-Il
    • Journal of Korea Water Resources Association / v.39 no.9 s.170 / pp.755-766 / 2006
  • Since the accuracy of climate forecast information has improved through a better understanding of the climate system, particularly of ENSO, and through improvements in meteorological models, forecasted climate information is becoming an important clue for streamflow prediction. This study investigated the climate forecast information available for improving extended streamflow prediction in Korea, such as MIMI (Monthly Industrial Meteorological Information) and GDAPS (Global Data Assimilation and Prediction System), and measured its accuracy. Both MIMI and the 10-day forecast of GDAPS were superior to a naive forecast and performed better for the flood season than for the dry season, which showed that such climate forecasts would be valuable for the flood season. This study then forecasted the monthly inflows to Chungju Dam by using MIMI and GDAPS. For MIMI, we compared three cases: All, Intersection, and Union. The accuracies of all three cases were better than that of the naive forecast; in particular, Extended Streamflow Predictions (ESPs) with the Intersection and Union scenarios were superior to those with the All scenario for the flood season. For GDAPS, the 10-day-ahead streamflow prediction was also more accurate for the flood season than for the dry season. Therefore, this study showed that using climate information such as MIMI and GDAPS to reduce meteorological uncertainty can improve the accuracy of extended streamflow prediction for the flood season.

A Recommending System for Care Plan(Res-CP) in Long-Term Care Insurance System (데이터마이닝 기법을 활용한 노인장기요양급여 권고모형 개발)

  • Han, Eun-Jeong;Lee, Jung-Suk;Kim, Dong-Geon;Ka, Im-Ok
    • The Korean Journal of Applied Statistics / v.22 no.6 / pp.1229-1237 / 2009
  • In the long-term care insurance (LTCI) system, the question of how to provide the most appropriate care has become a major issue for the elderly, their families, and policy makers. To help beneficiaries use LTC services in line with their care needs, the National Health Insurance Corporation (NHIC) provides them with an individualized care plan, named the Long-term Care User Guide. It includes recommendations for the most appropriate type of care for each beneficiary. The purpose of this study is to develop a recommending system for care plan (Res-CP) in the LTCI system. We used the data set for the Long-term Care User Guide from the 3rd long-term care insurance pilot program. To develop the model, we tested four models: a decision-tree model from data mining, a logistic regression model, and bagging and boosting techniques from ensemble modeling. The decision-tree model was selected to describe the Res-CP because its algorithm is easy to explain to the working groups. Res-CP might be useful for evidence-based care planning in the LTCI system and may help support efficient use of LTC services.

A Korean Community-based Question Answering System Using Multiple Machine Learning Methods (다중 기계학습 방법을 이용한 한국어 커뮤니티 기반 질의-응답 시스템)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • Journal of KIISE / v.43 no.10 / pp.1085-1093 / 2016
  • A community-based Question Answering system provides answers to questions from the documents uploaded to web communities. To enhance question analysis, earlier methods developed specific rules suited to a target domain or applied machine learning to parts of the process. However, these methods incur an excessive cost when expanding to new fields or lead to systems that are overfitted to a specific field. This paper proposes a multiple machine learning method that automates the overall process by adopting an appropriate machine learning technique in each procedure for efficient processing in a community-based Question Answering system. The system can be divided into a question analysis part and an answer selection part. The question analysis part consists of the question focus extractor, which analyzes the focused phrases in questions and uses conditional random fields, and the question type classifier, which classifies the topics of questions and uses a support vector machine. In the answer selection part, we train the weights used by the similarity estimation models through an artificial neural network. Also, there are a number of cases in which the results of morphological analysis are not reliable for the data uploaded to web communities. Therefore, we suggest a method that minimizes the impact of morphological analysis by using character features in the question analysis stage. The proposed system outperforms the former system, achieving a Mean Average Precision of 0.765 and an R-Precision of 0.872.
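The two ranking metrics reported above can be sketched as follows, using their standard textbook definitions. The sketch assumes that every relevant answer for a question appears somewhere in its ranked list; the example relevance lists are hypothetical.

```python
def average_precision(relevance):
    """Average precision for one ranked list of booleans, best rank first."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def r_precision(relevance):
    """Precision within the top R results, where R is the number of relevant items."""
    r = sum(relevance)
    if r == 0:
        return 0.0
    return sum(relevance[:r]) / r


def mean_average_precision(queries):
    """Mean of the per-question average precision values."""
    return sum(average_precision(q) for q in queries) / len(queries)


# Hypothetical relevance judgments for two questions.
queries = [[True, False, True, False], [False, True]]
print(mean_average_precision(queries))
print([r_precision(q) for q in queries])
```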

A Study on Fault Classification by EEMD Application of Gear Transmission Error (전달오차의 EEMD적용을 통한 기어 결함분류연구)

  • Park, Sungho;Choi, Joo-Ho
    • Journal of the Computational Structural Engineering Institute of Korea / v.30 no.2 / pp.169-177 / 2017
  • In this paper, classification of spall and crack faults of gear teeth is studied by applying ensemble empirical mode decomposition (EEMD) to the gear transmission error (TE). Finite element models of the gears with the two faults are built, and the TE is obtained by simulating the gears under loaded contact. EEMD is applied to the residuals of the TE, which are the differences between the normal and faulty signals. From the result, the differences between spall and crack faults are clearly identified by the intrinsic mode functions (IMFs). A simple test bed, consisting of a motor, a brake and a pair of spur gears, is installed to illustrate the approach. Two gears are employed to obtain the TE for the normal, spalled, and cracked gears, and the types of faults are separated by the same EEMD application process. To quantify the results, crest factors are applied to each IMF. The characteristics of spall and crack are well represented by the crest factors of the first and third IMFs, which are used as the feature signals. The classification is carried out with Bayes decision theory, using the feature signals acquired through the experiments.
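The crest factor used as a feature above is conventionally the ratio of a signal's peak absolute value to its root-mean-square value. A minimal sketch of that standard definition follows; the input signals are synthetic and the EEMD step itself is not reproduced here.

```python
import math


def crest_factor(signal):
    """Peak absolute amplitude divided by the RMS value of the signal."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    if rms == 0.0:
        raise ValueError("crest factor is undefined for an all-zero signal")
    peak = max(abs(x) for x in signal)
    return peak / rms


# Synthetic examples: a sampled sine wave and a single impulsive spike.
sine = [math.sin(2.0 * math.pi * k / 1000) for k in range(1000)]
spike = [0.0, 0.0, 0.0, 2.0]
print(crest_factor(sine))   # close to sqrt(2) for a pure sine
print(crest_factor(spike))  # impulsive signals give a larger value
```

Impulsive content raises the peak much more than the RMS, which is why the crest factor is a common indicator for localized faults.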

Development of Stochastic Downscaling Method for Rainfall Data Using GCM (GCM Ensemble을 활용한 추계학적 강우자료 상세화 기법 개발)

  • Kim, Tae-Jeong;Kwon, Hyun-Han;Lee, Dong-Ryul;Yoon, Sun-Kwon
    • Journal of Korea Water Resources Association / v.47 no.9 / pp.825-838 / 2014
  • The stationary Markov chain model has been widely used as a daily rainfall simulation model. A main assumption of the stationary Markov model is that statistical characteristics do not change over time and do not have any trends. In other words, the stationary Markov chain model for daily rainfall simulation essentially cannot incorporate any changes in mean or variance. Here we develop a non-stationary hidden Markov chain model (NHMM) based stochastic downscaling scheme for simulating daily rainfall sequences, using general circulation models (GCMs) as inputs. It has been acknowledged that GCMs perform well with respect to annual and seasonal variation at large spatial scales, and they stand as one of the primary sources for obtaining forecasts. The proposed model is applied to daily rainfall series at three stations in the Nakdong watershed. The model showed better performance in reproducing most of the statistics associated with daily and seasonal rainfall. In particular, the proposed model provided a significant improvement in reproducing the extremes. It was confirmed that the proposed model could be used as a downscaling model for generating plausible daily rainfall scenarios if elaborate GCM forecasts can be used as a predictor. Also, the proposed NHMM can be applied to climate change studies if GCM-based climate change scenarios are used as inputs.
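To make the stationarity assumption concrete, here is a minimal sketch of a two-state (dry/wet) first-order Markov chain for daily rainfall occurrence with fixed transition probabilities. It illustrates only the stationary baseline described above, not the proposed NHMM; the function name, parameters, and probability values are assumptions chosen for illustration.

```python
import random


def simulate_wet_dry(n_days, p_wet_after_dry, p_wet_after_wet, seed=0):
    """Simulate a 0/1 (dry/wet) sequence with time-invariant transition probabilities."""
    rng = random.Random(seed)
    state = 0  # assume the day before the simulation starts was dry
    sequence = []
    for _ in range(n_days):
        # The probability of rain depends only on yesterday's state and never
        # changes over time, which is exactly the stationarity assumption.
        p_wet = p_wet_after_wet if state == 1 else p_wet_after_dry
        state = 1 if rng.random() < p_wet else 0
        sequence.append(state)
    return sequence


# Hypothetical transition probabilities.
days = simulate_wet_dry(365, p_wet_after_dry=0.2, p_wet_after_wet=0.6, seed=42)
print(sum(days), "wet days out of", len(days))
```

A non-stationary variant would let the two probabilities depend on time-varying predictors, such as GCM outputs, instead of holding them constant.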

Bayesian networks-based probabilistic forecasting of hydrological drought considering drought propagation (가뭄의 전이 현상을 고려한 수문학적 가뭄에 대한 베이지안 네트워크 기반 확률 예측)

  • Shin, Ji Yae;Kwon, Hyun-Han;Lee, Joo-Heon;Kim, Tae-Woong
    • Journal of Korea Water Resources Association / v.50 no.11 / pp.769-779 / 2017
  • As the occurrence of drought has recently been on the rise, reliable drought forecasting is required for drought mitigation and proactive management of water resources. This study developed a probabilistic hydrological drought forecasting method that uses Bayesian networks and the drought propagation relationship to estimate future drought along with the forecast uncertainty, named the Propagated Bayesian Networks Drought Forecasting (PBNDF) model. The proposed PBNDF model was composed of four nodes: past, current, and multi-model ensemble (MME) forecasted information, and the drought propagation relationship. Using the Palmer Hydrological Drought Index (PHDI), the PBNDF model was applied to forecast the hydrological drought condition at 10 gauging stations in the Nakdong River basin. Receiver operating characteristic (ROC) curve analysis was applied to measure the forecast skill of the forecast mean values. The root mean squared error (RMSE) and skill score (SS) were employed to compare the forecast performance with previously developed forecast models (persistence forecast, Bayesian network drought forecast). We found that the PBNDF model showed better forecast skill, with a low RMSE and a high SS of 0.1~0.15. The overall results indicate that the PBNDF model has good potential for probabilistic drought forecasting.
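The two comparison metrics can be sketched as below. The abstract does not state how the skill score is defined, so the sketch assumes one common MSE-based form, SS = 1 - MSE_forecast / MSE_reference, in which a positive value means the forecast beats the reference. All data values are hypothetical.

```python
import math


def rmse(forecast, observed):
    """Root mean squared error between paired forecast and observed values."""
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def skill_score(forecast, reference, observed):
    """Assumed MSE-based skill score relative to a reference forecast."""
    mse_forecast = rmse(forecast, observed) ** 2
    mse_reference = rmse(reference, observed) ** 2
    return 1.0 - mse_forecast / mse_reference


# Hypothetical drought-index values.
observed = [1.0, 2.0, 3.0]
persistence = [2.0, 2.0, 2.0]   # stands in for a reference forecast
candidate = [1.2, 2.1, 2.7]     # stands in for the model being evaluated
print(rmse(candidate, observed))
print(skill_score(candidate, persistence, observed))
```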

Doubly-robust Q-estimation in observational studies with high-dimensional covariates (고차원 관측자료에서의 Q-학습 모형에 대한 이중강건성 연구)

  • Lee, Hyobeen;Kim, Yeji;Cho, Hyungjun;Choi, Sangbum
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.309-327 / 2021
  • Dynamic treatment regimes (DTRs) are decision-making rules designed to provide personalized treatment to individuals in multi-stage randomized trials. Unlike classical methods, in which all individuals are prescribed the same type of treatment, DTRs prescribe patient-tailored treatments that take into account individual characteristics that may change over time. The Q-learning method, one of the regression-based algorithms for finding optimal treatment rules, has become more popular because it can be easily implemented. However, the performance of the Q-learning algorithm relies heavily on the correct specification of the Q-function for the response, especially in observational studies. In this article, we examine a number of doubly-robust weighted least-squares estimating methods for Q-learning in high-dimensional settings, where treatment models for the propensity score and penalization for sparse estimation are also investigated. We further consider flexible ensemble machine learning methods for the treatment model to achieve double robustness, so that the optimal decision rule can be correctly estimated as long as at least one of the outcome model or the treatment model is correct. Extensive simulation studies show that the proposed methods work well with practical sample sizes. The practical utility of the proposed methods is demonstrated with a real data example.

A Vision Transformer Based Recommender System Using Side Information (부가 정보를 활용한 비전 트랜스포머 기반의 추천시스템)

  • Kwon, Yujin;Choi, Minseok;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.119-137 / 2022
  • Recent recommendation system studies apply various deep learning models to better represent user and item interactions. One noteworthy study is ONCF (Outer product-based Neural Collaborative Filtering), which builds a two-dimensional interaction map via the outer product and employs a CNN (Convolutional Neural Network) to learn high-order correlations from the map. However, ONCF has limitations in recommendation performance due to problems with the CNN and the absence of side information. ONCF using a CNN has an inductive bias problem that causes poor performance on data with a distribution that does not appear in the training data. This paper proposes employing a Vision Transformer (ViT) instead of the vanilla CNN used in ONCF, because ViT has shown better results than state-of-the-art CNNs in many image classification cases. In addition, we propose a new architecture to reflect side information, which ONCF did not consider. Unlike previous studies that reflect side information in a neural network using simple input combination methods, this study uses an independent auxiliary classifier to reflect side information more effectively in the recommender system. ONCF used a single latent vector each for the user and the item, but in this study, channels are constructed from multiple vectors so that the model can learn more diverse representations and obtain an ensemble effect. The experiments showed that our deep learning model improved recommendation performance compared to ONCF.
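The outer-product interaction map mentioned above can be illustrated with a minimal pure-Python sketch. The vectors are hypothetical, and pairing the k-th user vector with the k-th item vector to form the k-th channel is an assumption made for illustration; the abstract does not specify how the channels are built.

```python
def outer_product(user_vec, item_vec):
    """Two-dimensional interaction map: entry [i][j] is user_vec[i] * item_vec[j]."""
    return [[u * v for v in item_vec] for u in user_vec]


def interaction_channels(user_vecs, item_vecs):
    """Stack one interaction map per (user vector, item vector) pair, like image channels."""
    return [outer_product(u, v) for u, v in zip(user_vecs, item_vecs)]


# Hypothetical latent vectors: two per user and two per item.
user_vecs = [[1.0, 2.0], [0.5, -1.0]]
item_vecs = [[3.0, 4.0, 5.0], [2.0, 0.0, 1.0]]

channels = interaction_channels(user_vecs, item_vecs)
print(len(channels), "channels of size", len(channels[0]), "x", len(channels[0][0]))
```

The resulting stack has the same layout as a multi-channel image, which is what lets an image model such as a CNN or ViT consume it.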