• Title/Summary/Keyword: Ensemble models

Korean Dependency Parsing Using Various Ensemble Models (다양한 앙상블 알고리즘을 이용한 한국어 의존 구문 분석)

  • Jo, Gyeong-Cheol;Kim, Ju-Wan;Kim, Gyun-Yeop;Park, Seong-Jin;Gang, Sang-U
    • Annual Conference on Human and Language Technology
    • /
    • 2019.10a
    • /
    • pp.543-545
    • /
    • 2019
  • This paper combines recent Korean dependency parsing models with various ensemble models and analyzes their performance. Word representations use a pretrained word embedding model, ELMo (Embeddings from Language Models), BERT (Bidirectional Encoder Representations from Transformers), and various additional features. The dependency parsing models used are the Stack Pointer Network model, the Deep Biaffine Attention Parser, and the Left-to-Right Pointer Parser. Finally, the outputs of the individual models are combined using the ensemble methods Bagging and XGBoost (Extreme Gradient Boosting) to propose an optimal model.

A Study on Short-Term Electricity Demand Prediction Using Stacking Ensemble of Machine Learning and Deep Learning Ensemble Models (머신러닝 및 딥러닝 모델의 스태킹 앙상블을 이용한 단기 전력수요 예측에 관한 연구)

  • Lee, Jung-Il;Kim, Dong-il
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.11a
    • /
    • pp.566-569
    • /
    • 2021
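  • Electricity demand data exhibit seasonality by month, day of the week, and time of day. Because the characteristics differ for each type of seasonality, forecasting electricity demand requires selecting and combining various models that account for these seasonal characteristics. This study applies a stacking ensemble so that various forecasting models reflecting the seasonality of electricity demand can be combined, and reports the experimental results. It also presents a method for using weather and population data from 162 cities in the forecast, a preprocessing method for the features fed to regression and time-series models, and a hyperparameter optimization method for machine learning and deep learning models based on Bayesian optimization.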
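
The stacking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the synthetic demand series, the two group-mean base models (`by_hour`, `by_dow`), and the intercept-free least-squares meta-learner are all assumptions chosen to keep the example dependency-free. It also uses a single hold-out split rather than out-of-fold predictions, a variant often called blending.

```python
import math
import random


def fit_group_mean(keys, targets):
    """Base model: predict the mean target seen for each key (e.g. hour of day)."""
    sums, counts = {}, {}
    for k, y in zip(keys, targets):
        sums[k] = sums.get(k, 0.0) + y
        counts[k] = counts.get(k, 0) + 1
    overall = sum(targets) / len(targets)
    means = {k: sums[k] / counts[k] for k in sums}
    return lambda k: means.get(k, overall)


def fit_meta_weights(pred_a, pred_b, targets):
    """Meta-learner: least-squares weights (no intercept) for two base predictions."""
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * y for a, y in zip(pred_a, targets))
    sby = sum(b * y for b, y in zip(pred_b, targets))
    det = saa * sbb - sab * sab  # 2x2 normal equations solved by Cramer's rule
    return (say * sbb - sby * sab) / det, (sby * saa - say * sab) / det


def mse(preds, targets):
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)


def make_demand(n_hours, seed=0):
    """Synthetic hourly demand with daily and weekly seasonality (illustrative only)."""
    rng = random.Random(seed)
    rows = []
    for t in range(n_hours):
        hour, dow = t % 24, (t // 24) % 7
        load = 100 + 20 * math.sin(2 * math.pi * hour / 24) + (10 if dow < 5 else 0)
        rows.append((hour, dow, load + rng.gauss(0, 2)))
    return rows


rows = make_demand(24 * 7 * 8)
split = 24 * 7 * 6  # the first 6 weeks train the base models
train, hold = rows[:split], rows[split:]

by_hour = fit_group_mean([r[0] for r in train], [r[2] for r in train])
by_dow = fit_group_mean([r[1] for r in train], [r[2] for r in train])

# Base-model predictions on the held-out weeks become the meta-learner's inputs.
hold_y = [r[2] for r in hold]
pred_hour = [by_hour(r[0]) for r in hold]
pred_dow = [by_dow(r[1]) for r in hold]
w_hour, w_dow = fit_meta_weights(pred_hour, pred_dow, hold_y)
stacked = [w_hour * a + w_dow * b for a, b in zip(pred_hour, pred_dow)]

print(mse(pred_hour, hold_y), mse(pred_dow, hold_y), mse(stacked, hold_y))
```

Because the meta-learner is fitted on the same hold-out rows it is scored on here, the printed stacked error is optimistic; a real evaluation would score it on a further unseen period.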

Using Ensemble Learning Algorithm and AI Facial Expression Recognition, Healing Service Tailored to User's Emotion (앙상블 학습 알고리즘과 인공지능 표정 인식 기술을 활용한 사용자 감정 맞춤 힐링 서비스)

  • Yang, Seong-yeon;Hong, Dahye;Moon, Jaehyun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.11a
    • /
    • pp.818-820
    • /
    • 2022
  • 'Healing' is an important keyword in Korea's competitive society and culture. In addition, as time spent at home has increased due to COVID-19, the demand for indoor healing services has grown. This paper therefore analyzes the user's facial expression so that people can receive various customized healing services indoors and, based on that analysis, provides lighting, ASMR, a video recommendation service, and a facial expression recording service. To analyze the user's expression, only the face is first extracted through object detection from an image taken by the user, and an ensemble algorithm is then applied to the expression predictions of several CNN models.
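
The abstract does not say which ensemble algorithm combines the CNN predictions. One common choice is soft voting, sketched below under that assumption; the label set and the three probability vectors are hypothetical stand-ins for real CNN softmax outputs.

```python
def soft_vote(prob_rows, weights=None):
    """Average per-class probabilities from several models and pick the top class."""
    n_models = len(prob_rows)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    n_classes = len(prob_rows[0])
    avg = [
        sum(w * row[c] for w, row in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg


LABELS = ["happy", "sad", "angry"]  # hypothetical label set

# Hypothetical softmax outputs of three CNNs for one cropped face image.
cnn_outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
idx, avg = soft_vote(cnn_outputs)
print(LABELS[idx], avg)
```

Passing `weights` lets a stronger model count for more, for example weights proportional to each model's validation accuracy.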

Harvest Forecasting Improvement Using Federated Learning and Ensemble Model

  • Ohnmar Khin;Jin Gwang Koh;Sung Keun Lee
    • Smart Media Journal
    • /
    • v.12 no.10
    • /
    • pp.9-18
    • /
    • 2023
  • Harvest forecasting depends on multiple factors such as temperature, rain, the environment, and their relations. The existing study investigates climate conditions and helps cultivators know the harvest yields before planting on farms. The proposed study uses federated learning. In addition, widely used techniques such as the bagging classifier, extra trees classifier, linear discriminant analysis classifier, quadratic discriminant analysis classifier, stochastic gradient boosting classifier, blending models, random forest regressor, and AdaBoost are used together. The nine algorithms presented achieved satisfactory accuracy. The proposed algorithms can contribute to accurate harvest forecasting. Finally, we intend to compare our study with the results of earlier research.

Prediction of Transition Temperature and Magnetocaloric Effects in Bulk Metallic Glasses with Ensemble Models (앙상블 기계학습 모델을 이용한 비정질 소재의 자기냉각 효과 및 전이온도 예측)

  • Chunghee Nam
    • Korean Journal of Materials Research
    • /
    • v.34 no.7
    • /
    • pp.363-369
    • /
    • 2024
  • In this study, the magnetocaloric effect and transition temperature of bulk metallic glass, an amorphous material, were predicted through machine learning based on composition features. From the Python module 'Matminer', 174 compositional features were obtained, and prediction performance was compared while reducing the number of compositional features to prevent overfitting. After optimization using RandomForest, an ensemble model, changes in prediction performance were analyzed according to the number of compositional features. The R2 score was used as the performance metric for the regression predictions, and the best prediction performance was found using only 90 features when predicting transition temperature and 20 features when predicting magnetocaloric effects. The most important feature when predicting magnetocaloric effects was the 'Fe' compositional ratio. The feature importance method provided by 'scikit-learn' was applied to rank the compositional features. The feature importance method was found to be appropriate by comparing the prediction performance on the Fe-containing dataset with that on the full dataset.
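
For reference, the R2 score used as the metric here is the coefficient of determination, 1 - SS_res / SS_tot. The sketch below computes it directly; the temperature values are hypothetical and are not taken from the paper.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted transition temperatures (K).
measured = [300.0, 320.0, 340.0, 360.0]
predicted = [310.0, 315.0, 345.0, 350.0]
print(r2_score(measured, predicted))  # 0.875
```

A score of 1 means perfect prediction, and 0 means the model does no better than always predicting the mean of the measured values.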

Development of Classification Model for hERG Ion Channel Inhibitors Using SVM Method (SVM 방법을 이용한 hERG 이온 채널 저해제 예측모델 개발)

  • Gang, Sin-Moon;Kim, Han-Jo;Oh, Won-Seok;Kim, Sun-Young;No, Kyoung-Tai;Nam, Ky-Youb
    • Journal of the Korean Chemical Society
    • /
    • v.53 no.6
    • /
    • pp.653-662
    • /
    • 2009
  • Developing effective tools for predicting the absorption, distribution, metabolism, excretion properties and toxicity (ADME/T) of new chemical entities in the early stage of drug design is one of the most important tasks in drug discovery and development today. As one such attempt, support vector machines (SVM) have recently been exploited for the prediction of ADME/T-related properties. However, two problems in SVM modeling, feature selection and parameter setting, are still far from solved. Both have been shown to be crucial to the efficiency and accuracy of SVM classification. In particular, feature selection and optimal SVM parameter setting influence each other, which indicates that they should be dealt with simultaneously. In this account, we present an integrated practical solution in which a genetic algorithm (GA) is used for feature selection and a grid search (GS) method for parameter optimization. hERG ion-channel inhibitor classification models of ADME/T-related properties have been built for assessing and testing the proposed GA-GS-SVM. We generated six models, three single models and three ensemble models, using a training set of 1891 compounds, and validated them with an external test set of 175 compounds. We compared the single models with the ensemble models to address the data imbalance problem. Using the ensemble models improved prediction accuracy.

Future Korean Water Resources Projection Considering Uncertainty of GCMs and Hydrological Models (GCM과 수문모형의 불확실성을 고려한 기후변화에 따른 한반도 미래 수자원 전망)

  • Bae, Deg-Hyo;Jung, Il-Won;Lee, Byung-Ju;Lee, Moon-Hwan
    • Journal of Korea Water Resources Association
    • /
    • v.44 no.5
    • /
    • pp.389-406
    • /
    • 2011
  • The objective of this study is to assess the impact of climate change on Korean water resources considering the uncertainties of Global Climate Models (GCMs) and hydrological models. Three emission scenarios (A2, A1B, B1) and the results of 13 GCMs are used to account for the uncertainties of the emission scenario and GCM, while the PRMS, SWAT, and SLURP models are employed to account for the effects of hydrological model structures and potential evapotranspiration (PET) computation methods. The 312 ensemble results are provided for 109 mid-size sub-basins over South Korea, and Gaussian kernel density functions obtained from the ensemble results are presented together with the ensemble mean and variability. The results show that summer and winter runoff is expected to increase and spring runoff to decrease for the three future periods relative to the past 30-year reference period. They also show that annual average runoff increases over all sub-basins, but the increases in the northern basins, including the Han River basin, are greater than those in the southern basins. Because the increase in annual average runoff is mainly caused by the increase in summer runoff, and the seasonal runoff variations under climate change would consequently be severe, the climate change impact on Korean water resources could intensify the difficulties of water resources conservation and management. Regarding the uncertainties, the highest and lowest are in the winter and summer seasons, respectively.
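
Summarizing an ensemble with a Gaussian kernel density function can be sketched as follows. The runoff-change values are hypothetical, and the normal-reference bandwidth rule is an assumption; the abstract does not state how the bandwidth was chosen.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels centred on each sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def normal_reference_bandwidth(samples):
    """Rule-of-thumb bandwidth 1.06 * std * n^(-1/5) (assumed here, not from the paper)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return 1.06 * std * n ** (-0.2)


# Hypothetical ensemble of projected runoff changes (%) for one sub-basin.
runoff_change = [-5.0, 2.0, 4.5, 8.0, 12.0, 15.5, 21.0]
ensemble_mean = sum(runoff_change) / len(runoff_change)
pdf = gaussian_kde(runoff_change, normal_reference_bandwidth(runoff_change))

print(ensemble_mean, pdf(ensemble_mean))
```

The width of the resulting density gives a visual measure of ensemble spread, which is one way to read the uncertainty that the ensemble mean alone hides.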

A Comparative Analysis of Ensemble Learning-Based Classification Models for Explainable Term Deposit Subscription Forecasting (설명 가능한 정기예금 가입 여부 예측을 위한 앙상블 학습 기반 분류 모델들의 비교 분석)

  • Shin, Zian;Moon, Jihoon;Rho, Seungmin
    • The Journal of Society for e-Business Studies
    • /
    • v.26 no.3
    • /
    • pp.97-117
    • /
    • 2021
  • Predicting term deposit subscriptions is a representative financial marketing task for banks, and banks can build a prediction model using various customer information. To improve the classification accuracy for term deposit subscriptions, many studies have been conducted based on machine learning techniques. However, even if these models achieve satisfactory performance, using them in industry is not easy when their decision-making process is not adequately explained. To address this issue, this paper proposes an explainable scheme for term deposit subscription forecasting. We first construct several classification models using decision tree-based ensemble learning methods, which yield excellent performance on tabular data, such as random forest, gradient boosting machine (GBM), extreme gradient boosting (XGB), and light gradient boosting machine (LightGBM). We then analyze their classification performance in depth through 10-fold cross-validation. After that, we provide the rationale for interpreting the influence of customer information and the decision-making process by applying Shapley additive explanation (SHAP), an explainable artificial intelligence technique, to the best classification model. To verify the practicality and validity of our scheme, experiments were conducted with the bank marketing dataset provided by Kaggle; we applied SHAP to the GBM and LightGBM models, respectively, according to different dataset configurations and then performed their analysis and visualization for explainable term deposit subscriptions.

Tor Network Website Fingerprinting Using Statistical-Based Feature and Ensemble Learning of Traffic Data (트래픽 데이터의 통계적 기반 특징과 앙상블 학습을 이용한 토르 네트워크 웹사이트 핑거프린팅)

  • Kim, Junho;Kim, Wongyum;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.6
    • /
    • pp.187-194
    • /
    • 2020
  • This paper proposes a website fingerprinting method using ensemble learning over a Tor network that guarantees client anonymity and personal information. We construct a training problem for website fingerprinting from the traffic packets collected in the Tor network, and compare the performance of the website fingerprinting system using tree-based ensemble models. A training feature vector is prepared from the general information, burst, cell sequence length, and cell order that are extracted from the traffic sequence, and the features of each website are represented with a fixed length. For experimental evaluation, we define four learning problems (Wang14, BW, CWT, CWH) according to the use of website fingerprinting, and compare the performance with the support vector machine model using CUMUL feature vectors. In the experimental evaluation, the proposed statistical-based training feature representation is superior to the CUMUL feature representation except for the BW case.

Stochastic Simple Hydrologic Partitioning Model Associated with Markov Chain Monte Carlo and Ensemble Kalman Filter (마코프 체인 몬테카를로 및 앙상블 칼만필터와 연계된 추계학적 단순 수문분할모형)

  • Choi, Jeonghyeon;Lee, Okjeong;Won, Jeongeun;Kim, Sangdan
    • Journal of Korean Society on Water Environment
    • /
    • v.36 no.5
    • /
    • pp.353-363
    • /
    • 2020
  • Hydrologic models can be classified into two types: those for understanding physical processes and those for predicting hydrologic quantities. This study deals with how to use a model to predict today's stream flow based on the system's knowledge of yesterday's state and the model parameters. For the model to generate accurate predictions, the uncertainty of the parameters and appropriate estimates of the state variables are required. In this study, a relatively simple hydrologic partitioning model is proposed that can explicitly implement the hydrologic partitioning process, and the posterior distribution of the parameters of the proposed model is estimated using the Markov chain Monte Carlo approach. Further, a method of applying the ensemble Kalman filter is proposed for updating the normalized soil moisture, which is the state variable of the model, by linking the information on the posterior distribution of the parameters and by assimilating the observed stream flow data. The stream flows estimated stochastically and recursively using the data assimilation technique represented the observed data better than the stream flows predicted using the deterministic model. Therefore, the ensemble Kalman filter in conjunction with the Markov chain Monte Carlo approach could be a reliable and effective method for forecasting daily stream flow, and it could also be a suitable method for routinely updating and monitoring the watershed-averaged soil moisture.
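
The state update described here can be illustrated with a single analysis step of a stochastic (perturbed-observation) ensemble Kalman filter for a scalar state. This is a sketch only: the linear `to_flow` observation operator, the prior ensemble, and the error variances are hypothetical and stand in for the paper's hydrologic partitioning model, and the updated states are not clipped to the valid range of normalized soil moisture.

```python
import random


def enkf_update(states, observe, obs_value, obs_var, rng):
    """One stochastic (perturbed-observation) EnKF analysis step for a scalar state."""
    n = len(states)
    predicted = [observe(x) for x in states]
    mean_x = sum(states) / n
    mean_y = sum(predicted) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(states, predicted)) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in predicted) / (n - 1)
    gain = cov_xy / (var_y + obs_var)  # Kalman gain from ensemble statistics
    # Each member is nudged toward its own noisy copy of the observation.
    return [
        x + gain * (obs_value + rng.gauss(0.0, obs_var ** 0.5) - y)
        for x, y in zip(states, predicted)
    ]


def to_flow(soil_moisture):
    """Hypothetical linear observation operator: stream flow from soil moisture."""
    return 5.0 * soil_moisture


def ensemble_mean(values):
    return sum(values) / len(values)


rng = random.Random(42)
forecast = [rng.gauss(0.3, 0.1) for _ in range(200)]  # prior soil-moisture ensemble
analysis = enkf_update(forecast, to_flow, obs_value=3.0, obs_var=0.25 ** 2, rng=rng)

print(ensemble_mean(forecast), ensemble_mean(analysis))
```

With these numbers an observed flow of 3.0 corresponds to a soil moisture of 0.6, so the analysis ensemble mean moves from near 0.3 toward 0.6 and the ensemble spread shrinks.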