• Title/Summary/Keyword: Ensemble Model


Dam Inflow Prediction and Evaluation Using Hybrid Auto-sklearn Ensemble Model (하이브리드 Auto-sklearn 앙상블 모델을 이용한 댐 유입량 예측 및 평가)

  • Lee, Seoro;Bae, Joo Hyun;Lee, Gwanjae;Yang, Dongseok;Hong, Jiyeong;Kim, Jonggun;Lim, Kyoung Jae
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.307-307 / 2022
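  • Recently, the variability of dam inflow has been increasing because of climate change, land-use change upstream of dams, and other causes, which makes decisions on dam management and operation difficult. A method is therefore needed that predicts dam inflow accurately and efficiently while reflecting these variability characteristics. With advances in machine learning, Auto-ML (Automated Machine Learning) is being used in various fields. Auto-ML automates the whole process of data preprocessing, optimal algorithm selection, hyperparameter tuning, and model training and evaluation. However, few studies in hydrology have used Auto-ML to develop dam inflow prediction models, and in particular no study has developed and evaluated an Auto-ML-based ensemble model that uses a hybrid combination approach accounting for the variability characteristics of high inflow and low inflow to secure prediction accuracy. In this study, we used Auto-sklearn, one of the Auto-ML packages, to develop a hybrid ensemble dam inflow prediction model that reflects the inflow variability characteristics of the flood and non-flood seasons. When applied to the Soyanggang Dam, the hybrid Auto-sklearn ensemble model achieved R² 0.868, RMSE 66.23 m³/s, and MAE 16.45 m³/s, which was better overall than the ensemble model built with a single Auto-sklearn run. In particular, the inflow predictions of the two models differed greatly in the low-flow and drought-flow sections of the FDC (Flow Duration Curve), and the predictions of the hybrid Auto-sklearn model were closer to the observed values. We attribute this to the hyperparameters of each model being optimized as the ensemble models for the flood and non-flood seasons were built independently. The methodology of this study is expected to be useful not only for producing more accurate dam inflow prediction data but also for building ensemble models from imbalanced datasets in various fields.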

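The hybrid idea in the abstract above (one model per inflow regime, with predictions routed by season) can be sketched in a few lines. This is only an illustration under stated assumptions: it substitutes a one-feature least-squares line for Auto-sklearn, the flood-season months in `FLOOD_MONTHS` are a hypothetical choice, and the toy samples are invented for the example rather than taken from the study. The three metrics match those named in the abstract (R², RMSE, MAE).

```python
from math import sqrt

# Assumption: months treated as flood season; not stated in the abstract.
FLOOD_MONTHS = {6, 7, 8, 9}


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def fit_hybrid(samples):
    """Fit one model per regime. samples: list of (month, rainfall, inflow)."""
    models = {}
    for is_flood in (True, False):
        part = [(x, y) for m, x, y in samples if (m in FLOOD_MONTHS) == is_flood]
        models[is_flood] = fit_line([x for x, _ in part], [y for _, y in part])
    return models


def predict_hybrid(models, month, rainfall):
    """Route the input to the model trained on its regime."""
    a, b = models[month in FLOOD_MONTHS]
    return a * rainfall + b


def metrics(obs, pred):
    """Return R2, RMSE and MAE for observed vs. predicted values."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / n
    return {"r2": 1 - ss_res / ss_tot, "rmse": sqrt(ss_res / n), "mae": mae}


# Toy data (invented): the two regimes follow different rainfall-inflow relations.
samples = [(7, 10, 100), (7, 20, 200), (8, 30, 300),
           (1, 10, 25), (2, 20, 45), (12, 30, 65)]
observed = [y for _, _, y in samples]

hybrid_models = fit_hybrid(samples)
hybrid_pred = [predict_hybrid(hybrid_models, m, x) for m, x, _ in samples]

# Baseline: a single model fitted on all samples regardless of season.
a_all, b_all = fit_line([x for _, x, _ in samples], observed)
single_pred = [a_all * x + b_all for _, x, _ in samples]

hybrid_metrics = metrics(observed, hybrid_pred)
single_metrics = metrics(observed, single_pred)
```

On this toy data the regime-specific models fit their own seasons closely while the single model has to compromise between them, which is the behaviour the abstract reports for the low-flow sections.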

FubaoLM : Automatic Evaluation based on Chain-of-Thought Distillation with Ensemble Learning (FubaoLM : 연쇄적 사고 증류와 앙상블 학습에 의한 대규모 언어 모델 자동 평가)

  • Huiju Kim;Donghyeon Jeon;Ohjoon Kwon;Soonhwan Kwon;Hansu Kim;Inkwon Lee;Dohyeon Kim;Inho Kang
    • Annual Conference on Human and Language Technology / 2023.10a / pp.448-453 / 2023
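  • Evaluating large language models (LLMs) from the perspective of human preference is a challenging task that differs from conventional benchmark evaluation. Previous studies approached it by using a powerful LLM as the evaluator, but this raised the problem of high cost. In addition, the subjective scoring criteria used by an LLM evaluator are ambiguous, which undermines the reliability of the results, and evaluations from a single model may be biased. In this paper, we propose an evaluation framework and an evaluator model, 'FubaoLM', that can perform unbiased evaluation using strict criteria. Our framework performs Chain-of-Thought-based evaluation using multiple strong Korean LLMs and in-depth evaluation criteria. The evaluation results are combined by majority vote to produce an unbiased result, and through instruction tuning FubaoLM is distilled with evaluation knowledge from the multiple LLMs. Furthermore, we construct an expert-based evaluation dataset to demonstrate the effectiveness of FubaoLM. In our experiments, the ensembled FubaoLM improves absolute evaluation performance by 16% to 23% over GPT-3.5 and produces preference results similar to those of humans in binary evaluation. This shows that FubaoLM can perform unbiased evaluation with high reliability at relatively low cost.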

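The majority-vote step described in the abstract above can be sketched as follows. This is a minimal illustration, not the FubaoLM implementation: the verdict labels, the `majority_vote` helper, and the rule that a split vote becomes `"tie"` are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(verdicts):
    """Return the most common verdict among evaluators.

    Assumption: when the top two verdicts have equal counts, the result
    is reported as "tie" instead of picking one arbitrarily.
    """
    if not verdicts:
        raise ValueError("at least one verdict is required")
    ranked = Counter(verdicts).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"
    return ranked[0][0]


# Hypothetical verdicts from three evaluator models on two comparisons.
judgements = {
    "prompt-1": ["A", "A", "B"],
    "prompt-2": ["B", "A", "tie"],
}
labels = {name: majority_vote(votes) for name, votes in judgements.items()}
```

Labels aggregated this way could then serve as training targets when distilling a single evaluator, which is the role the abstract assigns to instruction tuning.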

Students' Performance Prediction in Higher Education Using Multi-Agent Framework Based Distributed Data Mining Approach: A Review

  • M.Nazir;A.Noraziah;M.Rahmah
    • International Journal of Computer Science & Network Security / v.23 no.10 / pp.135-146 / 2023
  • An effective educational program calls for innovative structures that improve the efficacy of higher education, accelerating the achievement of desired results and reducing the risk of failure. Educational Decision Support Systems (EDSS) are currently a hot topic in education because they allow student results to be monitored and evaluated as students progress. Inadequate information systems struggle to benefit fully from EDSS because of low accuracy, incorrect analysis of characteristics, and inadequate databases. Data Mining Techniques (DMTs) provide helpful tools for finding models or patterns in data and are extremely useful in the decision-making process. Several researchers have studied distributed data mining with multi-agent technology. The rapid growth of network technology and IT use has led to the widespread use of distributed databases. This article explains the available data mining technology and the distributed data mining system framework. A Distributed Data Mining approach is used in this work to construct a classifier capable of predicting the success of students in the economic domain. This research also discusses the Intelligent Knowledge Base Distributed Data Mining framework for assessing student performance on mid-term and final-term exams using multi-agent system-based educational mining techniques. Using single and ensemble-based classifiers, this study aims to investigate the factors that influence student performance in higher education and to construct a classification model that can predict academic achievement. We also discuss the importance of multi-agent systems and comparative machine learning approaches in EDSS development.

Development of Type 2 Prediction Prediction Based on Big Data (빅데이터 기반 2형 당뇨 예측 알고리즘 개발)

  • Hyun Sim;HyunWook Kim
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.5 / pp.999-1008 / 2023
  • Early prediction of chronic diseases such as diabetes is an important issue, and improving the accuracy of diabetes prediction is especially important. Various machine learning and deep learning-based methodologies are being introduced for diabetes prediction, but these techniques need large amounts of data to outperform other methodologies, and their training cost is high because of complex models. In this study, we aim to verify the claim that a DNN using the Pima dataset and k-fold cross-validation reduces the efficiency of diabetes diagnosis models. Machine learning classification methods such as decision trees, SVM, random forests, logistic regression, KNN, and various ensemble techniques were used to determine which algorithm produces the best prediction results. After all classification models were trained and tested, the proposed system gave the best results with the XGBoost classifier combined with the ADASYN method, with an accuracy of 81%, an F1 score of 0.81, and an AUC of 0.84. Additionally, a domain adaptation method was implemented to demonstrate the versatility of the proposed system. An explainable AI approach using the LIME and SHAP frameworks was implemented to understand how the model predicts the final outcome.

Crop Yield Estimation Utilizing Feature Selection Based on Graph Classification (그래프 분류 기반 특징 선택을 활용한 작물 수확량 예측)

  • Ohnmar Khin;Sung-Keun Lee
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.6 / pp.1269-1276 / 2023
  • Crop yield estimation is essential for meeting global food demand, and it depends on many interacting factors such as soil, rainfall, climate, and atmosphere. Climate change affects agricultural yields. We use a dataset that includes temperature, rainfall, humidity, and other variables. The current research focuses on feature selection with a variety of classifiers to assist farmers and agriculturalists. Crop yield estimation with the feature selection approach achieves 96% accuracy. Feature selection affects a machine learning model's performance. Additionally, the current graph classifier achieves 81.5%. Finally, the random forest regressor without feature selection reaches 78% accuracy and the decision tree regressor without feature selection reaches 67% accuracy. The contribution of this research is to show experimentally the significance of using versus not using feature selection across the ten proposed algorithms. These findings support learners and students in choosing appropriate models for crop classification studies.

A Bi-directional Information Learning Method Using Reverse Playback Video for Fully Supervised Temporal Action Localization (완전지도 시간적 행동 검출에서 역재생 비디오를 이용한 양방향 정보 학습 방법)

  • Huiwon Gwon;Hyejeong Jo;Sunhee Jo;Chanho Jung
    • Journal of IKEEE / v.28 no.2 / pp.145-149 / 2024
  • Recently, research on temporal action localization has been actively conducted. In this paper, unlike existing methods, we propose two approaches for learning bidirectional information by creating reverse playback videos for fully supervised temporal action localization. One approach involves creating training data by combining reverse playback videos and forward playback videos, while the other approach involves training separate models on videos with different playback directions. Experiments were conducted on the THUMOS-14 dataset using TALLFormer. When using both reverse and forward playback videos as training data, the performance was 5.1% lower than that of the existing method. On the other hand, using a model ensemble shows a 1.9% improvement in performance.
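The abstract above does not say how the forward and reverse models are combined, so the following is only a sketch of one plausible scheme: averaging per-frame action scores after flipping the reverse model's output back to forward time. The function name, the per-frame score representation, and the equal weighting are all assumptions for illustration and are not taken from the paper or from TALLFormer.

```python
def ensemble_scores(forward_scores, reverse_scores):
    """Average per-frame scores from a forward model and a reverse model.

    Assumption: reverse_scores[0] corresponds to the LAST frame of the
    original video, so the list is flipped before averaging.
    """
    if len(forward_scores) != len(reverse_scores):
        raise ValueError("both models must score the same number of frames")
    realigned = reverse_scores[::-1]
    return [(f + r) / 2 for f, r in zip(forward_scores, realigned)]


# Hypothetical per-frame action scores for a four-frame clip.
forward = [0.0, 0.75, 1.0, 0.25]
reverse = [0.25, 0.5, 0.75, 0.5]  # given in reversed playback order
combined = ensemble_scores(forward, reverse)
```

The realignment step is the point of the sketch: without it, scores for different moments of the video would be averaged together.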

Learning Wind Speed Forecast Model based on Numeric Prediction Algorithm (수치 예측 알고리즘 기반의 풍속 예보 모델 학습)

  • Kim, Se-Young;Kim, Jeong-Min;Ryu, Kwang-Ryel
    • Journal of the Korea Society of Computer and Information / v.20 no.3 / pp.19-27 / 2015
  • Wind power generation technologies have been developed over the past 20 years as part of alternative energy development. Wind power generation is environmentally friendly and economical because it uses naturally occurring wind as its energy resource. To operate wind power generation efficiently, the constantly changing wind speed must be predicted accurately. What matters is not only how well wind speed is predicted on average but also how small the largest absolute error between the actual and predicted wind speed can be made. For generation operation planning, minimizing the largest absolute error is important for building a flexible plan, because the difference between predicted and actual power causes economic loss. In this paper, we propose a wind speed prediction method that uses a forecast model based on a numeric prediction algorithm. The model analyzes the wind speed forecast issued by the Meteorological Administration together with pattern values that capture the seasonal properties of wind speed and the trend of past wind speed. The Meteorological Administration forecast covers a comparatively wide area that includes the wind farm, but it still contributes considerably to prediction accuracy. The experimental results also demonstrate that the more finely wind speed is analyzed, the greater the accuracy obtained.
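The distinction the abstract draws between average accuracy and the largest absolute error can be made concrete with a small comparison. The wind speeds and the two sets of predictions below are invented for illustration; only the two metrics themselves come from the text.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all time steps."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def max_absolute_error(actual, predicted):
    """Largest single |actual - predicted| over all time steps."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical wind speeds in m/s and two hypothetical forecast models.
actual = [5.0, 6.0, 7.0, 8.0]
model_a = [5.5, 6.5, 7.5, 8.5]  # consistently off by 0.5
model_b = [5.0, 6.0, 7.0, 9.5]  # exact except for one large miss

mae_a = mean_absolute_error(actual, model_a)
mae_b = mean_absolute_error(actual, model_b)
worst_a = max_absolute_error(actual, model_a)
worst_b = max_absolute_error(actual, model_b)
```

Here the second model looks better on average error but has a much larger worst-case miss, which is the situation the abstract identifies as costly for generation planning.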

Doubly-robust Q-estimation in observational studies with high-dimensional covariates (고차원 관측자료에서의 Q-학습 모형에 대한 이중강건성 연구)

  • Lee, Hyobeen;Kim, Yeji;Cho, Hyungjun;Choi, Sangbum
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.309-327 / 2021
  • Dynamic treatment regimes (DTRs) are decision-making rules designed to provide personalized treatment to individuals in multi-stage randomized trials. Unlike classical methods, in which all individuals are prescribed the same type of treatment, DTRs prescribe patient-tailored treatments that take into account individual characteristics that may change over time. The Q-learning method, one of the regression-based algorithms for finding optimal treatment rules, has become more popular because it is easy to implement. However, the performance of the Q-learning algorithm relies heavily on correct specification of the Q-function for the response, especially in observational studies. In this article, we examine a number of doubly robust weighted least-squares estimating methods for Q-learning in high-dimensional settings, where treatment models for the propensity score and penalization for sparse estimation are also investigated. We further consider flexible ensemble machine learning methods for the treatment model to achieve double robustness, so that the optimal decision rule can be correctly estimated as long as at least one of the outcome model or the treatment model is correct. Extensive simulation studies show that the proposed methods work well with practical sample sizes. The practical utility of the proposed methods is demonstrated with a real data example.

A Study on the Prediction of Rock Classification Using Shield TBM Data and Machine Learning Classification Algorithms (쉴드 TBM 데이터와 머신러닝 분류 알고리즘을 이용한 암반 분류 예측에 관한 연구)

  • Kang, Tae-Ho;Choi, Soon-Wook;Lee, Chulho;Chang, Soo-Ho
    • Tunnel and Underground Space / v.31 no.6 / pp.494-507 / 2021
  • With the increasing use of TBMs, research has recently been conducted in Korea that analyzes TBM data with machine learning techniques to predict the ground ahead of the TBM, the replacement cycle of disc cutters, and the TBM advance rate. In this study, rock characteristics at slurry shield TBM sites were classified and predicted by combining traditional rock classification techniques and machine learning techniques widely used in various fields with machine data recorded during TBM excavation. The rock characteristic classification criteria were RQD, uniaxial compressive strength, and elastic wave velocity. Rock conditions for each item were divided into three classes, 0 (good), 1 (normal), and 2 (poor), and machine learning was performed with six classification algorithms. As a result, the ensemble model showed good performance, and the LightGBM model, which gave excellent results in learning speed as well as learning performance, was found to be optimal for the ground at the target site. The classification model for the three rock characteristics set in this study is expected to make it possible to provide rock conditions for sections where ground information is not available, which will help during excavation work.

A Method of Machine Learning-based Defective Health Functional Food Detection System for Efficient Inspection of Imported Food (효율적 수입식품 검사를 위한 머신러닝 기반 부적합 건강기능식품 탐지 방법)

  • Lee, Kyoungsu;Bak, Yerin;Shin, Yoonjong;Sohn, Kwonsang;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.139-159 / 2022
  • As interest in health functional foods has increased since COVID-19, the importance of imported food safety inspections is growing. However, while imports of health functional foods increase every year, the budget and manpower available for import and export inspections are reaching their limit. Hence, the purpose of this study is to propose a machine learning model that efficiently detects nonconforming food and is suited to the characteristics of the imported-food data held by government offices. First, the components of food import/export inspection data that affect the judgment of nonconformity were examined, and derived variables were newly created. Second, to select features for machine learning, class imbalance and nonlinearity were considered during exploratory analysis of the imported-food data. Third, we compare the performance and interpretability of each model by applying various machine learning techniques. In particular, the ensemble model performed best, and it was confirmed that the derived variables and models proposed in this study can be helpful to the system used in import/export inspections.