• Title/Abstract/Keywords: Ensemble model


Ensemble Design of Machine Learning Techniques: Experimental Verification by Prediction of Drifter Trajectory

  • 이찬재;김용혁
    • 예술인문사회 융합 멀티미디어 논문지 / Vol. 8, No. 3 / pp.57-67 / 2018
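  • Ensemble techniques are used in machine learning to achieve better performance by combining multiple algorithms. In this paper, we introduce boosting and bagging, two widely used ensemble techniques, and design ensembles using support vector regression, radial basis function networks, Gaussian processes, and multilayer perceptrons. A recurrent neural network and the MOHID numerical model are also added to the experiments. The drifter data used for experimental verification consist of 683 observations collected in 7 regions. Using these drifter observations, we verify the performance of the ensemble techniques by comparing them with six algorithms, using mean absolute error as the evaluation metric. The experiments use bagging, boosting, and a machine-learning-based ensemble model. For each ensemble model, error rates are calculated both with equal weights and with differential weights. The best error rate was achieved by the machine-learning-based ensemble model, which improved on the average of the six machine learning methods by 61.7%.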
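The equal-weight versus differential-weight comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: the differential weights are assumed here to be inversely proportional to each model's validation MAE, and the helper names (`mae`, `weighted_average`, `inverse_error_weights`, `improvement_pct`) are hypothetical.

```python
# Minimal sketch (assumption): combine base-model predictions with equal
# weights or with weights inversely proportional to each model's validation MAE.


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_average(predictions, weights):
    """Combine per-model prediction lists into one list using normalized weights."""
    total = sum(weights)
    n = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n)
    ]


def inverse_error_weights(predictions, y_val):
    """Hypothetical differential weights: lower validation MAE -> larger weight."""
    return [1.0 / (mae(y_val, preds) + 1e-12) for preds in predictions]


def improvement_pct(baseline_errors, ensemble_error):
    """Percentage improvement of the ensemble over the mean base-model error."""
    baseline = sum(baseline_errors) / len(baseline_errors)
    return 100.0 * (baseline - ensemble_error) / baseline


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0]
    preds = [[1.1, 2.1, 3.1], [1.5, 2.5, 3.5]]  # two toy base models
    equal = weighted_average(preds, [1.0, 1.0])
    diff = weighted_average(preds, inverse_error_weights(preds, y))
    print(mae(y, equal), mae(y, diff))
```

With these toy numbers the equal-weight MAE is about 0.3 and the inverse-error-weighted MAE is about 0.17. In a real evaluation the weights would be estimated on a validation split kept separate from the test data.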

Development of Multi-Ensemble GCMs Based Spatio-Temporal Downscaling Scheme for Short-term Prediction

  • 권현한;민영미
    • 한국수자원학회:학술대회논문집 / 한국수자원학회 2009년도 학술발표회 초록집 / pp.1142-1146 / 2009
  • A rainfall simulation and forecasting technique that can generate daily rainfall sequences conditional on multi-model ensemble GCMs is developed and applied to data in Korea for the major rainy season. The GCM forecasts are provided by the APEC Climate Center. A Weather State Based Downscaling Model (WSDM) is used to map teleconnections from ocean-atmosphere data, or key state variables from numerical integrations of Ocean-Atmosphere General Circulation Models, to simulate daily sequences at multiple rain gauges. The method presented is general and is applied here to wet-season (June-July-August, JJA) data in Korea. The sequences of weather states identified by the EM algorithm are shown to correspond to dominant synoptic-scale features of rainfall-generating mechanisms. Applications of the methodology to seasonal rainfall forecasts using empirical teleconnections and GCM-derived climate forecasts are discussed.


Design and Implementation of the Ensemble-based Classification Model by Using k-means Clustering

  • Song, Sung-Yeol;Khil, A-Ra
    • 한국컴퓨터정보학회논문지 / Vol. 20, No. 10 / pp.31-38 / 2015
  • In this paper, we propose an ensemble-based classification model that uses clustering to extract only new data patterns from streaming data and generates new classification models to be added to the ensemble, reducing the amount of data labeling while maintaining the accuracy of the existing system. The proposed technique clusters similarly patterned data from the stream and labels each cluster once a certain amount of data has been gathered. It applies the k-NN technique to each classification model unit in order to maintain the accuracy of the existing system while using a small amount of data. Simulation results on benchmarks show that, by using clustering, the proposed technique is efficient, using about 3% less data than the existing technique.
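The cluster-then-label idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: plain Lloyd's k-means is used, only the point nearest each centroid is sent to a labeling `oracle` (a hypothetical stand-in for a human annotator), each ensemble member is a 1-NN classifier over those labeled representatives, and members are combined by majority vote. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return centroids, clusters


def build_member(chunk, k, oracle):
    """Label only the point nearest each centroid, so at most k labels are requested per chunk."""
    centroids, clusters = kmeans(chunk, k)
    labeled = []
    for c, members in zip(centroids, clusters):
        if members:
            rep = min(members, key=lambda p: math.dist(p, c))
            labeled.append((rep, oracle(rep)))  # hypothetical labeling call
    return labeled


def knn_predict(labeled, x, k=1):
    """k-NN over one member's labeled representatives."""
    nearest = sorted(labeled, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def ensemble_predict(members, x):
    """Majority vote across ensemble members (one member per processed chunk)."""
    return Counter(knn_predict(m, x) for m in members).most_common(1)[0][0]
```

Each new chunk of the stream yields one new member at a labeling cost of `k` queries instead of one per point, which is the kind of saving the abstract describes; the actual reduction depends on how many clusters are needed.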

Development of Machine Learning Ensemble Model using Artificial Intelligence

  • 이근원;원윤정;송영범;조기섭
    • 열처리공학회지 / Vol. 34, No. 5 / pp.211-217 / 2021
  • To predict the mechanical properties of secondary hardening martensitic steels, a machine learning ensemble model was established. Based on an ANN (Artificial Neural Network) architecture, several methods were considered to optimize the model. In particular, interaction features, which can reflect interactions between the chemical compositions and processing conditions of real alloy systems, were considered by means of feature engineering, and then K-Fold cross validation coupled with a bagging ensemble was investigated to improve the R2 score and reduce a factor indicating the average learning error arising from the biased experimental database.
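K-fold cross-validation wrapped around a bagging ensemble can be sketched as follows. The study's base learner was an ANN; to keep the sketch dependency-free, a one-variable least-squares line is swapped in as the base learner (an assumption, and a low-variance learner for which bagging helps little). `bagging_fit`, `kfold_r2`, and the other names are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares line y = a*x + b (stand-in base learner, not the study's ANN)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate bootstrap sample with a single distinct x
        return (0.0, my)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return (slope, my - slope * mx)


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train one base learner per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Bagging for regression: average the base learners' predictions."""
    return statistics.fmean(a * x + b for a, b in models)


def r2_score(y_true, y_pred):
    my = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_r2(xs, ys, k=5, seed=0):
    """Mean R2 of the bagging ensemble over k shuffled folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        models = bagging_fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [bagging_predict(models, xs[i]) for i in test]
        scores.append(r2_score([ys[i] for i in test], preds))
    return statistics.fmean(scores)
```

Swapping `fit_line` for a higher-variance learner (such as a small neural network) is where bagging is expected to pay off; the cross-validation loop stays the same.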

Improved ensemble machine learning framework for seismic fragility analysis of concrete shear wall system

  • Sangwoo Lee;Shinyoung Kwag;Bu-seog Ju
    • Computers and Concrete / Vol. 32, No. 3 / pp.313-326 / 2023
  • The seismic safety of a shear wall structure can be assessed through seismic fragility analysis, which requires high computational costs to estimate seismic demands. Accordingly, machine learning methods have been applied to such fragility analyses in recent years to reduce the numerical analysis cost, but this remains a challenging task. Therefore, this study uses the ensemble machine learning method to present an improved framework for developing a more accurate seismic demand model than existing ones. To this end, a rank-based selection method for identifying the best-performing model among several single machine learning models is presented. In addition, an index that evaluates the degree of overfitting/underfitting of each model is suggested to support the selection of a strong single model. Furthermore, based on the selected single machine learning model, we propose a method to derive a more accurate ensemble model using bagging. As a result, the seismic demand model obtained with the proposed framework shows about 3-17% better prediction performance than the existing single machine learning models. Finally, the seismic fragility obtained from the proposed framework is more accurate than that of existing fragility methods.
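A rank-based selection step of the kind described can be sketched as follows. The abstract does not define the overfitting/underfitting index, so the ratio of test error to training error is used here as a hypothetical stand-in: a value far above 1 suggests overfitting, while a value near 1 with high errors on both sets would still indicate underfitting, which the accuracy ranks are meant to catch. Model names and metric values in the usage are made up for illustration.

```python
def rank(values, lower_is_better=True):
    """Rank positions 1..n; ties keep their input order (Python's sort is stable)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not lower_is_better)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def fit_index(train_err, test_err):
    """Hypothetical over/underfitting index: test error divided by training error."""
    return test_err / train_err


def select_model(results):
    """results maps model name -> (train_rmse, test_rmse, test_r2).

    Each model is ranked on test RMSE (lower is better), test R2 (higher is better)
    and |index - 1| (closer to 1 is better); the smallest rank sum wins.
    """
    names = list(results)
    rmse_ranks = rank([results[n][1] for n in names])
    r2_ranks = rank([results[n][2] for n in names], lower_is_better=False)
    gap_ranks = rank([abs(fit_index(results[n][0], results[n][1]) - 1.0) for n in names])
    totals = {n: a + b + c for n, a, b, c in zip(names, rmse_ranks, r2_ranks, gap_ranks)}
    return min(names, key=lambda n: totals[n]), totals


if __name__ == "__main__":
    # Illustrative numbers only: (train RMSE, test RMSE, test R2).
    results = {"ANN": (0.10, 0.30, 0.85), "SVR": (0.20, 0.22, 0.90), "RF": (0.05, 0.25, 0.88)}
    print(select_model(results))
```

The selected model would then serve as the base learner for the bagging stage the abstract describes.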

SHM data anomaly classification using machine learning strategies: A comparative study

  • Chou, Jau-Yu;Fu, Yuguang;Huang, Shieh-Kung;Chang, Chia-Ming
    • Smart Structures and Systems / Vol. 29, No. 1 / pp.77-91 / 2022
  • Various monitoring systems have been implemented in civil infrastructure to ensure structural safety and integrity. In long-term monitoring, these systems generate a large amount of data, in which anomalies are not unusual and can pose unique challenges for structural health monitoring applications such as system identification and damage detection. Therefore, developing efficient techniques to recognize anomalies in monitoring data is essential. In this study, several machine learning techniques are explored and implemented to detect and classify various types of data anomalies. A field dataset consisting of one month of acceleration data from a long-span cable-stayed bridge in China is used to examine the machine learning techniques for automated data anomaly detection. These techniques include a statistic-based pattern recognition network, a spectrogram-based convolutional neural network, an image-based time history convolutional neural network, an image-based time-frequency hybrid convolutional neural network (GoogLeNet), and the proposed ensemble neural network model. The ensemble model deliberately combines different machine learning models to enhance anomaly classification performance. The results show that all these techniques can successfully detect and classify six types of data anomalies (i.e., missing, minor, outlier, square, trend, drift). Moreover, both the image-based time history convolutional neural network and GoogLeNet are further investigated for their capability of autonomous online anomaly classification and are found to classify anomalies effectively with decent performance. In terms of accuracy, the proposed ensemble neural network model outperforms the other three machine learning techniques. This study also evaluates the proposed ensemble neural network model on a blind test dataset. As the results show, this ensemble model is effective for data anomaly detection and applicable to signals whose characteristics change over time.
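The abstract does not spell out how the ensemble combines its members; hard majority voting over the six anomaly labels is one common rule and is assumed in the sketch below. Ties are broken in favor of the tied label that appears first in model order, which is also an assumption; `majority_vote` and `ensemble_classify` are hypothetical names.

```python
from collections import Counter

# The six anomaly types listed in the abstract.
ANOMALY_TYPES = {"missing", "minor", "outlier", "square", "trend", "drift"}


def majority_vote(votes):
    """Most frequent label; on a tie, the first tied label in model order wins."""
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label


def ensemble_classify(per_model_predictions):
    """per_model_predictions: one list of labels per base model, aligned by sample."""
    for preds in per_model_predictions:
        unknown = set(preds) - ANOMALY_TYPES
        if unknown:
            raise ValueError(f"unknown anomaly labels: {sorted(unknown)}")
    return [majority_vote(list(sample)) for sample in zip(*per_model_predictions)]
```

For example, if three models label a data segment as "missing", "missing", and "minor", the ensemble outputs "missing". A soft-voting variant would average class probabilities instead, which requires each member to output probabilities.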

Research on Insurance Claim Prediction Using Ensemble Learning-Based Dynamic Weighted Allocation Model

  • 최종석
    • 한국정보전자통신기술학회논문지 / Vol. 17, No. 4 / pp.221-228 / 2024
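  • Insurance claim prediction is one of the key tasks for an insurer's risk management and financial soundness. Accurate claim prediction allows insurers to set appropriate premiums, reduce unexpected losses, and improve the quality of customer service. This study applies ensemble learning techniques to improve the performance of insurance claim prediction models. Random Forest, Gradient Boosting Machine (GBM), XGBoost, Stacking, and the proposed Dynamic Weighted Ensemble (DWE) model were used, and their predictive performance was compared and analyzed. Model performance was evaluated with metrics including mean absolute error (MAE), mean squared error (MSE), and the coefficient of determination (R2). In the experiments, the dynamic weighted allocation model performed best on the evaluation metrics, achieving optimal predictive performance by combining the predictions of Random Forest, XGBoost, LR, and LightGBM. This study demonstrates that ensemble learning is effective in improving the accuracy of insurance claim prediction and suggests the applicability of AI-based prediction models in the insurance industry.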
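Stacking, one of the compared ensembles, fits a meta-learner on the base models' predictions. A minimal sketch with a linear meta-learner solved by ordinary least squares follows; it illustrates stacking in general, not the proposed DWE model, whose weighting rule the abstract does not describe. The function names are hypothetical, and the base predictions are plain lists rather than outputs of any particular library.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_stacker(base_preds, y):
    """Fit y ~ w0 + sum_k w_k * pred_k by ordinary least squares (normal equations).

    base_preds holds one prediction list per base model; in practice these
    should be out-of-fold predictions so the meta-learner does not see leakage.
    """
    rows = [[1.0] + [p[i] for p in base_preds] for i in range(len(y))]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return solve(xtx, xty)


def stack_predict(weights, preds_for_sample):
    """Meta-learner output for one sample, given each base model's prediction."""
    return weights[0] + sum(w * p for w, p in zip(weights[1:], preds_for_sample))
```

A linear meta-learner keeps the learned weights interpretable; swapping in a nonlinear meta-learner is possible but makes overfitting on the base predictions more likely.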

Rockfall Source Identification Using a Hybrid Gaussian Mixture-Ensemble Machine Learning Model and LiDAR Data

  • Fanos, Ali Mutar;Pradhan, Biswajeet;Mansor, Shattri;Yusoff, Zainuddin Md;Abdullah, Ahmad Fikri bin;Jung, Hyung-Sup
    • 대한원격탐사학회지 / Vol. 35, No. 1 / pp.93-115 / 2019
  • The availability of high-resolution laser scanning data and advanced machine learning algorithms has enabled accurate identification of potential rockfall sources. However, the presence of other mass movements, such as landslides within the same region of interest, poses additional challenges to this task. Thus, this research presents a method based on an integration of a Gaussian mixture model (GMM) and an ensemble artificial neural network (bagging ANN [BANN]) for automatic detection of potential rockfall sources in the Kinta Valley area, Malaysia. The GMM was utilised to determine slope angle thresholds of various geomorphological units. Different algorithms (ANN, support vector machine [SVM] and k nearest neighbour [kNN]) were individually tested with various ensemble models (bagging, voting and boosting). A grid search method was adopted to optimise the hyperparameters of the investigated base models. The proposed model achieves excellent results, with success and prediction accuracies of 95% and 94%, respectively. In addition, this technique achieved higher accuracy (ROC = 95%) than the other methods used. Moreover, the proposed model achieved a prediction accuracy of 92% on the testing data, indicating that the model can be generalised and replicated in different regions and that the proposed method can be applied to various landslide studies.
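The GMM step, deriving a slope-angle threshold between geomorphological units, can be sketched for the simplest case of two units: a 1-D Gaussian mixture fit by expectation-maximization, with the threshold placed where the two weighted component densities cross. The initial means, the two-component setup, and the bisection-based threshold are assumptions for illustration; the study's actual procedure may differ, and all names are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, init_means, iters=100):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    k = len(init_means)
    n = len(xs)
    mus = list(init_means)
    mean = sum(xs) / n
    vars_ = [sum((x - mean) ** 2 for x in xs) / n] * k  # start from the pooled variance
    pis = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            w = [pis[j] * normal_pdf(x, mus[j], vars_[j]) for j in range(k)]
            s = sum(w) or 1e-300
            resp.append([wj / s for wj in w])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # skip a collapsed component
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            vars_[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
            pis[j] = nj / n
    return pis, mus, vars_


def threshold(pis, mus, vars_, steps=60):
    """Slope angle between the two component means where the weighted densities cross."""
    a, b = sorted(range(2), key=lambda j: mus[j])

    def f(x):
        return pis[a] * normal_pdf(x, mus[a], vars_[a]) - pis[b] * normal_pdf(x, mus[b], vars_[b])

    lo, hi = mus[a], mus[b]  # assumes f(lo) > 0 > f(hi), true for well-separated components
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For two symmetric groups of slope angles centred near 20° and 60°, the fitted threshold falls near 40°. Cells steeper than the threshold would then be candidate rockfall-source terrain, which the bagging ANN stage classifies further.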

An Ensemble Model for Credit Default Discrimination: Incorporating BERT-based NLP and Transformer

  • Sophot Ky;Ju-Hong Lee
    • 한국정보처리학회:학술대회논문집 / 한국정보처리학회 2023년도 춘계학술발표대회 / pp.624-626 / 2023
  • Credit scoring is a technique used by financial institutions to assess the creditworthiness of potential borrowers. This involves evaluating a borrower's credit history to predict the likelihood of defaulting on a loan. This paper presents an ensemble of two Transformer-based models within a framework for discriminating the default risk of loan applications in the field of credit scoring. The first model is FinBERT, a pretrained NLP model for analyzing the sentiment of financial text. The second model is FT-Transformer, a simple adaptation of the Transformer architecture for the tabular domain. Both models are trained on the same underlying data set, the only difference being the representation of the data. This multi-modal approach allows us to leverage the unique capabilities of each model and potentially uncover insights that may not be apparent when using a single model alone. We compare our model with two well-known ensemble-based models, Random Forest and Extreme Gradient Boosting.

Classification Algorithm for Liver Lesions of Ultrasound Images using Ensemble Deep Learning

  • 조영복
    • 한국인터넷방송통신학회논문지 / Vol. 20, No. 4 / pp.101-106 / 2020
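  • In current medical practice, ultrasound diagnosis plays a role comparable to that of the stethoscope in the past. However, owing to the nature of ultrasound, the reliability of results depends on the examiner's skill. To address this problem, this paper aims to improve the accuracy of liver lesion detection during ultrasound examination using deep learning. We compared the lesion classification accuracy of a CNN model and an ensemble model. In the experiments, the average classification accuracy was 82.33% for the CNN model and 89.9% for the ensemble model, roughly 7 percentage points higher. The ensemble model also achieved an average ROC curve value of 0.97, about 0.4 higher than that of the CNN model.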