• 제목/요약/키워드: Ensemble machine learning

검색결과 234건 처리시간 0.021초

앙상블 러닝 기반 동적 가중치 할당 모델을 통한 보험금 예측 인공지능 연구 (Research on Insurance Claim Prediction Using Ensemble Learning-Based Dynamic Weighted Allocation Model)

  • 최종석
    • 한국정보전자통신기술학회논문지
    • /
    • 제17권4호
    • /
    • pp.221-228
    • /
    • 2024
  • 보험금 예측은 보험사의 리스크 관리와 재무 건전성 유지를 위한 핵심 과제 중 하나이다. 정확한 보험금 예측을 통해 보험사는 적정한 보험료를 책정하고, 예상 외의 손실을 줄이며, 고객 서비스의 질을 향상시킬 수 있다. 본 연구에서는 앙상블 러닝 기법을 적용하여 보험금 예측 모델의 성능을 향상시키고자 한다. 랜덤 포레스트(Random Forest), 그래디언트 부스팅 머신(Gradient Boosting Machine, GBM), XGBoost, Stacking, 그리고 제안한 동적 가중치 할당 모델(Dynamic Weighted Ensemble, DWE) 모델을 사용하여 예측 성능을 비교 분석하였다. 모델의 성능 평가는 평균 절대 오차(MAE), 평균 제곱근 오차(MSE), 결정 계수(R2) 등을 사용하여 수행되었다. 실험 결과, 동적 가중치 할당 모델이 평가 지표에서 가장 우수한 성능을 보였으며, 이는 랜덤 포레스트와 XGBoost, LR, LightGBM의 예측 결과를 결합하여 최적의 예측 성능을 도출한 결과이다. 본 연구는 앙상블 러닝 기법이 보험금 예측의 정확성을 높이는 데 효과적임을 입증하며, 보험업계에서 인공지능 기반 예측 모델의 활용 가능성을 제시한다.

머신러닝을 이용한 이러닝 학습자 집중도 평가 연구 (A Study on Evaluation of e-learners' Concentration by using Machine Learning)

  • 정영상;주민성;조남욱
    • 디지털산업정보학회논문지
    • /
    • 제18권4호
    • /
    • pp.67-75
    • /
    • 2022
  • Recently, e-learning has been attracting significant attention due to COVID-19. However, while e-learning has many advantages, it has disadvantages as well. One of the main disadvantages of e-learning is that it is difficult for teachers to continuously and systematically monitor learners. Although services such as personalized e-learning are provided to compensate for the shortcoming, systematic monitoring of learners' concentration is insufficient. This study suggests a method to evaluate the learner's concentration by applying machine learning techniques. In this study, emotion and gaze data were extracted from 184 videos of 92 participants. First, the learners' concentration was labeled by experts. Then, statistical-based status indicators were preprocessed from the data. Random Forests (RF), Support Vector Machines (SVMs), Multilayer Perceptron (MLP), and an ensemble model have been used in the experiment. Long Short-Term Memory (LSTM) has also been used for comparison. As a result, it was possible to predict e-learners' concentration with an accuracy of 90.54%. This study is expected to improve learners' immersion by providing a customized educational curriculum according to the learner's concentration level.

Predicting numeric ratings for Google apps using text features and ensemble learning

  • Umer, Muhammad;Ashraf, Imran;Mehmood, Arif;Ullah, Saleem;Choi, Gyu Sang
    • ETRI Journal
    • /
    • 제43권1호
    • /
    • pp.95-108
    • /
    • 2021
  • Application (app) ratings are feedback provided voluntarily by users and serve as important evaluation criteria for apps. However, these ratings can often be biased owing to insufficient or missing votes. Additionally, significant differences have been observed between numeric ratings and user reviews. This study aims to predict the numeric ratings of Google apps using machine learning classifiers. It exploits numeric app ratings provided by users as training data and returns authentic mobile app ratings by analyzing user reviews. An ensemble learning model is proposed for this purpose that considers term frequency/inverse document frequency (TF/IDF) features. Three TF/IDF features, including unigrams, bigrams, and trigrams, were used. The dataset was scraped from the Google Play store, extracting data from 14 different app categories. Biased and unbiased user ratings were discriminated using TextBlob analysis to formulate the ground truth, from which the classifier prediction accuracy was then evaluated. The results demonstrate the high potential for machine learning-based classifiers to predict authentic numeric ratings based on actual user reviews.

Estimation of lightweight aggregate concrete characteristics using a novel stacking ensemble approach

  • Kaloop, Mosbeh R.;Bardhan, Abidhan;Hu, Jong Wan;Abd-Elrahman, Mohamed
    • Advances in nano research
    • /
    • 제13권5호
    • /
    • pp.499-512
    • /
    • 2022
  • This study investigates the efficiency of ensemble machine learning for predicting the lightweight-aggregate concrete (LWC) characteristics. A stacking ensemble (STEN) approach was proposed to estimate the dry density (DD) and 28 days compressive strength (Fc-28) of LWC using two meta-models called random forest regressor (RFR) and extra tree regressor (ETR), and two novel ensemble models called STEN-RFR and STEN-ETR, were constructed. Four standalone machine learning models including artificial neural network, gradient boosting regression, K neighbor regression, and support vector regression were used to compare the performance of the proposed models. For this purpose, a sum of 140 LWC mixtures with 21 influencing parameters for producing LWC with a density less than 1000 kg/m3, were used. Based on the experimental results with multiple performance criteria, it can be concluded that the proposed STEN-ETR model can be used to estimate the DD and Fc-28 of LWC. Moreover, the STEN-ETR approach was found to be a significant technique in prediction DD and Fc-28 of LWC with minimal prediction error. In the validation phase, the accuracy of the proposed STEN-ETR model in predicting DD and Fc-28 was found to be 96.79% and 81.50%, respectively. In addition, the significance of cement, water-cement ratio, silica fume, and aggregate with expanded glass variables is efficient in modeling DD and Fc-28 of LWC.

Predicting movie audience with stacked generalization by combining machine learning algorithms

  • Park, Junghoon;Lim, Changwon
    • Communications for Statistical Applications and Methods
    • /
    • 제28권3호
    • /
    • pp.217-232
    • /
    • 2021
  • The Korea film industry has matured and the number of movie-watching per capita has reached the highest level in the world. Since then, movie industry growth rate is decreasing and even the total sales of movies per year slightly decreased in 2018. The number of moviegoers is the first factor of sales in movie industry and also an important factor influencing additional sales. Thus it is important to predict the number of movie audiences. In this study, we predict the cumulative number of audiences of films using stacking, an ensemble method. Stacking is a kind of ensemble method that combines all the algorithms used in the prediction. We use box office data from Korea Film Council and web comment data from Daum Movie (www.movie.daum.net). This paper describes the process of collecting and preprocessing of explanatory variables and explains regression models used in stacking. Final stacking model outperforms in the prediction of test set in terms of RMSE.

Performance Improvement of Classifier by Combining Disjunctive Normal Form features

  • Min, Hyeon-Gyu;Kang, Dong-Joong
    • International Journal of Internet, Broadcasting and Communication
    • /
    • 제10권4호
    • /
    • pp.50-64
    • /
    • 2018
  • This paper describes a visual object detection approach utilizing ensemble based machine learning. Object detection methods employing 1D features have the benefit of fast calculation speed. However, for real image with complex background, detection accuracy and performance are degraded. In this paper, we propose an ensemble learning algorithm that combines a 1D feature classifier and 2D DNF (Disjunctive Normal Form) classifier to improve the object detection performance in a single input image. Also, to improve the computing efficiency and accuracy, we propose a feature selecting method to reduce the computing time and ensemble algorithm by combining the 1D features and 2D DNF features. In the verification experiments, we selected the Haar-like feature as the 1D image descriptor, and demonstrated the performance of the algorithm on a few datasets such as face and vehicle.

앙상블 모델 기반의 기계 고장 예측 방법 (An Ensemble Model for Machine Failure Prediction)

  • 천강민;양재경
    • 산업경영시스템학회지
    • /
    • 제43권1호
    • /
    • pp.123-131
    • /
    • 2020
  • There have been a lot of studies in the past for the method of predicting the failure of a machine, and recently, a lot of researches and applications have been generated to diagnose the physical condition of the machine and the parts and to calculate the remaining life through various methods. Survival models are also used to predict plant failures based on past anomaly cycles. In particular, special machine that reflect the fluid flow and process characteristics of chemical plants are connected to hundreds or thousands of sensors, so there are not many factors that need to be considered, such as process and material data as well as application of derivative variables. In this paper, the data were preprocessed through time series anomaly detection based on unsupervised learning to predict the abnormalities of these special machine. Next, clustering results reflecting clustering-based data characteristics were applied to produce additional variables, and a learning data set was created based on the history of past facility abnormalities. Finally, the prediction methodology based on the supervised learning algorithm was applied, and the model update was confirmed to improve the accuracy of the prediction of facility failure. Through this, it is expected to improve the efficiency of facility operation by flexibly replacing the maintenance time and parts supply and demand by predicting abnormalities of machine and extracting key factors.

영작문 자동채점 시스템 개발에서 학습데이터 부족 문제 해결을 위한 앙상블 기법 적용의 효과 (Effect of Application of Ensemble Method on Machine Learning with Insufficient Training Set in Developing Automated English Essay Scoring System)

  • 이경호;이공주
    • 정보과학회 논문지
    • /
    • 제42권9호
    • /
    • pp.1124-1132
    • /
    • 2015
  • 일반적으로, 교사 학습 알고리즘이 적절히 학습되기 위해서는 레이블의 편향이 없는 충분한 양의 학습데이터가 필요하다. 그러나 영작문 자동채점 시스템 개발을 위한 충분하고 편향되지 않은 학습데이터를 수집하는 것은 어려운 일이다. 또한 영어 작문 평가의 경우, 전체적인 답안 수준에 대한 다면적인 평가가 이루어진다. 적고 편향되기 쉬운 학습데이터와 이를 이용한 여러 평가영역에 대한 학습모델을 생성해야하기 때문에, 이를 위한 적절한 기계학습 알고리즘을 결정하기 어렵다. 본 논문에서는 이러한 문제를 앙상블학습을 통해 완화할 수 있음을 실험에 통해 보이고자 한다. 실제 중, 고등학교 학생들을 대상으로 시행된 단문형 영작문 채점 결과를 학습데이터 개수와 편향성을 조절하여 실험하였다. 학습데이터의 개수 변화와 편향성 변화의 실험 결과, 에이다부스트 알고리즘을 적용한 결과를 투표로 결합한 앙상블 기법이 다른 알고리즘들 보다 전반적으로 더 나은 성능을 나타냄을 실험을 통해 나타내었다.

유전자 알고리즘을 이용한 분류자 앙상블의 최적 선택 (Optimal Selection of Classifier Ensemble Using Genetic Algorithms)

  • 김명종
    • 지능정보연구
    • /
    • 제16권4호
    • /
    • pp.99-112
    • /
    • 2010
  • 앙상블 학습은 분류 및 예측 알고리즘의 성과개선을 위하여 제안된 기계학습 기법이다. 그러나 앙상블 학습은 기저 분류자의 다양성이 부족한 경우 다중공선성 문제로 인하여 성과개선 효과가 미약하고 심지어는 성과가 악화될 수 있다는 문제점이 제기되었다. 본 연구에서는 기저 분류자의 다양성을 확보하고 앙상블 학습의 성과개선 효과를 제고하기 위하여 유전자 알고리즘 기반의 범위 최적화 기법을 제안하고자 한다. 본 연구에서 제안된 최적화 기법을 기업 부실예측 인공신경망 앙상블에 적용한 결과 기저 분류자의 다양성이 확보되고 인공신경망 앙상블의 성과가 유의적으로 개선되었음을 보여주었다.

Recent deep learning methods for tabular data

  • Yejin Hwang;Jongwoo Song
    • Communications for Statistical Applications and Methods
    • /
    • 제30권2호
    • /
    • pp.215-226
    • /
    • 2023
  • Deep learning has made great strides in the field of unstructured data such as text, images, and audio. However, in the case of tabular data analysis, machine learning algorithms such as ensemble methods are still better than deep learning. To keep up with the performance of machine learning algorithms with good predictive power, several deep learning methods for tabular data have been proposed recently. In this paper, we review the latest deep learning models for tabular data and compare the performances of these models using several datasets. In addition, we also compare the latest boosting methods to these deep learning methods and suggest the guidelines to the users, who analyze tabular datasets. In regression, machine learning methods are better than deep learning methods. But for the classification problems, deep learning methods perform better than the machine learning methods in some cases.