• Title/Abstract/Keyword: Bagging ensemble


랜덤화 배깅을 이용한 재무 부실화 예측 (Randomized Bagging for Bankruptcy Prediction)

  • 민성환
    • 한국IT서비스학회지 / Vol. 15, No. 1 / pp. 153-166 / 2016
  • Ensemble classification combines individually trained classifiers to improve prediction accuracy over any single classifier. Ensemble techniques have been shown to be very effective in improving the generalization ability of a classifier, but the base classifiers need to be as accurate and as diverse as possible. Bagging is one of the most popular ensemble methods: different training subsets are randomly drawn with replacement from the original training dataset, and a base classifier is trained on each bootstrap sample. In this study we propose a new bagging variant, Randomized Bagging (RBagging), which improves on the standard bagging ensemble model. The proposed model was applied to the bankruptcy prediction problem using a real data set, and the results were compared with those of other models. The experimental results show that the proposed model outperforms standard bagging.
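The bootstrap-resampling step described in the abstract is the core of standard bagging. Below is a minimal sketch of that baseline (not the paper's RBagging variant, whose randomization details are not given here), using synthetic data in place of the bankruptcy data set:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

def bagging_fit(X, y, n_estimators=25, seed=0):
    """Train one base classifier per bootstrap sample (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # bootstrap: n rows sampled with replacement
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Combine base classifiers by plurality vote."""
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)

# Synthetic stand-in for a real tabular classification problem.
X, y = make_classification(n_samples=400, n_features=10, random_state=1)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=1)
models = bagging_fit(Xtr, ytr)
acc = (bagging_predict(models, Xte) == yte).mean()
```

The vote aggregation is what gives bagging its variance reduction: individually overfit trees disagree on different points, and the plurality vote averages those errors out.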

분류 앙상블 모형에서 Lasso-bagging과 WAVE-bagging 가지치기 방법의 성능비교 (Comparison of ensemble pruning methods using Lasso-bagging and WAVE-bagging)

  • 곽승우;김현중
    • Journal of the Korean Data and Information Science Society / Vol. 25, No. 6 / pp. 1371-1383 / 2014
  • A classification ensemble is a fusion methodology that combines the predictions of multiple classifiers to build a classifier with better predictive performance. The accuracy of a classification ensemble is known to be high when its member classifiers are individually accurate yet mutually diverse. In practice, however, an ensemble usually contains members that are not very accurate as well as members that are similar to one another. This motivates ensemble pruning: selecting, from the classifiers making up the ensemble, only those that are both diverse and accurate. In this study we compare two ensemble pruning approaches: one that selects a subset of classifiers using Lasso regression, and one that selects a subset using WAVE-bagging, a weighted-voting ensemble method. Experiments on 26 data sets show that ensemble pruning with WAVE-bagging outperforms pruning with Lasso-bagging.
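The Lasso-based pruning idea can be sketched as follows: regress the true labels on the members' predictions and drop members whose coefficients shrink to zero. This is an illustrative simplification on synthetic data (the `alpha` value and validation split are assumptions, not the paper's procedure):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=12, random_state=0)
Xtr, Xval, ytr, yval = train_test_split(X, y, random_state=0)

# Train a pool of bagged trees to prune.
rng = np.random.default_rng(0)
pool = []
for _ in range(30):
    idx = rng.integers(0, len(Xtr), len(Xtr))
    pool.append(DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xtr[idx], ytr[idx]))

# Regress the true validation labels on member predictions with Lasso;
# members whose coefficients are driven to zero are pruned away.
P = np.column_stack([m.predict(Xval) for m in pool]).astype(float)
lasso = Lasso(alpha=0.01).fit(P, yval.astype(float))
kept = [m for m, c in zip(pool, lasso.coef_) if abs(c) > 1e-8]
```

Because the L1 penalty favors sparse coefficient vectors, redundant (highly similar) members tend to be zeroed out, which is exactly the pruning behavior the abstract describes.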

Double-Bagging Ensemble Using WAVE

  • Kim, Ahhyoun;Kim, Minji;Kim, Hyunjoong
    • Communications for Statistical Applications and Methods / Vol. 21, No. 5 / pp. 411-422 / 2014
  • A classification ensemble method aggregates different classifiers obtained from training data to classify new data points. Voting algorithms are the typical tools for summarizing the outputs of the classifiers in an ensemble. WAVE, proposed by Kim et al. (2011), is a weight-adjusted voting algorithm that assigns an optimal weight vector to the classifiers in an ensemble. In this study, we applied the WAVE algorithm to the double-bagging method (Hothorn and Lausen, 2003) when constructing an ensemble, to see whether any significant improvement in performance can be achieved. The results show that double-bagging with the WAVE algorithm performs better than ensemble methods that employ plurality voting. In addition, double-bagging with WAVE is comparable to the random forest method when the ensemble size is large.
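A weight-adjusted vote can be sketched as below. WAVE derives an optimal weight vector jointly over classifiers and observations; as a simple stand-in assumption here, each member is weighted by its validation accuracy instead:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=2)
Xtr, Xval, ytr, yval = train_test_split(X, y, random_state=2)

# A small bagged ensemble of shallow trees.
rng = np.random.default_rng(2)
models = []
for _ in range(15):
    idx = rng.integers(0, len(Xtr), len(Xtr))
    models.append(DecisionTreeClassifier(max_depth=2, random_state=0).fit(Xtr[idx], ytr[idx]))

# Weight each member by validation accuracy (a crude proxy for WAVE's
# optimal weight vector), normalized to sum to one.
w = np.array([(m.predict(Xval) == yval).mean() for m in models])
w = w / w.sum()

def weighted_vote(models, w, X):
    votes = np.stack([m.predict(X) for m in models])   # (n_models, n_samples)
    score1 = (w[:, None] * (votes == 1)).sum(axis=0)   # weighted support for class 1
    score0 = (w[:, None] * (votes == 0)).sum(axis=0)   # weighted support for class 0
    return (score1 > score0).astype(int)

pred = weighted_vote(models, w, Xval)
```

Plurality voting is the special case where all weights are equal; the abstract's finding is that a data-driven weight vector can beat that baseline.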

부도 예측을 위한 앙상블 분류기 개발 (Developing an Ensemble Classifier for Bankruptcy Prediction)

  • 민성환
    • 한국산업정보학회논문지 / Vol. 17, No. 7 / pp. 139-148 / 2012
  • An ensemble of classifiers is built by combining several different classifiers. Ensemble learning is an important research topic attracting much attention in machine learning, and in most cases an ensemble model is known to outperform its individual base classifiers. This study concerns improving the performance of bankruptcy prediction models. To that end, we examine ensemble models that use SVM, widely recognized as a strong single model, as the base classifier. We apply bagging and random subspace models to the bankruptcy prediction problem to improve SVM performance, and propose an integrated bagging and random subspace model to improve on both. Experiments on real corporate bankruptcy data show that the proposed integrated model achieves the best performance.
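The integration the abstract describes can be sketched by combining both perturbation schemes per member: resample rows with replacement (bagging) and draw a random feature subset (random subspace). The sketch uses SVM base classifiers as in the paper, but synthetic data and an arbitrary subspace size as assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           random_state=3)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=3)

rng = np.random.default_rng(3)
ensemble = []
for _ in range(20):
    rows = rng.integers(0, len(Xtr), len(Xtr))                 # bagging: bootstrap rows
    cols = rng.choice(Xtr.shape[1], size=10, replace=False)    # random subspace: feature subset
    m = SVC().fit(Xtr[np.ix_(rows, cols)], ytr[rows])
    ensemble.append((m, cols))

# Each member predicts on its own feature subset; combine by majority vote.
votes = np.stack([m.predict(Xte[:, cols]) for m, cols in ensemble])
pred = (votes.mean(axis=0) > 0.5).astype(int)
acc = (pred == yte).mean()
```

Perturbing both rows and columns makes the SVM members more diverse than either scheme alone, which is the rationale for combining them.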

유전자 알고리즘 기반 통합 앙상블 모형 (Genetic Algorithm based Hybrid Ensemble Model)

  • 민성환
    • Journal of Information Technology Applications and Management / Vol. 23, No. 1 / pp. 45-59 / 2016
  • An ensemble classifier combines the outputs of multiple classifiers, and it is widely accepted that ensembles can improve prediction accuracy. Recently, ensemble techniques have been successfully applied to bankruptcy prediction. Bagging and random subspace are the most popular ensemble techniques, and each has proved very effective in improving generalization ability. However, few studies have focused on integrating bagging and random subspace. In this study, we propose a new hybrid ensemble model that integrates the bagging and random subspace methods using a genetic algorithm to improve model performance. The proposed model was applied to bankruptcy prediction for Korean companies and compared with the other models in this study. The experimental results show that the proposed model performs better than the other models, such as a single classifier, the original ensemble models, and a simple hybrid model.
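One generic way a genetic algorithm can drive such an integration is to evolve bitmask chromosomes that select which ensemble members vote, with validation accuracy as the fitness. The sketch below illustrates that idea under those assumptions; it is not the paper's specific encoding or operators:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=4)
Xtr, Xval, ytr, yval = train_test_split(X, y, random_state=4)

# Pool of bagged trees whose membership the GA will optimize.
rng = np.random.default_rng(4)
pool = []
for _ in range(16):
    rows = rng.integers(0, len(Xtr), len(Xtr))
    pool.append(DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xtr[rows], ytr[rows]))
P = np.stack([m.predict(Xval) for m in pool])  # member predictions on the validation set

def fitness(mask):
    """Validation accuracy of the majority vote over the selected members."""
    if mask.sum() == 0:
        return 0.0
    vote = (P[mask.astype(bool)].mean(axis=0) > 0.5).astype(int)
    return (vote == yval).mean()

# Tiny GA: selection of the fittest half, one-point crossover, bit-flip mutation.
popn = rng.integers(0, 2, size=(20, len(pool)))
for _ in range(30):
    fit = np.array([fitness(c) for c in popn])
    parents = popn[np.argsort(fit)[-10:]]
    cut = rng.integers(1, len(pool))
    children = np.concatenate([parents[rng.permutation(10)][:, :cut],
                               parents[rng.permutation(10)][:, cut:]], axis=1)
    flip = rng.random(children.shape) < 0.05
    children = np.where(flip, 1 - children, children)
    popn = np.concatenate([parents, children])

best = popn[np.argmax([fitness(c) for c in popn])]
```

The search is cheap because member predictions are computed once and cached in `P`; only the vote is recomputed per chromosome.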

Improving Bagging Predictors

  • Kim, Hyun-Joong;Chung, Dong-Jun
    • 한국통계학회:학술대회논문집 / Proceedings of the 2005 Fall Conference of the Korean Statistical Society / pp. 141-146 / 2005
  • Ensemble methods are known as some of the most powerful classification tools for improving prediction accuracy, and have been understood as a 'perturb and combine' strategy. Many studies have tried to develop ensemble methods by improving the perturbation step. In this paper, we propose two new ensemble methods that instead improve the combining step, based on the idea of pattern matching. In experiments with simulated data and a real dataset, the proposed ensemble methods performed better than bagging. They give the most accurate predictions when a pruned tree is used as the base learner.


Tree size determination for classification ensemble

  • Choi, Sung Hoon;Kim, Hyunjoong
    • Journal of the Korean Data and Information Science Society / Vol. 27, No. 1 / pp. 255-264 / 2016
  • Classification is predictive modeling for a categorical target variable. Classification ensemble methods, which achieve better accuracy by combining multiple classifiers, have become a powerful machine learning and data mining paradigm; well-known methodologies include boosting, bagging, and random forest. In this article, we assume that decision trees are used as the classifiers in the ensemble, and we hypothesize that tree size affects classification accuracy. To study how tree size influences accuracy, we performed experiments on twenty-eight data sets, comparing the performance of the ensemble algorithms bagging, double-bagging, boosting, and random forest with different tree sizes.
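The tree-size experiment can be approximated in scikit-learn by varying the base tree's `max_depth` inside a bagging ensemble. This is a hedged sketch on one synthetic data set, not the paper's twenty-eight data sets or its full set of ensemble algorithms:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=5)

# Vary the size of the base tree and measure cross-validated accuracy.
scores = {}
for depth in (1, 3, None):  # stump, shallow tree, fully grown tree
    bag = BaggingClassifier(DecisionTreeClassifier(max_depth=depth, random_state=0),
                            n_estimators=25, random_state=0)
    scores[depth] = cross_val_score(bag, X, y, cv=5).mean()
```

Comparing the entries of `scores` across depths is the essence of the study's question: whether larger (higher-variance) base trees help or hurt once bagging averages them.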

Extreme Learning Machine Ensemble Using Bagging for Facial Expression Recognition

  • Ghimire, Deepak;Lee, Joonwhoan
    • Journal of Information Processing Systems / Vol. 10, No. 3 / pp. 443-458 / 2014
  • An extreme learning machine (ELM) is a recently proposed learning algorithm for a single-hidden-layer feedforward neural network. In this paper we study an ensemble of ELMs built with a bagging algorithm for facial expression recognition (FER). Facial expression analysis is widely used in interpreting emotional behavior, in cognitive science, and in social interaction. This paper presents an FER method based on histogram of oriented gradients (HOG) features and an ELM ensemble. First, HOG features are extracted from the face image by dividing it into a number of small cells. A bagging algorithm is then used to construct many different bags of training data, each of which trains a separate ELM. To recognize the expression in an input face image, the HOG features are fed to each trained ELM and the results are combined by a majority voting scheme. The ELM ensemble using bagging significantly improves the generalization capability of the network. Two available facial expression datasets (JAFFE and CK+) were used to evaluate the performance of the proposed classification system. Although the performance of an individual ELM was lower, the ELM ensemble with bagging improved recognition performance significantly.
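An ELM trains by fixing random hidden-layer weights and solving only the output weights by least squares, which makes a bagged ELM ensemble easy to sketch. The following minimal illustration uses synthetic binary-labeled features standing in for HOG vectors; the hidden-layer size and ensemble size are arbitrary assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

def elm_fit(X, y, n_hidden, rng):
    """Extreme learning machine: random hidden weights, least-squares output layer."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)              # random hidden-layer activations
    beta = np.linalg.pinv(H) @ y        # output weights via pseudoinverse
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return (np.tanh(X @ W + b) @ beta > 0.5).astype(int)

X, y = make_classification(n_samples=400, n_features=10, random_state=6)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=6)

# Bagging: each ELM trains on its own bootstrap sample; majority vote combines them.
rng = np.random.default_rng(6)
elms = []
for _ in range(15):
    idx = rng.integers(0, len(Xtr), len(Xtr))
    elms.append(elm_fit(Xtr[idx], ytr[idx].astype(float), n_hidden=40, rng=rng))

votes = np.stack([elm_predict(m, Xte) for m in elms])
pred = (votes.mean(axis=0) > 0.5).astype(int)
acc = (pred == yte).mean()
```

Because each ELM's hidden weights are random, the members are diverse even before bootstrapping, which is part of why bagging suits ELMs well.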

A comparative assessment of bagging ensemble models for modeling concrete slump flow

  • Aydogmus, Hacer Yumurtaci;Erdal, Halil Ibrahim;Karakurt, Onur;Namli, Ersin;Turkan, Yusuf S.;Erdal, Hamit
    • Computers and Concrete / Vol. 16, No. 5 / pp. 741-757 / 2015
  • In the last decade, several modeling approaches have been proposed and applied to estimate high-performance concrete (HPC) slump flow. Because HPC is a highly complex material, modeling its behavior is very difficult, so selecting and applying proper modeling methods remains a crucial task. Like many other applications, HPC slump flow prediction suffers from noise, which reduces prediction accuracy and increases variance. In recent years, ensemble learning methods have been introduced to improve prediction accuracy and reduce prediction error. This study investigates the potential of bagging (Bag), one of the most popular ensemble learning methods, for building ensemble models. Four well-known artificial intelligence models (classification and regression trees CART, support vector machines SVM, multilayer perceptrons MLP, and radial basis function neural networks RBF) are deployed as base learners. The bagging ensemble models (Bag-SVM, Bag-RT, Bag-MLP, and Bag-RBF) are found to be superior to their base learners (SVM, CART, MLP, and RBF), and bagging noticeably improves the prediction accuracy and reduces the prediction error of the proposed predictive models.
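With scikit-learn, wrapping a base regressor in `BaggingRegressor` reproduces this Bag-X construction. The sketch below uses synthetic regression data as a stand-in for the slump-flow measurements and only two of the four (untuned) base learners:

```python
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the slump-flow data (the real data set is not public here).
X, y = make_regression(n_samples=300, n_features=7, noise=10.0, random_state=7)

# Compare each base learner against its bagged counterpart by cross-validated R^2.
results = {}
for name, base in [("CART", DecisionTreeRegressor(random_state=0)),
                   ("SVR", SVR())]:
    plain = cross_val_score(base, X, y, cv=5, scoring="r2").mean()
    bagged = cross_val_score(BaggingRegressor(base, n_estimators=20, random_state=0),
                             X, y, cv=5, scoring="r2").mean()
    results[name] = (plain, bagged)
```

On noisy data, the largest gains typically show up for high-variance base learners such as fully grown regression trees, consistent with the study's finding that the bagged models beat their base learners.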

지역 전문가의 앙상블 학습 (Ensemble learning of Regional Experts)

  • 이병우;양지훈;김선호
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터 / Vol. 15, No. 2 / pp. 135-139 / 2009
  • This paper proposes a new ensemble method based on regional experts. The method partitions the training data and trains each expert on a different region of the attribute space. To classify a new data point, the experts responsible for the region containing that point cast weighted votes. We compared the accuracy of the method against a single classifier, Bagging, and AdaBoost on 10 data sets from the UCI machine learning repository, using SVM, Naive Bayes, and C4.5 as learning algorithms. The regional-experts ensemble performed comparably to Bagging and AdaBoost with C4.5 as the learning algorithm, and outperformed the remaining classifiers.
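The regional-experts idea — partition the attribute space, train one expert per region, and route each new point to its region's expert — can be sketched as below. The k-means partition and the unweighted routing are illustrative assumptions, not the paper's exact partitioning or weighted-voting scheme:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, random_state=8)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=8)

# Partition the attribute space into regions and train one expert per region.
km = KMeans(n_clusters=4, n_init=10, random_state=8).fit(Xtr)
experts = {}
for r in range(4):
    mask = km.labels_ == r
    if len(np.unique(ytr[mask])) < 2:
        experts[r] = int(ytr[mask][0])   # degenerate region: remember its single class
    else:
        experts[r] = SVC().fit(Xtr[mask], ytr[mask])

# Classify each test point with the expert responsible for its region.
regions = km.predict(Xte)
pred = np.array([experts[r] if isinstance(experts[r], int)
                 else int(experts[r].predict(x[None, :])[0])
                 for r, x in zip(regions, Xte)])
acc = (pred == yte).mean()
```

Unlike bagging, where every member votes on every point, here each point is decided by the experts trained on its own neighborhood of the attribute space.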