Proceedings of the Korean Statistical Society Conference (한국통계학회:학술대회논문집)
- 2005.11a / pp. 193-197 / 2005
Split Effect in Ensemble
- Chung, Dong-Jun (Department of Applied Statistics, Yonsei University) ;
- Kim, Hyun-Joong (Department of Applied Statistics, Yonsei University)
- Published: 2005.11.04
Abstract
The classification tree is one of the most suitable base learners for ensembles. Over the past decade, it has been found that bagging gives the most accurate predictions when used with unpruned trees, and boosting when used with stumps. Researchers have tried to understand the relationship between tree size and ensemble accuracy. Experiments show that large trees cause boosting to overfit the dataset, while stumps help avoid this. This means that the accuracy of each classifier must be sacrificed for better weighting at each iteration. Hence, the split effect in boosting can be explained by the trade-off between the accuracy of each classifier and better weighting of the misclassified points. In bagging, combining larger trees gives more accurate predictions because bagging does not involve such a trade-off; thus, it is advisable to make each classifier as accurate as possible.
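
The following is a minimal sketch (not from the paper) illustrating the split effect described above: bagging paired with unpruned trees versus AdaBoost paired with stumps. It assumes scikit-learn (version 1.2 or later, where the estimator parameter replaced base_estimator); the synthetic dataset and all hyperparameters are illustrative assumptions.

# Sketch: compare bagging with deep trees vs. boosting with stumps.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic classification dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging: each base tree is grown unpruned (max_depth=None), since bagging
# benefits from making every individual classifier as accurate as possible.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=None),
    n_estimators=100,
    random_state=0,
)

# Boosting: each base tree is a stump (max_depth=1), trading per-classifier
# accuracy for better reweighting of misclassified points at each iteration.
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    random_state=0,
)

for name, model in [("bagging + unpruned trees", bagging),
                    ("boosting + stumps", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")

Swapping the base learners (stumps into bagging, unpruned trees into boosting) would, per the abstract's argument, hurt both: bagging loses per-classifier accuracy it cannot recover, while boosting overfits.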