http://dx.doi.org/10.7465/jkdi.2014.25.6.1283

Classification of large-scale data and data batch stream with forward stagewise algorithm  

Yoon, Young Joo (Department of Business Information Statistics, Daejeon University)
Publication Information
Journal of the Korean Data and Information Science Society, v.25, no.6, 2014, pp. 1283-1291
Abstract
In this paper, we propose a forward stagewise algorithm for settings where the data are very large or arrive sequentially in batches over time. In such settings, ordinary boosting algorithms for large-scale data and data batch streams can be overly greedy and perform poorly in the presence of class noise. To overcome these problems and to handle large-scale data or data batch streams, we modify the forward stagewise algorithm. On simulated data and real data sets, the proposed algorithm gives better results than boosting algorithms for both large-scale data and data batch streams, with or without concept drift.
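The idea behind forward stagewise fitting, as the abstract contrasts it with greedier boosting, can be sketched as follows. This is a minimal illustration of the general technique (forward stagewise additive modeling with small steps, as in Hastie et al., 2001), not the paper's exact modification; the step size `EPS`, the least-squares stump learner, and the `update_on_batch` batch-handling scheme are all assumptions for illustration.

```python
# Sketch of forward stagewise fitting applied to sequential data batches.
# EPS, fit_stump, and update_on_batch are illustrative assumptions, not the
# paper's implementation.

EPS = 0.1  # small forward-stagewise step size (illustrative choice)

def fit_stump(X, r):
    """Fit a least-squares decision stump (feature, threshold, side means)
    to the current residuals r."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[f] <= t]
            right = [ri for x, ri in zip(X, r) if x[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((ri - (lm if x[f] <= t else rm)) ** 2
                      for x, ri in zip(X, r))
            if err < best_err:
                best, best_err = (f, t, lm, rm), err
    return best

def predict_raw(model, x):
    """Additive prediction; its sign gives the class label in {-1, +1}."""
    return sum(EPS * (lm if x[f] <= t else rm) for f, t, lm, rm in model)

def update_on_batch(model, X, y, steps=30):
    """Run forward-stagewise iterations on one batch, continuing from the
    model built on earlier batches; the small EPS steps are what make the
    procedure less greedy than ordinary boosting."""
    for _ in range(steps):
        residuals = [yi - predict_raw(model, x) for x, yi in zip(X, y)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        model.append(stump)
    return model

# Two toy batches arriving sequentially, labels in {-1, +1}
model = []
update_on_batch(model, [[-2.0], [-1.0], [1.0], [2.0]], [-1, -1, 1, 1])
update_on_batch(model, [[-3.0], [3.0]], [-1, 1])
print(predict_raw(model, [2.5]) > 0)  # classified as +1
```

Because each iteration takes only a small step toward the current residuals, the same model can keep absorbing new batches without being dominated by any single one, which is the behavior the abstract claims helps under class noise and concept drift.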
Keywords
Concept drift; data stream; ensemble method; forward stagewise algorithm; large-scale data
Citations & Related Records
Times Cited By KSCI: 2
1 Bache, K. and Lichman, M. (2013). UCI machine learning repository [http://archive.ics.uci.edu/ml]. University of California, School of Information and Computer Science, Irvine, CA.
2 Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting and randomization. Machine Learning, 40, 139-157.
3 Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140.
4 Breiman, L. (1998). Arcing classifiers (with discussion). Annals of Statistics, 26, 801-849.
5 Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984). Classification and regression trees, Chapman and Hall, New York, NY.
6 Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55, 119-139.
7 Hastie, T., Tibshirani, R. and Friedman, J. (2001). The elements of statistical learning, Springer-Verlag, New York, NY.
8 Kim, S. H., Cho, D. H. and Seok, K. H. (2012). Study on the ensemble methods with kernel ridge regression. Journal of the Korean Data & Information Science Society, 23, 375-383.
9 Kohavi, R. (1996). Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 202-207.
10 Kuncheva, L. I. (2004). Classifier ensembles for changing environments. Proceedings of the 5th International Workshop on Multiple Classifier Systems, 1-15.
11 Street, W. N. and Kim, Y. S. (2001). A streaming ensemble algorithm (SEA) for large scale classification. Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 377-382.
12 Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58, 267-288.
13 Wang, H., Fan, W., Yu, P. S. and Han, J. (2003). Mining concept-drifting data streams using ensemble classifiers. Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 226-235.
14 Yoon, Y. J. (2010). Boosting algorithms for large-scale data and data batch stream (in Korean). The Korean Journal of Applied Statistics, 23, 197-206.
15 Quinlan, J. R. (1993). C4.5: Programs for machine learning, Morgan Kaufmann, San Mateo, CA.