http://dx.doi.org/10.3837/tiis.2014.10.005

SVM-Based Incremental Learning Algorithm for Large-Scale Data Stream in Cloud Computing  

Wang, Ning (School of Computer and Communication Engineering, University of Science and Technology Beijing)
Yang, Yang (School of Computer and Communication Engineering, University of Science and Technology Beijing)
Feng, Liyuan (School of Computer and Communication Engineering, University of Science and Technology Beijing)
Mi, Zhenqiang (School of Computer and Communication Engineering, University of Science and Technology Beijing)
Meng, Kun (Department of Computer Science and Technology, Tsinghua University)
Ji, Qing (School of Computer and Communication Engineering, University of Science and Technology Beijing)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.8, no.10, 2014, pp. 3378-3393
Abstract
Information technology has developed rapidly in recent years, and one of its key consequences is a fast, near-exponential increase in data. As a result, most traditional data classification methods fail to meet the dynamic, real-time demands of today's data processing and analysis, especially for continuous data streams. This paper proposes an improved incremental learning algorithm for large-scale data streams, based on SVM (Support Vector Machine) and named DS-IILS. To improve efficiency, DS-IILS takes the load condition of the entire system and the performance of each node into consideration. The algorithm sets a threshold on the distance to the optimal separating hyperplane: samples from both the history sample set and the incremental sample set that fall within this threshold are retained, and these retained samples form the training sample set. To design a more accurate classifier, the effects of the data volumes of the history sample set and the incremental sample set are handled by weighted processing. Finally, the algorithm is implemented in a cloud computing system and applied to the study of user behaviors. The experimental results are compared with those of other incremental learning algorithms. They show that DS-IILS improves training efficiency while maintaining relatively high classification accuracy, which is consistent with the theoretical analysis.
Keywords
Data stream; cloud computing; SVM; incremental learning
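
The sample-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' DS-IILS implementation: it assumes a linear hyperplane given as a hypothetical weight vector `w` and bias `b`, and the weighting rule (each non-empty set receives the same total weight, so a larger set gets smaller per-sample weights) is an assumption made only for illustration, since the abstract does not state the actual formula. All function names are hypothetical.

```python
import math


def distance_to_hyperplane(x, w, b):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def select_near_hyperplane(samples, w, b, threshold):
    """Keep only the (x, y) samples whose distance is within the threshold."""
    return [(x, y) for x, y in samples
            if distance_to_hyperplane(x, w, b) <= threshold]


def build_training_set(history, incremental, w, b, threshold):
    """Return (x, y, weight) triples drawn from both sample sets.

    Assumed weighting (not taken from the paper): every non-empty set
    contributes the same total weight, and the weights sum to the number
    of retained samples.
    """
    groups = [select_near_hyperplane(s, w, b, threshold)
              for s in (history, incremental)]
    groups = [g for g in groups if g]
    total = sum(len(g) for g in groups)
    training = []
    for g in groups:
        weight = total / (len(groups) * len(g))
        training.extend((x, y, weight) for x, y in g)
    return training


# Toy data: the hyperplane is x1 = 0, so the distance is simply |x1|.
w, b = (1.0, 0.0), 0.0
history = [((0.5, 1.0), 1), ((3.0, 2.0), 1),
           ((-0.2, 0.0), -1), ((-4.0, 1.0), -1)]
incremental = [((0.8, -1.0), 1), ((-2.5, 0.3), -1)]

training = build_training_set(history, incremental, w, b, threshold=1.0)
for x, y, weight in training:
    print(x, y, weight)
```

With this toy data, two history samples and one incremental sample lie within the threshold, so the history samples each receive weight 0.75 and the incremental sample receives weight 1.5. In a full pipeline, the retained samples and their weights would then be passed to an SVM trainer that accepts per-sample weights, and the hyperplane would be re-estimated after each increment.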