Support Vector Machine based on Stratified Sampling

  • Jun, Sung-Hae (Department of Bioinformatics & Statistics, Cheongju University)
  • Received : 2008.12.01
  • Accepted : 2009.06.08
  • Published : 2009.06.30

Abstract

The support vector machine (SVM) is a classification algorithm based on statistical learning theory. It has performed well in many data mining applications. However, the algorithm has some problems, one of which is its heavy computational cost. This makes it difficult to use the support vector machine in dynamic and online systems. To overcome this problem, we propose using stratified sampling from statistical sampling theory. Stratified sampling reduces the size of the training data. In our approach, classification accuracy is maintained even though the training data are smaller. We verify the improved performance through experiments on data sets from the UCI Machine Learning Repository.
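The abstract does not say how the strata are formed or which SVM solver is used. The sketch below illustrates the general idea under stated assumptions: each class label is treated as one stratum, the sample is allocated proportionally (about `fraction` of every stratum is kept), and a simple random sample is drawn within each stratum. The trainer is a swapped-in Pegasos (stochastic sub-gradient) linear SVM, not the paper's method, and all function names (`stratified_sample`, `train_linear_svm`, `predict`) are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(data, labels, fraction, seed=0):
    """Stratified random sampling with class labels as strata (assumption).

    Proportional allocation: about `fraction` (0 < fraction <= 1) of each
    stratum is kept, with at least one point per stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for x, y in zip(data, labels):
        strata[y].append(x)
    sample_x, sample_y = [], []
    for y, members in strata.items():
        n_h = max(1, round(fraction * len(members)))
        # Simple random sampling without replacement inside the stratum.
        for x in rng.sample(members, n_h):
            sample_x.append(x)
            sample_y.append(y)
    return sample_x, sample_y


def train_linear_svm(data, labels, lam=0.01, epochs=50, seed=0):
    """Hypothetical stand-in trainer: Pegasos for a linear soft-margin SVM.

    Labels must be in {-1, +1}; the bias is handled as an extra constant
    feature (so it is regularized too, which is acceptable for a sketch).
    """
    rng = random.Random(seed)
    points = [list(x) + [1.0] for x in data]
    w = [0.0] * len(points[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(points)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = points[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink w (gradient of the L2 regularizer) ...
            w = [(1 - eta * lam) * wj for wj in w]
            # ... and add the hinge-loss sub-gradient if the margin is violated.
            if margin < 1:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic, imbalanced two-class data (illustration only, not from the paper).
    X = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(300)] + \
        [(rng.gauss(-2, 1), rng.gauss(-2, 1)) for _ in range(100)]
    y = [1] * 300 + [-1] * 100
    Xs, ys = stratified_sample(X, y, fraction=0.1)
    print(len(Xs), ys.count(1), ys.count(-1))  # 40 30 10
    w = train_linear_svm(Xs, ys)
    acc = sum(predict(w, x) == t for x, t in zip(X, y)) / len(X)
    print(f"accuracy on full data: {acc:.3f}")
```

Because each stratum is sampled separately, the 3:1 class ratio of the full data is preserved exactly in the 40-point subsample, whereas a simple random sample of the same size could over- or under-represent the minority class. The SVM is then trained on the subsample only; whether accuracy holds up on real data is what the paper's UCI experiments are meant to check.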

