COMPARATIVE STUDY OF THE PERFORMANCE OF SUPPORT VECTOR MACHINES WITH VARIOUS KERNELS

  • Nam, Seong-Uk (Department of Mathematics, Pusan National University) ;
  • Kim, Sangil (Department of Mathematics, Pusan National University) ;
  • Kim, HyunMin (Department of Mathematics, Pusan National University) ;
  • Yu, YongBin (Department of Machine Learning Engineering, Silex)
  • Received : 2021.05.03
  • Accepted : 2021.05.31
  • Published : 2021.05.31

Abstract

A support vector machine (SVM) is a state-of-the-art machine learning model rooted in structural risk minimization. SVM is often underestimated in real-world applications because of the difficulties associated with its use. We aim to show that the performance of an SVM depends strongly on the choice of kernel function. To this end, after summarizing support vector machines and kernel functions, we conducted experiments on various benchmark datasets to compare the performance of different kernels. Performance was evaluated using the F1-score and its standard deviation under 10-fold cross-validation. Furthermore, we used Taylor diagrams to visualize the differences between kernels. Finally, we provide Python code for all of our experiments so that they can be re-implemented.
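
As a rough illustration of the evaluation protocol described above (a minimal sketch, not the authors' released code), the following Python snippet compares several standard kernels of scikit-learn's SVC and reports the mean and standard deviation of the F1-score over 10-fold cross-validation. The choice of the Wisconsin breast cancer benchmark, the feature-scaling step, and the default kernel hyperparameters are assumptions of this example.

    # Hypothetical kernel-comparison sketch: mean and std of the F1-score
    # over 10-fold cross-validation for several SVM kernels.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)  # assumed benchmark dataset
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

    for kernel in ["linear", "poly", "rbf", "sigmoid"]:
        # Standardize features before fitting the SVM, then score each fold with F1.
        model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
        scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
        print(f"{kernel:8s}  F1 = {scores.mean():.3f} +/- {scores.std():.3f}")

The same per-kernel cross-validation scores could then be summarized graphically, e.g. in a Taylor diagram as done in the paper.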

Keywords

Acknowledgement

This research was supported by PNU-RENovation (2018-2019).
