http://dx.doi.org/10.7858/eamj.2021.023

COMPARATIVE STUDY OF THE PERFORMANCE OF SUPPORT VECTOR MACHINES WITH VARIOUS KERNELS

Nam, Seong-Uk (Department of Mathematics, Pusan National University)
Kim, Sangil (Department of Mathematics, Pusan National University)
Kim, HyunMin (Department of Mathematics, Pusan National University)
Yu, YongBin (Department of Machine Learning Engineering, Silex)

Publication Information

East Asian mathematical journal / v.37, no.3, 2021, pp. 333-354

Abstract

A support vector machine (SVM) is a state-of-the-art machine learning model rooted in structural risk minimization. SVMs are undervalued in real-world applications because of the difficulties associated with their use. We aim to show that the performance of an SVM depends strongly on the choice of kernel function. To this end, after summarizing support vector machines and kernel functions, we conducted experiments on various benchmark datasets to compare the performance of various kernel functions. To evaluate SVM performance, we used the F1-score and its standard deviation under 10-fold cross-validation. Furthermore, we used Taylor diagrams to reveal the differences between kernels. Finally, we provide Python code for all our experiments so that they can be re-implemented.
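
The evaluation protocol described in the abstract (F1-score and its standard deviation under 10-fold cross-validation, compared across kernels) can be sketched as follows. This is a minimal illustration and not the authors' published code: the use of scikit-learn, the breast cancer dataset bundled with it, the four built-in kernels, the default hyperparameters, the feature scaling step, and the helper name `compare_kernels` are all assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_kernels(kernels=("linear", "poly", "rbf", "sigmoid"),
                    n_splits=10, seed=0):
    """Return {kernel: (mean F1, std of F1)} from stratified k-fold CV.

    Hypothetical helper: dataset, kernels, and defaults are illustrative
    assumptions, not the setup used in the paper.
    """
    X, y = load_breast_cancer(return_X_y=True)
    # Fixed seed so every kernel is scored on the same 10 folds.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for kernel in kernels:
        # Scaling is refit inside each training fold via the pipeline,
        # so no information leaks from the held-out fold.
        model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
        scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
        # scores.std() is the population standard deviation (ddof=0).
        results[kernel] = (scores.mean(), scores.std())
    return results


if __name__ == "__main__":
    for kernel, (mean_f1, std_f1) in compare_kernels().items():
        print(f"{kernel:>8}: F1 = {mean_f1:.3f} +/- {std_f1:.3f}")
```

Reusing one seeded `StratifiedKFold` object keeps the folds identical across kernels, so differences in the mean and spread of the F1-score reflect the kernel rather than the data split.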

Keywords

Machine Learning; Support Vector Machine; Kernel Functions; Taylor Diagram
