http://dx.doi.org/10.3837/tiis.2018.03.021

On the Performance of Cuckoo Search and Bat Algorithms Based Instance Selection Techniques for SVM Speed Optimization with Application to e-Fraud Detection  

AKINYELU, Andronicus Ayobami (School of Mathematics, Statistics & Computer Science, University of KwaZulu-Natal)
ADEWUMI, Aderemi Oluyinka (School of Mathematics, Statistics & Computer Science, University of KwaZulu-Natal)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS), vol. 12, no. 3, 2018, pp. 1348-1375
Abstract
Support Vector Machine (SVM) is a well-known machine learning classification algorithm that has been widely applied, with good accuracy, to many data mining problems. However, SVM classification speed decreases as dataset size increases. Some applications, such as video surveillance and intrusion detection, require a classifier to be trained very quickly on large datasets. Hence, this paper introduces two filter-based instance selection techniques for optimizing SVM training speed. Fast classification is often achieved at the expense of classification accuracy, and some applications, such as phishing and spam email classifiers, are very sensitive to even a slight drop in classification accuracy. Hence, this paper also introduces two wrapper-based instance selection techniques for improving both SVM predictive accuracy and training speed. The filter-based and wrapper-based techniques are inspired by the Cuckoo Search Algorithm and the Bat Algorithm. The proposed techniques are validated on three popular e-fraud types: credit card fraud, spam email and phishing email. In addition, they are validated on 20 other datasets from the UCI Machine Learning Repository. Statistical analysis of the experimental results reveals that the filter-based and wrapper-based techniques significantly improved SVM classification speed, and that the wrapper-based techniques also improved SVM predictive accuracy in most cases.
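The abstract does not spell out how a wrapper-based instance selector works, so the following is a minimal sketch under stated assumptions, not the authors' implementation. A wrapper scores each candidate subset of training instances by actually training the classifier on it (a filter scores instances without training). Here a binary cuckoo search encodes each subset as a 0/1 mask over the training set and keeps the mask that best balances validation accuracy against training-set reduction. Assumptions, all illustrative: a linear SVM trained with the Pegasos sub-gradient method stands in for a kernel SVM; nest positions are continuous and mapped to masks through a sigmoid; the fitness is `alpha * accuracy + (1 - alpha) * reduction` with `alpha = 0.9`; abandonment rebuilds the worst fraction `pa` of nests; every function name (`binary_cuckoo_search`, `fitness`, `make_blobs`, and so on) is hypothetical. A Bat Algorithm variant would swap the Lévy-flight update for frequency, velocity and loudness updates.

```python
import math
import random

def train_linear_svm(X, y, lam=0.01, epochs=20, rng=random):
    """Pegasos sub-gradient solver for a linear SVM (labels +1/-1).

    Assumption: a stand-in for the kernel SVM; returns weights with the bias last.
    """
    Xb = [list(xi) + [1.0] for xi in X]          # constant feature acts as the bias
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1 - eta * lam) * wj for wj in w]   # regularisation shrink
            if margin < 1:                           # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w

def accuracy(w, X, y):
    scores = [sum(wj * xj for wj, xj in zip(w, list(xi) + [1.0])) for xi in X]
    return sum((1 if s >= 0 else -1) == yi for s, yi in zip(scores, y)) / len(y)

def fitness(mask, Xtr, ytr, Xval, yval, alpha=0.9, rng=random):
    """Hypothetical wrapper fitness: alpha * val. accuracy + (1 - alpha) * reduction."""
    idx = [i for i, m in enumerate(mask) if m]
    if len({ytr[i] for i in idx}) < 2:             # an SVM needs both classes
        return 0.0
    w = train_linear_svm([Xtr[i] for i in idx], [ytr[i] for i in idx], rng=rng)
    reduction = 1 - len(idx) / len(mask)
    return alpha * accuracy(w, Xval, yval) + (1 - alpha) * reduction

def levy_step(rng, beta=1.5):
    """Levy-distributed step length via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.gauss(0.0, sigma) / max(abs(rng.gauss(0.0, 1.0)), 1e-12) ** (1 / beta)

def sigmoid(z):
    z = max(-50.0, min(50.0, z))                   # avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

def binary_cuckoo_search(Xtr, ytr, Xval, yval, n_nests=8, iters=15, pa=0.25, scale=0.5, seed=0):
    """Each nest is a continuous vector; sigmoid(x_i) is the chance instance i is kept."""
    rng = random.Random(seed)
    n = len(Xtr)

    def binarize(pos):
        return [1 if rng.random() < sigmoid(p) else 0 for p in pos]

    def evaluate(mask):
        return fitness(mask, Xtr, ytr, Xval, yval, rng=rng)

    pos = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n_nests)]
    masks = [binarize(p) for p in pos]
    fits = [evaluate(m) for m in masks]
    for _ in range(iters):
        best = max(range(n_nests), key=fits.__getitem__)
        # Levy flights: each nest proposes a new egg, biased by the best nest.
        for k in range(n_nests):
            cand = [p + scale * levy_step(rng) * (p - b) for p, b in zip(pos[k], pos[best])]
            cand_mask = binarize(cand)
            cand_fit = evaluate(cand_mask)
            if cand_fit > fits[k]:                 # keep the better egg
                pos[k], masks[k], fits[k] = cand, cand_mask, cand_fit
        # Abandonment: rebuild the worst fraction pa; the top-ranked nest always survives.
        order = sorted(range(n_nests), key=fits.__getitem__)
        for k in order[:int(pa * n_nests)]:
            pos[k] = [rng.uniform(-1.0, 1.0) for _ in range(n)]
            masks[k] = binarize(pos[k])
            fits[k] = evaluate(masks[k])
    best = max(range(n_nests), key=fits.__getitem__)
    return masks[best], fits[best]

def make_blobs(n_per_class, rng):
    """Toy two-class data: Gaussian blobs around (2, 2) and (-2, -2)."""
    X, y = [], []
    for label, centre in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 0.7), rng.gauss(centre, 0.7)])
            y.append(label)
    return X, y

if __name__ == "__main__":
    data_rng = random.Random(42)
    Xtr, ytr = make_blobs(20, data_rng)
    Xval, yval = make_blobs(20, data_rng)
    mask, fit = binary_cuckoo_search(Xtr, ytr, Xval, yval)
    print(f"kept {sum(mask)} of {len(mask)} training instances, fitness {fit:.3f}")
```

Raising `alpha` favours accuracy over reduction; lowering it pushes the search toward smaller training subsets and hence faster SVM training, which mirrors the accuracy versus speed trade-off the abstract describes.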
Keywords
Support Vector Machines; classification; machine learning; phishing email; spam email