http://dx.doi.org/10.3837/tiis.2019.07.021

A Classification Algorithm Based on Data Clustering and Data Reduction for Intrusion Detection System over Big Data  

Wang, Qiuhua (School of Cyberspace, Hangzhou Dianzi University)
Ouyang, Xiaoqin (School of Communication Engineering, Hangzhou Dianzi University)
Zhan, Jiacheng (School of Communication Engineering, Hangzhou Dianzi University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS), vol. 13, no. 7, 2019, pp. 3714-3732
Abstract
With the rapid development of networks, Intrusion Detection Systems (IDS) play an increasingly important role in network applications, and many data mining algorithms have been used to build them. However, the advent of the big data era means that massive volumes of data are generated. On large-scale data sets, most data mining algorithms suffer from a high computational burden, which makes IDS much less efficient. To build an efficient IDS over big data, we propose a classification algorithm based on data clustering and data reduction. In the training stage, the Mini Batch K-Means algorithm divides the training data into clusters of similar size, and the center of each cluster is used as its index. We then select representative instances for each cluster to reduce the data, and use the clusters of representative instances to build a K-Nearest Neighbor (KNN) detection model. In the detection stage, we sort the clusters by the distance between the test sample and each cluster index, take the k nearest clusters, and search them for the k nearest neighbors. Experimental results show that searching for neighbors via cluster indexes reduces the computational complexity significantly, and that classifying with the reduced set of representative instances not only improves efficiency but also maintains high accuracy.
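The two stages described in the abstract can be sketched in a few dozen lines. The following is a minimal, illustrative sketch in pure-stdlib Python, not the authors' implementation. It assumes squared Euclidean distance, the standard mini-batch k-means update with a per-center learning rate of 1/count, and a fixed number of clusters (the paper's procedure for obtaining clusters of similar size is not reproduced). The abstract does not say how representative instances are chosen, so the rule used here is hypothetical: within each cluster, keep the `reps_per_class` instances of each class that lie closest to that class's centroid. All function names (`mini_batch_kmeans`, `train`, `predict`) and the toy data are likewise made up for illustration.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))


def mini_batch_kmeans(points, n_clusters, batch_size=32, n_iter=100, seed=0):
    """Mini Batch K-Means with a per-center learning rate of 1/count."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    counts = [0] * n_clusters
    for _ in range(n_iter):
        batch = [points[rng.randrange(len(points))] for _ in range(batch_size)]
        # Assign the whole batch first, then move each assigned center toward its sample.
        assigned = [nearest_index(x, centers) for x in batch]
        for x, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]
    return centers


def train(X, y, n_clusters, reps_per_class=5, seed=0):
    """Training stage: cluster, index each cluster by its center, keep representatives."""
    centers = mini_batch_kmeans(X, n_clusters, seed=seed)
    clusters = [[] for _ in centers]
    for x, label in zip(X, y):
        clusters[nearest_index(x, centers)].append((x, label))
    reduced = []
    for members in clusters:
        by_label = {}
        for x, label in members:
            by_label.setdefault(label, []).append(x)
        reps = []
        # ASSUMPTION (hypothetical rule): per class, keep the instances closest
        # to that class's centroid inside this cluster.
        for label, xs in by_label.items():
            centroid = [sum(col) / len(xs) for col in zip(*xs)]
            xs.sort(key=lambda p: sq_dist(p, centroid))
            reps.extend((p, label) for p in xs[:reps_per_class])
        reduced.append(reps)
    return centers, reduced


def predict(sample, centers, reduced, k=3):
    """Detection stage: search only the k nearest clusters, then vote among k neighbors."""
    order = sorted(range(len(centers)), key=lambda j: sq_dist(sample, centers[j]))
    candidates, used = [], 0
    for j in order:
        if reduced[j]:  # skip clusters that ended up empty
            candidates.extend(reduced[j])
            used += 1
            if used == k:
                break
    neighbors = sorted(candidates, key=lambda item: sq_dist(sample, item[0]))[:k]
    # Majority vote; Counter keeps first-seen order, so ties go to the nearest label.
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data (made up for illustration): two well-separated 2-D blobs.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
    X += [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(300)]
    y = ["normal"] * 300 + ["attack"] * 300
    centers, reduced = train(X, y, n_clusters=8)
    print(sum(len(r) for r in reduced), "representatives kept out of", len(X))
    print(predict([0.2, -0.1], centers, reduced), predict([5.8, 6.3], centers, reduced))
```

The speed-up in this sketch comes from two places. Each query is compared against only the cluster centers plus the representatives of k clusters, not against every training instance. The representative step also shrinks the stored data (at most 8 × 2 × 5 = 80 instances here instead of 600). A production system would more likely use a vectorized library implementation of mini-batch k-means instead of these loops.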
Keywords
IDS; KNN; Mini Batch K-Means; Representative Instance; Clustering