http://dx.doi.org/10.3837/tiis.2018.08.026

Developing an Intrusion Detection Framework for High-Speed Big Data Networks: A Comprehensive Approach  

Siddique, Kamran (Dongguk University)
Akhtar, Zahid (University of Memphis)
Khan, Muhammad Ashfaq (Dongguk University)
Jung, Yong-Hwan (Korea Institute of Science and Technology Information)
Kim, Yangwoo (Dongguk University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.12, no.8, 2018, pp. 4021-4037
Abstract
In network intrusion detection research, two characteristics are generally considered vital to building efficient intrusion detection systems (IDSs): an optimal feature selection technique and robust classification schemes. However, the emergence of sophisticated network attacks and the advent of big data concepts in intrusion detection domains require two more significant aspects to be addressed: employing an appropriate big data computing framework and utilizing a contemporary dataset to deal with ongoing advancements. As such, we present a comprehensive approach to building an efficient IDS with the aim of strengthening academic anomaly detection research in real-world operational environments. The proposed system has the following four characteristics: (i) it performs optimal feature selection using information gain and branch-and-bound algorithms; (ii) it employs machine learning techniques for classification, namely, Logistic Regression, Naïve Bayes, and Random Forest; (iii) it introduces bulk synchronous parallel processing to handle the computational requirements of large-scale networks; and (iv) it utilizes a real-time contemporary dataset generated by the Information Security Centre of Excellence at the University of New Brunswick (ISCX-UNB) to validate its efficacy. Experimental analysis shows the effectiveness of the proposed framework, which is able to achieve high accuracy, low computational cost, and reduced false alarms.
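As a rough illustration of the information gain criterion named in characteristic (i), the following minimal Python sketch ranks discrete features by how much they reduce the entropy of the class labels. It is not the authors' implementation: it omits the branch-and-bound step entirely, assumes features are already discretized, and the feature names (`protocol`, `flag`) and toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected entropy after splitting on one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels, names):
    """Return (feature name, information gain) pairs, highest gain first."""
    columns = list(zip(*rows))
    scores = {name: information_gain(col, labels)
              for name, col in zip(names, columns)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy flows: (protocol, flag) with a class label per flow.
rows = [("tcp", "a"), ("tcp", "b"), ("udp", "a"), ("udp", "b")]
labels = ["attack", "attack", "normal", "normal"]
ranking = rank_features(rows, labels, ["protocol", "flag"])
```

In this toy data, `protocol` separates the two classes perfectly while `flag` carries no class information, so a selector based on this score would keep `protocol` and discard `flag`.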
Keywords
Network intrusion detection systems; anomaly detection; bulk synchronous parallel; BSP; big data; machine learning; DARPA; KDD Cup 99; ISCX-UNB dataset