http://dx.doi.org/10.3837/tiis.2017.10.024

Feature Selection Algorithm for Intrusions Detection System using Sequential Forward Search and Random Forest Classifier  

Lee, Jinlee (Division of Computer Science and Engineering, Konkuk University)
Park, Dooho (Intelligent Service Development Team, XIIlab Co. Ltd)
Lee, Changhoon (Division of Computer Science and Engineering, Konkuk University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.11, no.10, 2017, pp. 5132-5148
Abstract
Cyber attacks are evolving commensurate with recent developments in information security technology. Intrusion detection systems collect various types of data from computers and networks to detect security threats and analyze the attack information. Because of the large amount of data examined, the large number of computations and the low detection rates become problematic. Feature selection is expected to improve the classification performance and provide faster and more cost-effective results. Despite the various feature selection studies conducted for intrusion detection systems, it is difficult to automate feature selection because it is based on the knowledge of security experts. This paper proposes a feature selection technique to overcome the performance problems of intrusion detection systems. Focusing on feature selection, the first phase of the proposed system constructs a feature subset using a sequential forward floating search (SFFS) to reduce the dimensionality of the variables. The second phase constructs a classification model with the selected feature subset using a random forest classifier (RFC) and evaluates the classification accuracy. Experiments were conducted with the NSL-KDD dataset using SFFS-RF, and the results indicated that feature selection is a necessary preprocessing step for improving the overall performance of systems that handle large datasets. They also verified that SFFS-RF could be used for data classification. In conclusion, SFFS-RF could be the key to improving the classification model performance in machine learning.
Keywords
Feature Selection; SFFS; Random Forest; IDS
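
The first phase described in the abstract, a sequential forward floating search, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sffs`, the feature names, and the integer `toy_score` criterion are all hypothetical, chosen only so the example runs without external libraries. The abstract does not state which criterion drives the search; in a setting like the paper's, one plausible choice (an assumption here) would be the validation accuracy of a random forest trained on the candidate subset.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Subset = FrozenSet[str]


def sffs(features: Sequence[str],
         score: Callable[[Subset], float],
         k: int) -> Tuple[float, Subset]:
    """Sequential forward floating search for a subset of size k.

    `score` maps a feature subset to a number (higher is better).
    Returns the best (score, subset) pair recorded for size k.
    """
    features = list(features)
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of features")

    selected: Subset = frozenset()
    best: Dict[int, Tuple[float, Subset]] = {}  # best result seen per size

    while len(selected) < k:
        # Forward step: add the single feature that helps the most.
        candidates = [f for f in features if f not in selected]
        add = max(candidates, key=lambda f: score(selected | {f}))
        selected = selected | {add}
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, selected)

        # Floating step: drop features while doing so strictly beats the
        # best subset already recorded for the smaller size.
        while len(selected) > 2:
            members = [f for f in features if f in selected]
            drop = max(members, key=lambda f: score(selected - {f}))
            reduced = selected - {drop}
            r = score(reduced)
            if r > best[len(reduced)][0]:
                best[len(reduced)] = (r, reduced)
                selected = reduced
            else:
                break

    return best[k]


# Hypothetical toy criterion: per-feature relevance plus pairwise
# interaction terms (negative = redundancy, positive = synergy).
RELEVANCE = {"a": 50, "b": 40, "c": 40, "d": 5}
INTERACTION = {frozenset("ab"): -30, frozenset("ac"): -35, frozenset("bc"): 20}


def toy_score(subset: Subset) -> int:
    total = sum(RELEVANCE[f] for f in subset)
    total += sum(v for pair, v in INTERACTION.items() if pair <= subset)
    return total


best_score, best_subset = sffs(["a", "b", "c", "d"], toy_score, k=3)
```

With this toy criterion, a plain forward search would stop at {a, b, c} with a score of 85. The floating step discards `a` once {b, c} turns out to score higher than {a, b}, and the search ends with {b, c, d} and a score of 105. Swapping `toy_score` for a classifier-based criterion would leave the search loop unchanged.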