Feature Selection Algorithm for Intrusions Detection System using Sequential Forward Search and Random Forest Classifier

  • Lee, Jinlee (Division of Computer Science and Engineering, Konkuk University)
  • Park, Dooho (Intelligent Service Development Team, XIIlab Co. Ltd)
  • Lee, Changhoon (Division of Computer Science and Engineering, Konkuk University)
  • Received : 2017.05.08
  • Accepted : 2017.08.29
  • Published : 2017.10.31

Abstract

Cyber attacks evolve in step with advances in information security technology. Intrusion detection systems collect various types of data from computers and networks to detect security threats and analyze attack information. Because the volume of data examined is large, the heavy computational load and low detection rates become problematic. Feature selection is expected to improve classification performance and to deliver faster, more cost-effective results. Although many feature selection studies have been conducted for intrusion detection systems, feature selection remains difficult to automate because it relies on the knowledge of security experts. This paper proposes a feature selection technique to overcome these performance problems. The proposed system has two phases. The first phase constructs a feature subset using sequential forward floating search (SFFS), reducing the dimensionality of the input variables. The second phase builds a classification model from the selected feature subset using a random forest classifier (RFC) and evaluates its classification accuracy. Experiments with the combined method (SFFS-RF) on the NSL-KDD dataset indicated that feature selection is a necessary preprocessing step for improving overall performance in systems that handle large datasets, and verified that SFFS-RF can be used for data classification. In conclusion, SFFS-RF could be key to improving classification model performance in machine learning.
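The search phase described in the abstract can be illustrated with a minimal sketch of sequential forward floating search. It is not the authors' implementation. The function names `sffs` and `toy_score`, the integer feature indices, and the toy scoring rule are all assumptions made for illustration. In the setting the abstract describes, the score of a subset would instead be the accuracy of a random forest classifier trained on those features.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Subset = FrozenSet[int]


def sffs(n_features: int, score: Callable[[Subset], float], k: int) -> Tuple[Subset, float]:
    """Sequential forward floating search for a subset of exactly k features.

    `score` is any subset-quality measure (hypothetical here); higher is better.
    """
    best: Dict[int, Tuple[Subset, float]] = {}  # best subset seen for each size
    current: Subset = frozenset()
    while len(current) < k:
        # Forward step: add the single feature that raises the score most.
        candidates = [current | {f} for f in range(n_features) if f not in current]
        current = max(candidates, key=score)
        s = score(current)
        if len(current) not in best or s > best[len(current)][1]:
            best[len(current)] = (current, s)
        # Floating step: drop features while doing so beats the best
        # subset previously recorded at the smaller size.
        while len(current) > 2:
            reduced = max((current - {f} for f in sorted(current)), key=score)
            rs = score(reduced)
            if rs > best[len(reduced)][1]:
                current = reduced
                best[len(reduced)] = (reduced, rs)
            else:
                break
    return best[k]


def toy_score(subset: Subset) -> int:
    """Toy stand-in for classifier accuracy (an assumption, not from the paper)."""
    weights = {0: 5, 1: 4, 2: 4, 3: 1}
    total = sum(weights[f] for f in subset)
    if {1, 2} <= subset:
        total += 3  # features 1 and 2 are complementary
    if {0, 1} <= subset:
        total -= 6  # features 0 and 1 are redundant
    return total


selected, value = sffs(n_features=4, score=toy_score, k=3)
```

On this toy score, a purely greedy forward search would start with feature 0 and never drop it, whereas the floating step can discard it once features 1 and 2 are both present. Passing the score as a callable keeps the search independent of the classifier, so a random forest accuracy estimate could be substituted without changing `sffs`.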
