http://dx.doi.org/10.6109/JKIICE.2009.13.4.805

Scalable and Accurate Intrusion Detection using n-Gram Augmented Naive Bayes and Generalized k-Truncated Suffix Tree  

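Kang, Dae-Ki (Division of Computer and Information Engineering, Dongseo University)
Hwang, Gi-Hyun (Division of Computer and Information Engineering, Dongseo University)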
Abstract
The n-gram approach has been widely applied in intrusion detection. However, it suffers from several problems, including poor scalability and double counting of features. To address these problems, we apply n-gram augmented Naive Bayes with a k-truncated suffix tree (k-TST) storage mechanism directly to the classification of intrusive sequences, and compare its performance with that of Naive Bayes and Support Vector Machines (SVM) using n-gram features in experiments on host-based intrusion detection benchmark data sets. Experimental results on the University of New Mexico (UNM) benchmark data sets show that the n-gram augmented method, which resolves the independence violation that arises when n-gram features are fed directly to Naive Bayes (i.e., Naive Bayes with n-gram features), yields intrusion detectors with higher accuracy than Naive Bayes with n-gram features and accuracy comparable to SVM with n-gram features. For scalable and efficient counting, we store the n-gram features in a k-truncated suffix tree. With this storage mechanism, we tested the classifiers with n-grams of length up to 20, which illustrates the scalability and accuracy of n-gram augmented Naive Bayes with k-truncated suffix tree storage.
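The two ideas in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and the abstract does not give the scoring formula. The sketch assumes one common formulation, in which each system call is conditioned on the preceding n-1 calls within a class, so overlapping n-grams are not counted twice. It also replaces the generalized k-truncated suffix tree with a plain depth-limited trie, which stores the same substrings of length at most k but lacks path compression and linear-time construction. All class names, the smoothing parameter `alpha`, and the toy traces are hypothetical.

```python
import math
from collections import defaultdict


class TruncatedCountTrie:
    """Depth-limited trie counting every substring of length <= k.

    Simplified stand-in (an assumption) for a generalized k-truncated
    suffix tree: same stored substrings, no path compression.
    """

    def __init__(self, k):
        self.k = k
        self.root = {"count": 0, "children": {}}

    def add_sequence(self, seq):
        # From each start position, walk at most k symbols down the trie.
        for start in range(len(seq)):
            node = self.root
            node["count"] += 1
            for sym in seq[start:start + self.k]:
                node = node["children"].setdefault(
                    sym, {"count": 0, "children": {}})
                node["count"] += 1

    def _find(self, gram):
        node = self.root
        for sym in gram:
            node = node["children"].get(sym)
            if node is None:
                return None
        return node

    def count(self, gram):
        node = self._find(gram)
        return 0 if node is None else node["count"]

    def successor_total(self, context):
        # Number of times `context` was followed by any symbol.
        node = self._find(context)
        if node is None:
            return 0
        return sum(c["count"] for c in node["children"].values())


class NGramAugmentedNB:
    """Per-class order-(n-1) Markov scoring with add-alpha smoothing."""

    def __init__(self, n, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.tries = {}
        self.class_counts = defaultdict(int)
        self.vocab = set()

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            trie = self.tries.setdefault(label, TruncatedCountTrie(self.n))
            trie.add_sequence(tuple(seq))
            self.class_counts[label] += 1
            self.vocab.update(seq)
        return self

    def log_score(self, seq, label):
        trie = self.tries[label]
        seq = tuple(seq)
        v = len(self.vocab)
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)  # class prior
        for i, sym in enumerate(seq):
            # Condition on at most the n-1 preceding symbols.
            context = seq[max(0, i - self.n + 1):i]
            num = trie.count(context + (sym,)) + self.alpha
            den = trie.successor_total(context) + self.alpha * v
            score += math.log(num / den)
        return score

    def predict(self, seq):
        return max(self.tries, key=lambda c: self.log_score(seq, c))


# Hypothetical toy system-call traces, for illustration only.
train_seqs = [
    ["open", "read", "write", "close"],
    ["open", "read", "read", "close"],
    ["open", "exec", "chmod", "exec"],
    ["exec", "chmod", "exec", "close"],
]
train_labels = ["normal", "normal", "intrusion", "intrusion"]
model = NGramAugmentedNB(n=3).fit(train_seqs, train_labels)
print(model.predict(["open", "read", "close"]))
print(model.predict(["exec", "chmod", "exec"]))
```

Because the trie is cut off at depth k, its size is bounded by the number of distinct substrings of length at most k. This bound is what makes long n-grams practical to store in this sketch.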
Keywords
n-gram Naive Bayes algorithm; generalized k-truncated suffix tree; host-based intrusion detection