Scalable and Accurate Intrusion Detection using n-Gram Augmented Naive Bayes and Generalized k-Truncated Suffix Tree

Kang, Dae-Ki; Hwang, Gi-Hyun

doi:10.6109/JKIICE.2009.13.4.805

Journal of the Korea Institute of Information and Communication Engineering (한국정보통신학회논문지)

Volume 13, Issue 4 / Pages 805-812 / 2009 / 2234-4772 (pISSN) / 2288-4165 (eISSN)

The Korea Institute of Information and Communication Engineering (한국정보통신학회)


N-그램 증강 나이브 베이스 알고리즘과 일반화된 k-절단 서픽스트리를 이용한 확장가능하고 정확한 침입 탐지 기법

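Dae-Ki Kang (Division of Computer and Information Engineering, Dongseo University);
Gi-Hyun Hwang (Division of Computer and Information Engineering, Dongseo University)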

Published : 2009.04.30

https://doi.org/10.6109/JKIICE.2009.13.4.805

Abstract

The n-gram approach has been widely applied in intrusion detection. However, it suffers from several problems, including poor scalability and double counting of features. To address these problems, we apply n-gram augmented Naive Bayes with a k-truncated suffix tree (k-TST) storage mechanism directly to the classification of intrusive sequences, and we compare its performance with that of Naive Bayes and Support Vector Machines (SVM) with n-gram features in experiments on host-based intrusion detection benchmark data sets. Experimental results on the University of New Mexico (UNM) benchmark data sets show that the n-gram augmented method, which resolves the independence violation that occurs when n-gram features are applied directly to Naive Bayes (i.e., Naive Bayes with n-gram features), yields intrusion detectors with higher accuracy than Naive Bayes with n-gram features and accuracy comparable to SVM with n-gram features. For scalable and efficient counting, we store n-gram features in a k-truncated suffix tree. With this storage mechanism, we tested the classifiers with n-grams of length up to 20, which illustrates the scalability and accuracy of n-gram augmented Naive Bayes with k-truncated suffix tree storage.
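The storage idea mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it uses an uncompressed trie rather than a true (path-compressed) suffix tree, and the class name `TruncatedSuffixTrie`, its methods, and the toy system-call trace are all hypothetical. It only shows how truncating every suffix at depth k gives counts for all n-grams with n <= k in a single pass, and how several sequences can share one structure (the "generalized" part).

```python
class TruncatedSuffixTrie:
    """Uncompressed trie over all substrings of length <= k, with counts.

    Simplified stand-in for a generalized k-truncated suffix tree:
    edges are single symbols and no path compression is done.
    """

    def __init__(self, k):
        self.k = k
        self.root = {"count": 0, "children": {}}

    def add_sequence(self, seq):
        # Insert the first k symbols of every suffix. A node at depth n
        # is visited once per occurrence of the n-gram that leads to it.
        for start in range(len(seq)):
            node = self.root
            for symbol in seq[start:start + self.k]:
                node = node["children"].setdefault(
                    symbol, {"count": 0, "children": {}}
                )
                node["count"] += 1

    def count(self, ngram):
        # Walk down the trie; a missing edge means the n-gram never occurred
        # (or is longer than k and was therefore never stored).
        node = self.root
        for symbol in ngram:
            node = node["children"].get(symbol)
            if node is None:
                return 0
        return node["count"]


# Hypothetical system-call trace, for illustration only.
trace = ["open", "read", "mmap", "read", "mmap", "close"]
trie = TruncatedSuffixTrie(k=3)
trie.add_sequence(trace)
print(trie.count(["read", "mmap"]))           # 2
print(trie.count(["read", "mmap", "close"]))  # 1
```

Calling `add_sequence` once per trace accumulates counts across a whole training set, and memory is bounded by the number of distinct substrings of length at most k rather than by the full suffix set.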

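Many intrusion detection systems that apply machine learning use the n-gram approach. However, the n-gram approach is hard to scale, and the n-grams obtained from a given sequence overlap one another. To address these problems, this study applies the n-gram augmented naive Bayes algorithm, based on the generalized k-truncated suffix tree (k-TST), to the classification of intrusion sequences. To evaluate the proposed system, we compared the standard naive Bayes algorithm with n-gram features, the support vector machine (SVM) algorithm, and the proposed n-gram augmented naive Bayes algorithm on host-based intrusion detection benchmark data. Results on the publicly available host-based intrusion detection benchmark data from the University of New Mexico show that the n-gram augmented method resolves the violation of the independence assumption that arises when n-grams are applied directly to naive Bayes (that is, standard naive Bayes with n-gram features) while also producing more accurate intrusion detectors.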
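To make the double-counting point concrete, here is a minimal sketch of one common reading of "n-gram augmented" naive Bayes: each symbol is conditioned on the class and on the preceding n-1 symbols, so every symbol contributes exactly one factor to the likelihood instead of appearing in up to n overlapping n-gram features. The abstract does not give the exact formulation, so this reading is an assumption, as are the class name `NGramAugmentedNB`, the `<s>` padding symbol, the add-alpha smoothing, and the toy traces.

```python
import math
from collections import Counter


class NGramAugmentedNB:
    """Class-conditional model: P(x_i | previous n-1 symbols, class)."""

    def __init__(self, n, alpha=1.0):
        self.n = n
        self.alpha = alpha            # add-alpha smoothing (assumed)
        self.class_counts = Counter()
        self.context_counts = {}      # label -> Counter of (n-1)-gram contexts
        self.ngram_counts = {}        # label -> Counter of context + symbol
        self.vocab = set()

    def _padded(self, seq):
        # Pad so the first symbols also have a full-length context.
        return ["<s>"] * (self.n - 1) + list(seq)

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            self.class_counts[label] += 1
            ctx = self.context_counts.setdefault(label, Counter())
            ngr = self.ngram_counts.setdefault(label, Counter())
            self.vocab.update(seq)
            padded = self._padded(seq)
            for i in range(self.n - 1, len(padded)):
                context = tuple(padded[i - self.n + 1:i])
                ctx[context] += 1
                ngr[context + (padded[i],)] += 1
        return self

    def log_score(self, seq, label):
        # log P(class) + sum over symbols of log P(symbol | context, class)
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        v = len(self.vocab)
        padded = self._padded(seq)
        for i in range(self.n - 1, len(padded)):
            context = tuple(padded[i - self.n + 1:i])
            num = self.ngram_counts[label][context + (padded[i],)] + self.alpha
            den = self.context_counts[label][context] + self.alpha * v
            score += math.log(num / den)
        return score

    def predict(self, seq):
        return max(self.class_counts, key=lambda c: self.log_score(seq, c))


# Hypothetical toy traces, for illustration only.
train = [
    ["open", "read", "read", "close"],
    ["open", "read", "close"],
    ["open", "exec", "exec", "chmod"],
    ["exec", "chmod", "exec"],
]
labels = ["normal", "normal", "intrusion", "intrusion"]
model = NGramAugmentedNB(n=2).fit(train, labels)
print(model.predict(["open", "read", "read", "close"]))  # normal
print(model.predict(["exec", "chmod"]))                   # intrusion
```

In a full system the two `Counter` tables per class could be replaced by lookups into a k-truncated suffix tree with k = n, since both the n-gram count and its (n-1)-gram context count are stored along the same path.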


