[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.22156/CS4SMB.2022.12.02.047

Comparison of System Call Sequence Embedding Approaches for Anomaly Detection

Lee, Keun-Seop (Dept. of Knowledge Information Engineering, Graduate School of Ajou University)
Park, Kyungseon (Dept. of Knowledge Information Engineering, Graduate School of Ajou University)
Kim, Kangseok (Dept. of Cyber Security, Ajou University)

Publication Information

Journal of Convergence for Information Technology, v.12, no.2, 2022, pp. 47-53

Abstract

With the recent shift toward an intelligent security paradigm, a growing body of research applies the information generated by various information security systems to AI-based anomaly detection. This study converts log-like time series data into numerical feature vectors by transforming the publicly available ADFA system call data with three embedding methods: the CBOW and Skip-gram inference methods of the deep learning-based Word2Vec model, and a statistical method based on co-occurrence frequency. The experiments generated embedding vectors under varying vector dimensions, sequence lengths, and window sizes. The embedding methods and the resulting detection performance were then compared using a GRU-based anomaly detection model that takes the generated vectors as input. Compared with the statistical model, Skip-gram maintained more stable performance without being biased toward a specific window size or sequence length, and was more effective at turning each event in the sequence data into an embedding vector.
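To make the roles of sequence length and window size concrete, the following is a minimal Python sketch of a frequency-based embedding of system calls. It is an illustration under stated assumptions, not the authors' implementation: the function names `split_sequences` and `cooccurrence_vectors` are hypothetical, the trace uses made-up call IDs rather than ADFA data, and the abstract does not specify how the statistical vectors were actually built or normalized.

```python
def split_sequences(trace, seq_len):
    """Cut one system call trace into non-overlapping chunks of seq_len calls.

    A trailing remainder shorter than seq_len is dropped (an assumption).
    """
    return [trace[i:i + seq_len]
            for i in range(0, len(trace) - seq_len + 1, seq_len)]


def cooccurrence_vectors(sequences, window):
    """Map each call ID to a vector of raw co-occurrence counts.

    Two calls co-occur when they appear in the same sequence at most
    `window` positions apart. Vector positions follow the sorted vocabulary.
    """
    vocab = sorted({call for seq in sequences for call in seq})
    index = {call: i for i, call in enumerate(vocab)}
    vectors = {call: [0] * len(vocab) for call in vocab}
    for seq in sequences:
        for pos, call in enumerate(seq):
            start = max(0, pos - window)
            stop = min(len(seq), pos + window + 1)
            for ctx_pos in range(start, stop):
                if ctx_pos != pos:
                    vectors[call][index[seq[ctx_pos]]] += 1
    return vocab, vectors


# Hypothetical trace of system call IDs (not taken from ADFA).
trace = [6, 11, 45, 33, 192, 33, 5, 197, 192, 6, 33, 5]
sequences = split_sequences(trace, seq_len=4)
vocab, vectors = cooccurrence_vectors(sequences, window=1)
```

In this sketch the vector dimension equals the vocabulary size, whereas a Word2Vec model lets the dimension be chosen freely; changing `seq_len` or `window` changes which calls count as context, which is the kind of sensitivity the comparison in the abstract examines.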

Keywords

IDS; Anomaly Detection; System Call; Embedding; GRU
