http://dx.doi.org/10.22156/CS4SMB.2022.12.02.047

Comparison of System Call Sequence Embedding Approaches for Anomaly Detection  

Lee, Keun-Seop (Dept. of Knowledge Information Engineering, Graduate School of Ajou University)
Park, Kyungseon (Dept. of Knowledge Information Engineering, Graduate School of Ajou University)
Kim, Kangseok (Dept. of Cyber Security, Ajou University)
Publication Information
Journal of Convergence for Information Technology / v.12, no.2, 2022, pp. 47-53
Abstract
As the security paradigm shifts toward intelligent systems, a growing body of research applies the data generated by information security systems to AI-based anomaly detection. This study converts log-like time series data into numerical feature vectors. The public ADFA system call data were transformed with three embedding methods: the CBOW and Skip-gram inference methods of the neural Word2Vec model, and a statistical method based on co-occurrence frequency. Embedding vectors were generated under varying vector dimensions, sequence lengths, and window sizes. The vectors produced by each embedding model were then used as input to a GRU-based anomaly detection model, and both the embedding methods and the resulting detection performance were compared and evaluated. Compared with the statistical model, Skip-gram maintained more stable performance, without a bias toward any particular window size or sequence length, and was more effective at turning each event in the sequence data into an embedding vector.
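The window-based context that both families of embedding rely on can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `context_pairs` and `cooccurrence_vectors` and the toy traces are hypothetical, and the abstract does not specify the exact form of the statistical method. The sketch assumes that each system call is represented by how often every other call appears within `window` positions of it.

```python
from collections import Counter


def context_pairs(trace, window):
    """Yield (center, context) pairs for calls within `window` positions.

    These are the same kind of pairs a Skip-gram model would train on.
    """
    for pos, center in enumerate(trace):
        lo = max(0, pos - window)
        hi = min(len(trace), pos + window + 1)
        for ctx in range(lo, hi):
            if ctx != pos:
                yield center, trace[ctx]


def cooccurrence_vectors(traces, window=2):
    """Build one count vector per system call from window co-occurrences."""
    pair_counts = Counter()
    for trace in traces:
        pair_counts.update(context_pairs(trace, window))
    vocab = sorted({call for trace in traces for call in trace})
    vectors = {c: [pair_counts[(c, o)] for o in vocab] for c in vocab}
    return vocab, vectors


# Hypothetical toy traces of system call numbers (not taken from ADFA).
traces = [[3, 4, 5, 3], [4, 5, 6]]
vocab, vectors = cooccurrence_vectors(traces, window=1)
for call in vocab:
    print(call, vectors[call])
```

In this sketch the window size directly controls which calls count as context, so the resulting vectors change whenever the window changes. A learned model such as Skip-gram consumes the same pairs but fits dense vectors of a chosen dimension instead of storing raw counts.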
Keywords
IDS; Anomaly Detection; System Call; Embedding; GRU