http://dx.doi.org/10.6109/jkiice.2019.23.10.1311

Novelty Detection on Web-server Log Dataset  

Lee, Hwaseong (Agency for Defense Development)
Kim, Ki Su (Agency for Defense Development)
Abstract
The web is now a common environment for sharing information and conducting business, which also makes it an attack point for external intrusions aimed at leaking personal information or causing system failure. Signature-based detection is the conventional defense against such cyber threats, but it has difficulty detecting attacks whose patterns change, as in polymorphic attacks. In particular, injection attacks are known to be among the most critical security risks arising from web vulnerabilities, and new variants can appear at any time. In this paper, we propose a novelty detection technique that detects abnormal states deviating from the normal state in a web-server log dataset (WSLD). The proposed method first replaces the strings in the web-server log dataset with vectors using a machine learning-based embedding algorithm, and then applies a machine learning-based technique to detect the small number of anomalous records that differ from the large number of normal records.
Keywords
Web-server log dataset; Embedding; Novelty detection; Abnormal behavior
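
The two-stage pipeline described in the abstract (strings to vectors, then novelty detection against a model of normal traffic) can be illustrated with a minimal sketch. The abstract does not name the embedding or detection algorithms, so everything below is an assumption made for illustration: a hashed character n-gram count vector stands in for the learned embedding, and distance from the centroid of normal vectors stands in for the learned detector. The names `embed`, `fit`, `novelty_score`, and `is_novel`, the constants `DIM`, `NGRAM`, and `MARGIN`, and the sample log lines are all hypothetical.

```python
import math
import zlib

DIM = 64      # hypothetical embedding size
NGRAM = 3     # hypothetical character n-gram length
MARGIN = 1.2  # hypothetical slack applied to the largest training distance


def embed(line, n=NGRAM, dim=DIM):
    """Turn a log string into a unit-length vector of hashed n-gram counts.

    Stand-in for a learned embedding; this is NOT the paper's algorithm.
    """
    vec = [0.0] * dim
    for i in range(max(len(line) - n + 1, 1)):
        gram = line[i:i + n]
        # crc32 is used instead of hash() so buckets are stable across runs.
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fit(normal_lines):
    """Model the normal state as a centroid plus a distance threshold."""
    if not normal_lines:
        raise ValueError("need at least one normal log line")
    vectors = [embed(s) for s in normal_lines]
    centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
    # Threshold: the farthest normal vector, widened by MARGIN.
    threshold = MARGIN * max(math.dist(v, centroid) for v in vectors)
    return centroid, threshold


def novelty_score(line, centroid):
    """Euclidean distance between the line's vector and the normal centroid."""
    return math.dist(embed(line), centroid)


def is_novel(line, centroid, threshold):
    """Flag a line whose vector lies farther out than any normal line did."""
    return novelty_score(line, centroid) > threshold


if __name__ == "__main__":
    # Made-up request strings, not taken from the WSLD.
    normal = [
        "GET /index.html?id=1",
        "GET /index.html?id=2",
        "GET /index.html?id=3",
    ]
    centroid, threshold = fit(normal)
    suspect = "GET /index.html?id=1' OR '1'='1' UNION SELECT password FROM users--"
    for line in (normal[0], suspect):
        print(is_novel(line, centroid, threshold), line)
```

Because the model is fitted on normal records only, a previously unseen injection variant needs no signature: it is flagged simply for landing far from the normal region. In practice a learned embedding and a learned detector would replace both stand-ins, and the threshold would be tuned on held-out data.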