Extraction of Network Threat Signatures Using Latent Dirichlet Allocation

A Technique for Extracting Network Threat Signatures Using LDA

  • Lee, Sungil (Infrastructure Protection Division, National Security Research Institute) ;
  • Lee, Suchul (Dept. of Computer Science and Information Engineering, Korea National University of Transportation) ;
  • Lee, Jun-Rak (Dept. of Humanities and Social Sciences, Kangwon National University) ;
  • Youm, Heung-youl (Dept. of Information Security, Soonchunhyang University)
  • Received : 2017.07.23
  • Accepted : 2017.10.17
  • Published : 2018.02.28

Abstract

Network threats such as Internet worms and computer viruses have been increasing significantly. In particular, APTs (Advanced Persistent Threats) and ransomware have become more sophisticated and complex. IDSs (Intrusion Detection Systems) have played a key role as information security solutions over the last few decades. To use an IDS effectively, its rules must be written properly. An IDS rule contains a key signature; once the rule is incorporated into the IDS, a network threat containing that signature is detected as it passes through the IDS. However, finding a key signature for a specific network threat is challenging. We first need to analyze the threat rigorously and then write a proper IDS rule based on the analysis. If we use a signature that is also common in benign network traffic, we will observe many false alarms. In this paper, we propose a scheme that analyzes a network threat and extracts the key signatures corresponding to it. Specifically, the proposed scheme quantifies the degree of correspondence between a network threat and a signature using the LDA (Latent Dirichlet Allocation) algorithm. A signature with significant correspondence to the network threat can then be used in an IDS rule to detect that threat.

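Malicious traffic that threatens networks, such as Internet worms and computer viruses, is increasing. In particular, attacks such as advanced persistent threats (APTs) and ransomware have recently grown more sophisticated and more complex. Over the past several years, intrusion detection systems (IDSs) have played a central role as network security solutions. To use an IDS effectively, its detection rules must be written properly. A detection rule contains the key signature of the malicious traffic to be detected, so that the IDS detects that traffic when it passes through the system. Finding the key signature of malicious traffic, however, is not easy. The malicious traffic must first be analyzed, and based on the analysis, a bit pattern found only in that malicious traffic must be used as the signature. If a bit pattern commonly found in normal traffic is used as the signature, it will generate numerous false positives. This paper proposes a technique that analyzes network traffic and extracts key signatures. The proposed technique uses the LDA (Latent Dirichlet Allocation) algorithm to quantify how well a signature contained in given network traffic represents that traffic. A highly representative signature can be used as an IDS detection rule for detecting that network traffic.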
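
The idea of scoring how strongly a signature corresponds to a threat can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes that each flow payload is treated as a document, that overlapping 4-byte substrings serve as candidate signatures, and that the "degree of correspondence" is approximated by the topic-word probability of the LDA topic that absorbs most of the malicious flows. The payloads, the function names (`byte_ngrams`, `lda_gibbs`, `rank_signatures`), and all parameter values are hypothetical. Candidates that also occur in benign payloads are discarded, reflecting the abstract's point about false alarms.

```python
import random


def byte_ngrams(payload, n=4):
    """Return the overlapping n-byte substrings of a payload (candidate signatures)."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over token lists.

    Returns (vocab, phi, doc_topic): phi[k][v] estimates P(word v | topic k),
    and doc_topic[d][k] counts the tokens of document d assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: v for v, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][index[w]] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, z = index[w], assign[d][i]
                # Remove the token, then resample its topic from the conditional.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][v] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
        for k in range(num_topics)
    ]
    return vocab, phi, doc_topic


def rank_signatures(malicious, benign, n=4, num_topics=2, top=5):
    """Rank n-grams of the malicious flows by their weight in the 'threat' topic."""
    docs = [byte_ngrams(p, n) for p in malicious + benign]
    vocab, phi, doc_topic = lda_gibbs(docs, num_topics)
    # Assumption: the threat topic is the one holding most malicious tokens.
    mass = [sum(doc_topic[d][k] for d in range(len(malicious)))
            for k in range(num_topics)]
    threat_topic = mass.index(max(mass))
    candidates = {g for p in malicious for g in byte_ngrams(p, n)}
    scored = [
        (sig, phi[threat_topic][v])
        for v, sig in enumerate(vocab)
        # Drop candidates that also appear in benign traffic (false-alarm risk).
        if sig in candidates and not any(sig in b for b in benign)
    ]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]


# Hypothetical payloads, for illustration only.
malicious_flows = [b"GET /gate.php?id=1 HTTP/1.1",
                   b"GET /gate.php?id=22 HTTP/1.1",
                   b"POST /gate.php?id=7 HTTP/1.1"]
benign_flows = [b"GET /index.html HTTP/1.1",
                b"GET /news/today HTTP/1.1",
                b"POST /login HTTP/1.1"]

for signature, score in rank_signatures(malicious_flows, benign_flows):
    print(signature, round(score, 4))
```

The sampler is seeded, so repeated runs give the same ranking, but the exact scores depend on the assumed hyperparameters (`alpha`, `beta`, `iters`) and on the n-gram length. A surviving high-scoring n-gram would still need manual review before being written into an IDS rule.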
