High Performance Pattern Matching algorithm with Suffix Tree Structure for Network Security

  • Oh, Doohwan (Department of Electrical and Electronic Engineering, Yonsei University) ;
  • Ro, Won Woo (Department of Electrical and Electronic Engineering, Yonsei University)
  • Received : 2014.03.20
  • Accepted : 2014.05.29
  • Published : 2014.06.25

Abstract

Pattern matching algorithms are widely used in security systems for computer networks, ubiquitous networks, sensor networks, and similar environments. However, advances in information technology have increased both the amount of data and the computational complexity of pattern matching. There is therefore strong demand for new high-performance pattern matching algorithms. This paper proposes a suffix-tree-based pattern matching algorithm. The suffix tree is constructed from the suffixes of all patterns. Shift nodes, which indicate how many characters can be skipped without any matching operations, are then added to the suffix tree to boost matching performance. The proposed algorithm reduces the memory usage of the suffix tree and, through the shift nodes, the number of matching operations. In the performance evaluation, the proposed algorithm achieved a 24% performance gain over the conventional Wu-Manber algorithm.

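Pattern matching algorithms are mainly used in security programs for computer networks, ubiquitous networks, and sensor networks. As advances in IT accelerate the digitization of information, the volume of data carried over networks is growing rapidly, and the complexity of pattern matching operations is growing explosively along with it. Consequently, there is constant demand for high-performance algorithms that can search more patterns faster. This paper proposes a new suffix-tree-based pattern matching algorithm that improves the performance of large-scale pattern matching. The suffix tree is built from the suffixes of a predefined set of patterns. Adding the concept of shift nodes to this tree reduces the number of unnecessary operations performed by existing pattern matching. As a result, the proposed structure achieves a performance improvement of more than 24% over the existing algorithm.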
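
The general idea of combining a suffix-anchored tree with skip distances can be illustrated with a short sketch. The code below is not the algorithm proposed in the paper, whose suffix tree and shift node layout are not specified in the abstract. It is the classic Set Horspool scheme: a trie of reversed patterns (so matching proceeds from each pattern's suffix) plus a per-character shift table. The names `build_matcher` and `search` are hypothetical and chosen only for this illustration.

```python
def build_matcher(patterns):
    """Build a trie of reversed patterns plus a per-character shift table.

    Illustrative Set Horspool sketch, not the paper's suffix tree with
    shift nodes.
    """
    if not patterns or any(not p for p in patterns):
        raise ValueError("patterns must be non-empty strings")
    min_len = min(len(p) for p in patterns)
    root = {}
    shift = {}
    for p in patterns:
        # Insert the pattern backwards so the walk starts at its suffix.
        node = root
        for ch in reversed(p):
            node = node.setdefault(ch, {})
        node[None] = p  # terminal marker: a whole pattern ends here
        # For every character except the last, record the smallest
        # distance to the pattern's end, capped at min_len.
        for j, ch in enumerate(p[:-1]):
            dist = len(p) - 1 - j
            shift[ch] = min(shift.get(ch, min_len), dist)
    return root, shift, min_len


def search(text, matcher):
    """Return (start_index, pattern) for every occurrence in text."""
    root, shift, min_len = matcher
    hits = []
    pos = min_len - 1  # index of the last character of the current window
    while pos < len(text):
        node, i = root, pos
        # Walk the text backwards while the trie still has a branch.
        while i >= 0 and text[i] in node:
            node = node[text[i]]
            if None in node:
                hits.append((i, node[None]))
            i -= 1
        # Skip ahead; characters absent from the table allow a full shift.
        pos += shift.get(text[pos], min_len)
    return hits


if __name__ == "__main__":
    matcher = build_matcher(["he", "she", "his", "hers"])
    print(sorted(search("ushers", matcher)))
```

The shift table plays a role loosely comparable to the skip information described in the abstract: a character that appears in no pattern lets the window advance by the full minimum pattern length without any comparisons. Because the shift is bounded by the shortest pattern, a pattern set containing very short patterns limits how much can be skipped.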

References

  1. G. Navarro and M. Raffinot, "Flexible pattern matching in strings: practical on-line search algorithms for texts and biological sequences," Cambridge University Press, 2002.
  2. C. Charras and T. Lecroq, "Handbook of Exact String Matching Algorithms," King's College Publications, 2004.
  3. B. Phoophakdee and M. J. Zaki, "Genome-scale disk-based suffix tree indexing," in Proc. of ACM SIGMOD International Conference on Management of data (SIGMOD'07). New York, NY, USA: ACM, June 2007, pp. 833-844.
  4. Z. K. Baker and V. K. Prasanna, "A methodology for synthesis of efficient intrusion detection systems on FPGAs," in Proc. of IEEE Symposium on Field-Programmable Custom Computing Machines. Washington, DC, USA: IEEE Computer Society, April 2004, pp. 135-144.
  5. P.-C. Lin, Y.-D. Lin, T.-H. Lee, and Y.-C. Lai, "Using string matching for deep packet inspection," Computer, vol. 41, no. 4, pp. 23-28, 2008.
  6. N. B. Guinde and S. G. Ziavras, "Efficient hardware support for pattern matching in network intrusion detection," Computers & Security, vol. 29, no. 7, pp. 756-769, 2010. https://doi.org/10.1016/j.cose.2010.05.001
  7. Young Choi, Eun-Kyung Hong, Tae-Wan Kim, Seung-Tae Paek, Il-Hoon Choi, and Hyeong-Cheol Oh, "A Traffic Pattern Matching Hardware for a Contents Security System," The Institute of Electronics Engineers of Korea - Computer and Information, vol. 46, no. 1, pp. 88-95, 2009.
  8. Z. Zhou, Y. Xue, J. Liu, W. Zhang, and J. Li, "MDH: a high speed multi-phase dynamic hash string matching algorithm for large-scale pattern set," in Proc. of the 9th international conference on Information and communications security, ser. ICICS'07. Berlin, Heidelberg: Springer-Verlag, 2007, pp. 201-215.
  9. J. Ostell, "Databases of discovery," Queue, vol. 3, no. 3, pp. 40-48, 2005.
  10. Dae-Sung Kim and HyunJin Kim, "A heuristic for Mapping Patterns on Parallel String Matching Hardware in Deep Packet Inspection," IEEK Summer Conference, vol. 35, no. 1, pp. 522-523, 2012.
  11. McAfee, "McAfee Labs Threats Report: Third Quarter 2013," Tech. Rep., November 2013.
  12. T. Kojm. (2012) Clam AntiVirus User Manual. [Online]. Available: http://www.clamav.net/doc/latest/clamdoc.pdf
  13. A. V. Aho and M. J. Corasick, "Efficient string matching: an aid to bibliographic search," Commun. ACM, vol. 18, pp. 333-340, June 1975. https://doi.org/10.1145/360825.360855
  14. S. Wu and U. Manber, "A fast algorithm for multi-pattern searching," Department of Computer Science, University of Arizona, Tech. Rep. TR-94-17, 1994.
  15. R. S. Boyer and J. S. Moore, "A fast string searching algorithm," Commun. ACM, vol. 20, pp. 762-772, October 1977. [Online]. Available: http://doi.acm.org/10.1145/359842.359859
  16. P.-C. Lin, Y.-D. Lin, and Y.-C. Lai, "A hybrid algorithm of backward hashing and automaton tracking for virus scanning," Computers, IEEE Transactions on, vol. 60, no. 4, pp. 594-601, 2011. https://doi.org/10.1109/TC.2010.95