Improving Lookup Time Complexity of Compressed Suffix Arrays using Multi-ary Wavelet Tree

  • Wu, Zheng (Department of Computer Science and Engineering, Pusan National University) ;
  • Na, Joong-Chae (Department of Computer Science and Engineering, Sejong University) ;
  • Kim, Min-Hwan (Department of Computer Science and Engineering, Pusan National University) ;
  • Kim, Dong-Kyue (Department of Electronics and Communication Engineering, Hanyang University)
  • Published: 2009.03.31

Abstract

Given a text T of size n, we want to search for information of interest. To support fast searching, an index must be constructed by preprocessing the text. The suffix array is one such index data structure. The compressed suffix array (CSA) is a compressed index based on the regularity of the suffix array, and it can be compressed to the $k$th order empirical entropy. In this paper we improve the lookup time complexity of the compressed suffix array by using the multi-ary wavelet tree, at the cost of more space. In our implementation, the lookup time complexity of the compressed suffix array is $O(\log_{\sigma}^{\varepsilon/(1-\varepsilon)} n \; \log_r \sigma)$, and the space of the compressed suffix array is $\varepsilon^{-1} n H_k(T) + O(n \log\log n / \log_{\sigma}^{\varepsilon} n)$ bits, where $\sigma$ is the size of the alphabet, $H_k$ is the $k$th order empirical entropy, $r$ is the branching factor of the multi-ary wavelet tree such that $2 \leq r \leq \sqrt{n}$ and $r \leq O(\log_{\sigma}^{1-\varepsilon} n)$, and $0 < \varepsilon < 1/2$ is a constant.
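The $\log_r \sigma$ factor in the lookup bound matches the depth of a wavelet tree whose nodes each split their symbol range into up to $r$ parts. The following Python sketch illustrates that idea only; it is not the paper's implementation. The names `MultiaryWaveletTree` and `build_tree` are hypothetical, and the per-node rank is a plain list count, whereas a succinct structure would answer it in constant time with a rank dictionary.

```python
class MultiaryWaveletTree:
    """Illustrative r-ary wavelet tree over integer symbols in [lo, hi)."""

    def __init__(self, seq, lo, hi, r):
        self.lo, self.hi, self.r = lo, hi, r
        self.children = None          # None marks a leaf (at most one symbol)
        if hi - lo <= 1:
            return
        # Width of each child's symbol range: ceil((hi - lo) / r).
        self.step = -(-(hi - lo) // r)
        # For every position, record which child its symbol belongs to.
        self.digits = [(c - lo) // self.step for c in seq]
        self.children = []
        for k in range(r):
            c_lo = lo + k * self.step
            if c_lo >= hi:
                break
            c_hi = min(hi, c_lo + self.step)
            sub = [c for c in seq if c_lo <= c < c_hi]
            self.children.append(MultiaryWaveletTree(sub, c_lo, c_hi, r))

    def rank(self, c, i):
        """Number of occurrences of symbol c among the first i positions."""
        node = self
        while node.children is not None:
            k = (c - node.lo) // node.step
            # Naive O(i) count; assumed constant time in a succinct version.
            i = node.digits[:i].count(k)
            node = node.children[k]
        return i

    def depth(self):
        """Number of levels below this node; about log_r(sigma) at the root."""
        if self.children is None:
            return 0
        return 1 + max(child.depth() for child in self.children)


def build_tree(text, r):
    """Map characters to 0..sigma-1 in sorted order and build the tree."""
    symbols = sorted(set(text))
    code = {ch: k for k, ch in enumerate(symbols)}
    tree = MultiaryWaveletTree([code[ch] for ch in text], 0, len(symbols), r)
    return tree, code
```

For the text "abracadabra" ($\sigma = 5$), building with `r = 2` gives a deeper tree than building with `r = 4`, so a rank query visits fewer nodes as the branching factor grows, while each node stores a sequence over a larger alphabet.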

References

  1. FERRAGINA, P. and G. MANZINI. 2005. Indexing compressed text. J. Assoc. Comput. Mach. 52(4):552-581. https://doi.org/10.1145/1082036.1082039
  2. FERRAGINA, P., G. MANZINI, V. MAKINEN, and G. NAVARRO. 2007. Compressed representation of sequences and full-text indexes. ACM Transactions on Algorithms (TALG), 3(2):1-25.
  3. GROSSI, R., A. GUPTA, and J. VITTER. 2003. High-order entropy-compressed text indexes. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 841-850.
  4. GROSSI, R. and J. VITTER. 2006. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2):378-407. https://doi.org/10.1137/S0097539702402354
  5. MAKINEN, V. 2003. Compact suffix array - a space-efficient full-text index. Fund. Inform. 56(1-2):191-210.
  6. MAKINEN, V. and G. NAVARRO. 2004. Compressed compact suffix arrays. In Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM). Lecture Notes in Computer Science, vol. 3109. Springer-Verlag, Berlin, Germany, 420-433. https://doi.org/10.1007/978-3-540-27801-6_32
  7. MANBER, U. and G. MYERS. 1993. Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5):935-948. https://doi.org/10.1137/0222058
  8. MANZINI, G. 2001. An analysis of the Burrows-Wheeler transform. J. Assoc. Comput. Mach. 48(3):407-430. https://doi.org/10.1145/382780.382782
  9. MCCREIGHT, E. 1976. A space-economical suffix tree construction algorithm. J. Assoc. Comput. Mach. 23(2):262-272. https://doi.org/10.1145/321941.321946
  10. PAGH, R. 1999. Low redundancy in dictionaries with O(1) worst case lookup time. In Proceedings of the 26th International Colloquium on Automata, Languages and Programming (ICALP), 595-604. https://doi.org/10.1007/3-540-48523-6_56
  11. RAMAN, R., V. RAMAN, and S. RAO. 2002. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 233-242.
  12. SADAKANE, K. 2002. Succinct representations of lcp information and improvements in the compressed suffix arrays. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 225-232.
  13. SADAKANE, K. 2003. New text indexing functionalities of the compressed suffix arrays. J. Alg. 48(2):294-313. https://doi.org/10.1016/S0196-6774(03)00087-7

Cited by

  1. Studies on ballistic impact of the composite panels vol.72, 2014, https://doi.org/10.1016/j.tafmec.2014.07.010
  2. A meshless adaptive multiscale method for fracture vol.96, 2015, https://doi.org/10.1016/j.commatsci.2014.08.054
  3. A self-organizing Lagrangian particle method for adaptive-resolution advection–diffusion simulations vol.231, pp.9, 2012, https://doi.org/10.1016/j.jcp.2012.01.026
  4. A 3D computational homogenization model for porous material and parameters identification vol.96, 2015, https://doi.org/10.1016/j.commatsci.2014.04.059
  5. Exact solutions of functionally graded piezoelectric material sandwich cylinders by a modified Pagano method vol.36, pp.5, 2012, https://doi.org/10.1016/j.apm.2011.07.077
  6. A state space differential reproducing kernel method for the 3D analysis of FGM sandwich circular hollow cylinders with combinations of simply-supported and clamped edges vol.94, pp.11, 2012, https://doi.org/10.1016/j.compstruct.2012.05.005
  7. RMVT-based meshless collocation and element-free Galerkin methods for the quasi-3D free vibration analysis of multilayered composite and FGM plates vol.93, pp.5, 2011, https://doi.org/10.1016/j.compstruct.2010.11.015