Comparisons of Practical Performance for Constructing Compressed Suffix Arrays  

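Park, Chi-Seong (Department of Computer Engineering, Pusan National University)
Kim, Min-Hwan (Department of Computer Engineering, Pusan National University)
Lee, Suk-Hwan (Department of Information Security, Tongmyong University)
Kwon, Ki-Ryong (Department of Computer Engineering, Pukyong National University)
Kim, Dong-Kyue (Division of Electronics, Communications and Computer Engineering, Hanyang University)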
Abstract
Suffix arrays are fundamental full-text index data structures that are efficient when many patterns are queried against the same text. Although many useful full-text index data structures have been proposed, their O(n log n)-bit space consumption has motivated researchers to develop more space-efficient alternatives. Space-efficient versions such as the compressed suffix array and the FM-index have been developed, but they do not reduce the practical working space because their construction is based on an existing suffix array. Recently, two algorithms have been proposed that construct compressed suffix arrays directly from the text, without first constructing the suffix array. In this paper, we compare the practical performance of these compressed suffix array construction algorithms with that of various suffix array construction algorithms by measuring construction time, peak memory usage during construction, and the size of the final output.
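As a rough illustration of the plain suffix array that the abstract takes as its baseline, the following minimal Python sketch builds a suffix array by naive sorting and answers a pattern query by binary search. It is not one of the construction algorithms compared in the paper; the function names (`build_suffix_array`, `find_occurrences`, `plain_sa_bits`) are hypothetical and chosen only for this example, and the naive sort is far slower than the algorithms the paper evaluates.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Naive construction (illustrative only): sort suffix start positions
    by the suffix beginning at each position."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return all start positions of pattern in text, using two binary
    searches over the suffix array to find the matching range."""
    m = len(pattern)

    # First suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


def plain_sa_bits(n: int) -> int:
    """Bits needed to store n positions at ceil(log2 n) bits each,
    i.e. the O(n log n)-bit cost of an uncompressed suffix array."""
    return n * max(1, (n - 1).bit_length())


text = "banana"
sa = build_suffix_array(text)
print(sa)
print(find_occurrences(text, sa, "ana"))
print(plain_sa_bits(len(text)))
```

Once the array is built, each query costs only a logarithmic number of string comparisons, which is why the structure pays off when the same text is queried many times. The `plain_sa_bits` helper makes the space cost concrete: every one of the n entries needs about log n bits, which is the overhead that compressed suffix arrays aim to avoid.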
Keywords
full-text index data structure; suffix array; compressed suffix array; implementation issue; practical performance; odd-even scheme; skew scheme