
Fast Construction of Suffix Arrays for DNA Strings  

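Jo, Jun-Ha (Research Institute, Soam System Co., Ltd.)
Kim, Nam-Hee (Division of Electronics and Computer Engineering, Hanyang University)
Kwon, Ki-Ryong (Department of Computer Engineering, Pukyong National University)
Kim, Dong-Kyue (Division of Electronics and Computer Engineering, Hanyang University)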
Abstract
To search massive data such as DNA strings quickly, the most efficient approach is to build a full-text index data structure over the given strings. The most widely used full-text index structures are suffix trees and suffix arrays. Because the suffix array uses less space than the suffix tree, it is better suited to DNA strings. Previously developed suffix array construction algorithms are not well suited to DNA strings because they are designed for integer alphabets. We propose a fast algorithm for constructing suffix arrays of DNA strings, whose alphabet size is fixed at 4. We reduce the construction time by improving the encoding and merging steps of the algorithm of Kim et al. [1]. Experimental results show that our algorithm constructs suffix arrays of DNA strings 1.3-1.6 times faster than Kim et al.'s algorithm, and that it is also faster than other algorithms in most cases.
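As a minimal illustration of what a suffix array over a 4-letter DNA alphabet is, the sketch below encodes each base as a small integer, builds the array by plainly sorting all suffixes, and looks up a pattern by binary search. This is a naive construction chosen for readability, not the algorithm proposed here or that of Kim et al. [1]; the names `DNA_CODE`, `encode`, `build_suffix_array`, and `find_occurrences` are hypothetical.

```python
# Assumption: the text contains only the four bases A, C, G, T.
DNA_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(text):
    """Map each base to an integer in 0..3 (two bits would suffice per base)."""
    return [DNA_CODE[ch] for ch in text]


def build_suffix_array(codes):
    """Naive construction: sort suffix start positions by the suffix contents."""
    return sorted(range(len(codes)), key=lambda i: codes[i:])


def find_occurrences(codes, sa, pattern_codes):
    """Return all start positions of the pattern, using binary search on sa."""
    m = len(pattern_codes)
    # Lower bound: first suffix whose length-m prefix is >= the pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if codes[sa[mid]:sa[mid] + m] < pattern_codes:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > the pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if codes[sa[mid]:sa[mid] + m] <= pattern_codes:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    codes = encode("GATTACA")
    sa = build_suffix_array(codes)
    print(sa)
    print(find_occurrences(codes, sa, encode("TA")))
```

Sorting whole suffixes is far slower than a dedicated construction algorithm on long strings; the sketch only shows the data structure and the search it enables.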
Keywords
full-text index data structures; suffix arrays; DNA strings
References
1 D. Kim, J. Jo, H. Park, A fast algorithm for constructing suffix arrays for fixed-size alphabet, Workshop on Experimental and Efficient Algorithms, LNCS 3059, pp. 301-314, 2004
2 D. Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge Univ. Press, 1997
3 D. Gusfield, An 'Increase-by-one' approach to suffix arrays and trees, manuscript, 1990
4 N. Larsson and K. Sadakane, Faster suffix sorting, Manuscript, pp. 1-20, 1999
5 P. Ko and S. Aluru, Space-efficient linear time construction of suffix arrays, Journal of Discrete Algorithms, vol. 3(2-4), pp. 143-156, 2005
6 M. Farach-Colton, P. Ferragina and S. Muthukrishnan, On the sorting-complexity of suffix tree construction, J. Assoc. Comput. Mach., vol. 47, pp. 987-1011, 2000
7 U. Manber and G. Myers, Suffix arrays: A new method for on-line string searches, SIAM J. Computing, vol. 22, pp. 935-948, 1993
8 E. Ukkonen, On-line construction of suffix trees, Algorithmica, vol. 14, pp. 249-260, 1995
9 E. M. McCreight, A space-economical suffix tree construction algorithm, J. Assoc. Comput. Mach., vol. 23, pp. 262-272, 1976
10 M. Farach, Optimal suffix tree construction with large alphabets, IEEE Symp. Found. Computer Science, pp. 137-143, 1997
11 J. Sim, D. Kim, H. Park and K. Park, Linear-time search in suffix arrays, Australasian Workshop on Combinatorial Algorithms, pp. 139-146, 2003
12 D. Kim, J. Sim, H. Park and K. Park, Linear-time construction of suffix arrays, Symp. Combinatorial Pattern Matching, LNCS 2676, pp. 186-199, 2003
13 J. Kärkkäinen and P. Sanders, Simple linear work suffix array construction, Proc. 30th International Colloquium on Automata, Languages and Programming, LNCS 2719, pp. 943-955, 2003