Linear-Time Search in Suffix Arrays

Sim Jeong Seop; Kim Dong Kyue; Park Heejin; Park Kunsoo

Journal of KIISE: Computer Systems and Theory

Volume 32, Issue 5 / Pages 255-259 / 2005 / 1229-683X (pISSN)

Korean Institute of Information Scientists and Engineers

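Sim Jeong Seop (School of Computer Engineering, Inha University);
Kim Dong Kyue (School of Computer Engineering, Pusan National University);
Park Heejin (School of Computer Engineering, Hanyang University);
Park Kunsoo (School of Computer Engineering, Seoul National University)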

Published: 2005.06.01

Abstract

Index data structures such as suffix trees and suffix arrays are widely used to search for a pattern P in a text, with diverse applications in string processing and computational biology. It is well known that searching in suffix trees is faster than searching in suffix arrays in terms of time complexity: on a constant-size alphabet, searching for P takes $O(|P|)$ time in a suffix tree, whereas it takes $O(|P| + \log n)$ time in a suffix array, where n is the length of the text. In this paper we present a linear-time search algorithm in suffix arrays for constant-size alphabets. For a general alphabet $\Sigma$, it takes $O(|P| \log |\Sigma|)$ time.
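
For context, the suffix-array baseline that the abstract compares against is a binary search over the sorted suffixes. The following is a minimal Python sketch of the simplest variant, which uses no LCP information and therefore costs $O(|P| \log n)$ rather than $O(|P| + \log n)$. It is not the linear-time algorithm presented in this paper. The function names `build_suffix_array` and `find_occurrences` are illustrative assumptions, and the construction step is deliberately naive.

```python
from __future__ import annotations


def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes in lexicographic order.

    Naive construction by sorting the suffixes directly; for illustration
    only, not a linear-time construction.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of pattern in text.

    Plain binary search on the suffix array: each comparison inspects at
    most len(pattern) characters, giving O(|P| log n) time overall.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])
```

For example, with `text = "banana"`, the call `find_occurrences(text, build_suffix_array(text), "ana")` returns `[1, 3]`. The two binary searches work because truncating lexicographically sorted suffixes to their first |P| characters keeps them in non-decreasing order.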
