Efficient Subsequence Searching in Sequence Databases : A Segment-based Approach

Park, Sang-Hyun;Kim, Sang-Wook;Loh, Woong-Kee;

Journal of KIISE: Databases

Vol. 28, No. 3 / pp. 344-356 / 2001 / pISSN 1229-7739

Korean Institute of Information Scientists and Engineers (KIISE)


Park, Sang-Hyun (IBM T. J. Watson Research Center) ;
Kim, Sang-Wook (Division of Computer, Information and Communications Engineering, Kangwon National University) ;
Loh, Woong-Kee (Tmax Soft Co., Ltd.)

Published: 2001.09.01


Abstract

This paper addresses the problem of subsequence searching under time warping in sequence databases. Our work is motivated by the observation that subsequence searches slow down quadratically as the average length of data sequences increases. To resolve this problem, we propose the Segment-Based Approach for Subsequence Searches (SBASS). SBASS divides data and query sequences into a series of segments and retrieves all data subsequences that satisfy two conditions: (1) the number of segments is the same as the number of segments in the query sequence, and (2) the distance of every segment pair is less than or equal to a given tolerance. Our segmentation scheme allows segments to have different lengths; we therefore employ the time warping distance as the similarity measure for each segment pair. For efficient retrieval of similar subsequences, we extract feature vectors from all data segments by exploiting their monotonically changing properties, and build a spatial index on those feature vectors. Using this index, queries are processed in four steps: (1) R-tree filtering, (2) feature filtering, (3) successor filtering, and (4) post-processing. The effectiveness of our approach is verified through extensive experiments.
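The two matching conditions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: segments are taken to be maximal monotonic runs (the abstract only says that feature extraction exploits monotonically changing properties), the time warping distance sums absolute differences along the best warping path (the paper may use a different base distance), and a brute-force scan stands in for the spatial index and the four filtering steps. All function names are hypothetical.

```python
from typing import List, Sequence


def time_warping_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic-programming time warping distance.

    Assumption: the cost of aligning two elements is |x - y| and the
    costs along the warping path are summed.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # dp[i][j] holds the distance between the prefixes a[:i] and b[:j].
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def monotonic_segments(seq: Sequence[float]) -> List[List[float]]:
    """Split seq into maximal non-decreasing or non-increasing runs.

    Assumption: this segmentation rule is illustrative only.
    """
    segments: List[List[float]] = []
    current: List[float] = []
    direction = 0  # +1 rising, -1 falling, 0 not yet decided
    for x in seq:
        if current:
            step = (x > current[-1]) - (x < current[-1])
            if step != 0 and direction != 0 and step != direction:
                # The trend reversed: close the current segment.
                segments.append(current)
                current = []
                direction = 0
            elif step != 0:
                direction = step
        current.append(x)
    if current:
        segments.append(current)
    return segments


def matching_subsequences(
    data: Sequence[float], query: Sequence[float], tolerance: float
) -> List[int]:
    """Return start indices (in segment units) of matching data subsequences.

    A window of data segments matches when (1) it has as many segments as
    the query and (2) every segment pair is within the tolerance.
    Brute-force scan; no index or filtering is used.
    """
    q = monotonic_segments(query)
    d = monotonic_segments(data)
    matches: List[int] = []
    for start in range(len(d) - len(q) + 1):
        window = d[start:start + len(q)]
        if all(time_warping_distance(s, t) <= tolerance for s, t in zip(window, q)):
            matches.append(start)
    return matches
```

Because each segment pair is compared with the time warping distance, data and query segments of different lengths can still match, which is the reason the abstract gives for choosing that measure. The scan above compares every window of segments directly; the indexing scheme described in the abstract exists to avoid exactly this exhaustive comparison.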

