http://dx.doi.org/10.3745/KIPSTD.2005.12D.3.345

An Efficient Subsequence Matching Method Based on Index Interpolation  

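Loh Woong-Kee (Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST))
Kim Sang-Wook (Division of Information and Communications, College of Information and Communications, Hanyang University)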
Abstract
Subsequence matching is one of the most important operations in data mining. Existing subsequence matching algorithms use only one index, and their performance degrades as the difference grows between the query sequence length and the window size, where windows are subsequences of the same length extracted from the data sequences to construct the index. In this paper, we propose a new subsequence matching method based on index interpolation to overcome this problem. Index interpolation constructs two or more indexes and performs searching by selecting the most appropriate index among them according to the given query sequence length. We first examine, through preliminary experiments, how performance varies with the difference between the query sequence length and the window size, and formulate a search cost model that reflects the distribution of query sequence lengths from the viewpoint of physical database design. Next, we propose a new subsequence matching method based on index interpolation to improve search performance. We also present an algorithm, based on the search cost formula, that constructs the optimal indexes for better search performance. Finally, we verify the superiority of the proposed method through a series of experiments using real and synthetic data sets.
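The index-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or cost formula: the selection rule (use the largest indexed window size that does not exceed the query length), the toy cost (the gap between query length and chosen window size, weighted by query frequency), and the names `select_window`, `expected_cost`, and `best_window_sizes` are all assumptions made for illustration.

```python
from itertools import combinations


def select_window(window_sizes, query_len):
    """Return the largest indexed window size that does not exceed query_len.

    Assumed rule for illustration; the paper's actual criterion may differ.
    """
    usable = [w for w in window_sizes if w <= query_len]
    if not usable:
        raise ValueError("query is shorter than every indexed window size")
    return max(usable)


def expected_cost(window_sizes, length_freq):
    """Frequency-weighted toy cost over the query-length distribution.

    The per-query cost is simply (query length - chosen window size),
    a hypothetical stand-in for the paper's search cost formula.
    """
    return sum(
        freq * (length - select_window(window_sizes, length))
        for length, freq in length_freq.items()
    )


def best_window_sizes(candidates, length_freq, k):
    """Brute-force the k window sizes with the lowest toy expected cost."""
    shortest = min(length_freq)
    # Keep only combinations whose smallest window can serve the shortest query.
    feasible = (
        combo
        for combo in combinations(sorted(candidates), k)
        if combo[0] <= shortest
    )
    return min(feasible, key=lambda combo: expected_cost(combo, length_freq))
```

For instance, with the hypothetical query-length distribution `{128: 0.2, 256: 0.5, 512: 0.3}` and candidate window sizes `[64, 128, 256, 512]`, calling `best_window_sizes(..., k=2)` under this toy cost selects `(128, 512)`. Brute force is used only to keep the sketch short; it is practical only for a handful of candidates.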
Keywords
Subsequence Matching; Index Interpolation; Window Size Effect; Physical Database Design; Search Cost Formula;