
The Performance Bottleneck of Subsequence Matching in Time-Series Databases: Observation, Solution, and Performance Evaluation  

Sang-Wook Kim (Division of Information and Communications, College of Information and Communications, Hanyang University)
Abstract
Subsequence matching is an operation that finds, in a time-series database, the subsequences whose changing patterns are similar to a given query sequence. This paper identifies the performance bottleneck in subsequence matching and proposes an effective method that significantly improves the performance of the entire subsequence matching process by resolving that bottleneck. First, through preliminary experiments, we analyze the disk access and CPU processing times required during the index searching and post-processing steps. Based on the results, we show that the post-processing step is the main performance bottleneck in subsequence matching, and then claim that its optimization is a crucial issue overlooked in previous approaches. To resolve the bottleneck, we propose a simple but quite effective method that performs the post-processing step in the optimal way. By rearranging the order in which candidate subsequences are compared with a query sequence, our method completely eliminates the redundant disk accesses and CPU processing that occur in the post-processing step. We formally prove that our method is optimal and does not incur any false dismissal. We show the effectiveness of our method through extensive experiments. The results show that our method speeds up the post-processing step by 3.91 to 9.42 times on a data set of real-world stock sequences and by 4.97 to 5.61 times on large-volume synthetic data sets. The results also show that our method reduces the share of the post-processing step in the entire subsequence matching from about 90% to less than 70%. This implies that our method successfully resolves the performance bottleneck in subsequence matching. As a result, our method provides excellent performance in entire subsequence matching: compared with the previous method, it is 3.05 to 5.60 times faster on a data set of real-world stock sequences and 3.68 to 4.21 times faster on large-volume synthetic data sets.
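The abstract does not spell out the reordering itself, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's algorithm. It assumes that index search returns candidates as (sequence id, offset) pairs, possibly with duplicates and in arbitrary order, and that fetching a whole data sequence stands in for a disk access. The names SequenceStore, post_process_naive, and post_process_sorted are hypothetical and introduced here for illustration only.

```python
import math
from itertools import groupby


class SequenceStore:
    """Toy stand-in for disk storage; counts how often a sequence is fetched."""

    def __init__(self, sequences):
        self._sequences = sequences
        self.reads = 0

    def fetch(self, seq_id):
        self.reads += 1  # each fetch models one disk access (assumption)
        return self._sequences[seq_id]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _is_match(data, offset, query, eps):
    window = data[offset:offset + len(query)]
    return len(window) == len(query) and euclidean(window, query) <= eps


def post_process_naive(store, candidates, query, eps):
    """Verify candidates in the order the index returned them."""
    matches = set()
    for seq_id, offset in candidates:
        data = store.fetch(seq_id)  # may re-read the same sequence many times
        if _is_match(data, offset, query, eps):
            matches.add((seq_id, offset))
    return matches


def post_process_sorted(store, candidates, query, eps):
    """Deduplicate and sort candidates so each sequence is fetched once."""
    matches = set()
    ordered = sorted(set(candidates))  # drops repeats, groups by sequence id
    for seq_id, group in groupby(ordered, key=lambda c: c[0]):
        data = store.fetch(seq_id)  # one fetch per distinct sequence
        for _, offset in group:
            if _is_match(data, offset, query, eps):
                matches.add((seq_id, offset))
    return matches
```

Both functions verify every distinct candidate against the query, so the sorted variant returns the same answer set as the naive one; it only changes how often each sequence is read and how often each candidate is compared.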
Keywords
time-series databases; subsequence matching; performance analysis; index search; post-processing;