
Efficient Time-Series Subsequence Matching Using MBR-Safe Property of Piecewise Aggregation Approximation  

Moon, Yang-Sae (Department of Computer Science, Kangwon National University)
Abstract
In this paper we address the MBR-safe property of Piecewise Aggregation Approximation (PAA), and propose an efficient subsequence matching method based on the MBR-safe PAA. A transformation is said to be MBR-safe if the low-dimensional MBR obtained by transforming a high-dimensional MBR contains every low-dimensional sequence obtained by transforming a high-dimensional sequence inside that MBR. Using an MBR-safe transformation, we can reduce the number of lower-dimensional transformations required in similar sequence matching, since it transforms a high-dimensional MBR itself to a low-dimensional MBR directly. Furthermore, PAA is known as an excellent lower-dimensional transformation since its computation is very simple, and its performance is superior to other transformations. Thus, to integrate these advantages of PAA and MBR-safeness, we first formally confirm the MBR-safe property of PAA, and then improve subsequence matching performance using the MBR-safe PAA. Contributions of the paper can be summarized as follows. First, we propose a PAA-based MBR-safe transformation, called mbrPAA, and formally prove the MBR-safeness of mbrPAA. Second, we propose an mbrPAA-based subsequence matching method, and formally prove the correctness of the proposed method. Third, we present the notion of entry reuse property, and by using the property, we propose an efficient method of constructing high-dimensional MBRs in subsequence matching. Fourth, we show the superiority of mbrPAA through extensive experiments. Experimental results show that, compared with the previous approach, our mbrPAA is 24.2 times faster in the low-dimensional MBR construction and improves subsequence matching performance by up to 65.9%.
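The MBR-safe property described above can be illustrated with a minimal sketch. It assumes the simplest form of such a transformation, namely applying PAA to the lower and upper corners of the high-dimensional MBR; the paper's actual definition of mbrPAA may differ in details such as scaling. The function names `paa`, `mbr_of`, `mbr_paa`, and `contains` are hypothetical and chosen only for this illustration.

```python
import random

def paa(seq, f):
    """PAA: replace each of f equal-length segments by its mean."""
    n = len(seq)
    if n % f != 0:
        raise ValueError("sequence length must be divisible by f")
    w = n // f
    return [sum(seq[i * w:(i + 1) * w]) / w for i in range(f)]

def mbr_of(seqs):
    """Return the (low, high) corners of the MBR enclosing the sequences."""
    low = [min(col) for col in zip(*seqs)]
    high = [max(col) for col in zip(*seqs)]
    return low, high

def mbr_paa(low, high, f):
    """Assumed form of an MBR-level PAA: transform only the two corners."""
    return paa(low, f), paa(high, f)

def contains(low, high, point, eps=1e-9):
    """Check that a point lies inside the box [low, high] in every dimension."""
    return all(l - eps <= p <= h + eps for l, p, h in zip(low, point, high))

random.seed(0)
# 50 sequences of length 16, enclosed by one 16-dimensional MBR
seqs = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(50)]
low, high = mbr_of(seqs)

# Two PAA computations transform the whole MBR to 4 dimensions ...
low_f, high_f = mbr_paa(low, high, 4)

# ... and the result contains the PAA of every individual sequence.
all_inside = all(contains(low_f, high_f, paa(s, 4)) for s in seqs)
print(all_inside)  # True
```

Containment holds here because each PAA coefficient is an average, so element-wise bounds on a sequence carry over to its segment means. The sketch transforms the MBR with two PAA computations rather than one per sequence, which is the saving the abstract refers to.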
Keywords
Time-series databases; MBR-safe transformation; piecewise aggregate approximation; subsequence matching