Efficient Time-Series Subsequence Matching Using MBR-Safe Property of Piecewise Aggregation Approximation

부분 집계 근사법의 MBR-안전 성질을 이용한 효율적인 시계열 서브시퀀스 매칭

  • Yang-Sae Moon (문양세; Department of Computer Science, Kangwon National University)
  • Published : 2007.12.15

Abstract

In this paper, we address the MBR-safe property of Piecewise Aggregation Approximation (PAA) and propose an efficient subsequence matching method based on the MBR-safe PAA. A transformation is said to be MBR-safe if the low-dimensional MBR obtained by transforming a high-dimensional MBR contains the low-dimensional image of every high-dimensional sequence in that MBR. An MBR-safe transformation reduces the number of lower-dimensional transformations required in similar sequence matching, since it transforms a high-dimensional MBR directly into a low-dimensional MBR instead of transforming each sequence individually. Furthermore, PAA is known as an excellent lower-dimensional transformation, since its computation is very simple and its performance is superior to that of other transformations. Thus, to integrate the advantages of PAA and MBR-safeness, we first formally confirm the MBR-safe property of PAA and then improve subsequence matching performance using the MBR-safe PAA. The contributions of the paper can be summarized as follows. First, we propose a PAA-based MBR-safe transformation, called mbrPAA, and formally prove the MBR-safeness of mbrPAA. Second, we propose an mbrPAA-based subsequence matching method and formally prove the correctness of the proposed method. Third, we present the notion of the entry reuse property and, using this property, propose an efficient method of constructing high-dimensional MBRs in subsequence matching. Fourth, we show the superiority of mbrPAA through extensive experiments. Experimental results show that, compared with the previous approach, mbrPAA is on average 24.2 times faster in low-dimensional MBR construction and improves subsequence matching performance by up to 65.9%.

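This paper shows that Piecewise Aggregation Approximation (PAA) has the MBR-safe property and proposes an efficient subsequence matching method that uses it. An MBR-safe transformation is one in which the low-dimensional MBR obtained by directly transforming a high-dimensional MBR contains every low-dimensional sequence obtained by transforming an individual high-dimensional sequence. With such a transformation, a high-dimensional MBR can be converted directly into a low-dimensional MBR, which greatly reduces the number of lower-dimensional transformations required in similar sequence matching. In addition, PAA is known as a lower-dimensional transformation that is simple to compute and performs well. Accordingly, to combine the advantages of these two concepts, this paper confirms that the existing PAA has the MBR-safe property and uses it to improve subsequence matching performance. The contributions of this paper are as follows. First, we propose mbrPAA, a PAA-based lower-dimensional transformation for MBRs, and formally prove that mbrPAA is MBR-safe. Second, we propose a new mbrPAA-based subsequence matching method and prove its correctness. Third, we present the notion of the entry reuse property in subsequence matching and, based on it, propose a method for constructing high-dimensional MBRs efficiently. Fourth, we demonstrate the superiority of mbrPAA through experiments. The experimental results show that, compared with the previous method, the proposed mbrPAA constructs low-dimensional MBRs 24.2 times faster on average and improves subsequence matching performance by up to 65.9%.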
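
The MBR-safe property described in the abstract can be illustrated with a small sketch. It assumes that mbrPAA simply applies PAA to the lower and upper corners of a high-dimensional MBR, which the abstract does not spell out, and that the sequence length is a multiple of the number of segments. The function names (`paa`, `mbr_of`, `mbr_paa`, `check_mbr_safe`) are hypothetical and chosen only for this illustration; this is not the paper's implementation.

```python
import random


def paa(seq, segments):
    """PAA: replace each equal-length segment of seq by its mean."""
    n = len(seq)
    if n % segments != 0:
        raise ValueError("sequence length must be a multiple of segments")
    width = n // segments
    return [sum(seq[i * width:(i + 1) * width]) / width
            for i in range(segments)]


def mbr_of(sequences):
    """Minimum bounding rectangle (lower, upper) of equal-length sequences."""
    lower = [min(column) for column in zip(*sequences)]
    upper = [max(column) for column in zip(*sequences)]
    return lower, upper


def mbr_paa(lower, upper, segments):
    """Assumed form of mbrPAA: transform the two MBR corners with PAA."""
    return paa(lower, segments), paa(upper, segments)


def contains(low, high, point, eps=1e-9):
    """True if point lies inside the rectangle [low, high] in every dimension."""
    return all(l - eps <= p <= h + eps for l, p, h in zip(low, point, high))


def check_mbr_safe(sequences, segments):
    """Check that the transformed MBR contains the PAA of every sequence."""
    lower, upper = mbr_of(sequences)
    low_dim_lower, low_dim_upper = mbr_paa(lower, upper, segments)
    return all(contains(low_dim_lower, low_dim_upper, paa(s, segments))
               for s in sequences)


if __name__ == "__main__":
    rng = random.Random(0)
    # 50 synthetic windows of length 16, reduced to 4 dimensions.
    windows = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(50)]
    print(check_mbr_safe(windows, segments=4))
```

Because each PAA coefficient is an average with non-negative weights, any sequence bounded elementwise by the MBR corners has PAA coefficients bounded by the PAA of those corners. Under the stated assumption, building the low-dimensional MBR therefore takes two PAA computations per MBR rather than one per sequence.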
