MBR-Safe Transform: Lower-Dimensional Transformation of High-Dimensional MBRs in Similar Sequence Matching

  • Yang-Sae Moon (Computer Science, Division of Computer, Kangwon National University)
  • Published: 2006.12.15

Abstract

Most similar sequence matching methods speed up search with a multidimensional index by transforming a large number of high-dimensional sequences into low-dimensional sequences and then constructing low-dimensional MBRs that contain the transformed sequences. In this paper we propose a formal method that transforms a high-dimensional MBR itself directly into a low-dimensional MBR, and we show that this method significantly reduces the number of lower-dimensional transformations required in similar sequence matching. To achieve this goal, we first formally define the new notion of MBR-safe. A transform is MBR-safe if the low-dimensional MBR obtained by applying it to a high-dimensional MBR contains every low-dimensional sequence obtained by transforming an individual high-dimensional sequence in that MBR. We then propose MBR-safe transforms based on DFT and DCT, the most widely used lower-dimensional transformations. We first prove that the traditional DFT and DCT are not MBR-safe, and then define new transforms, called mbrDFT and mbrDCT, by extending DFT and DCT, respectively. We formally prove that mbrDFT and mbrDCT are MBR-safe. Moreover, we show that mbrDFT (or mbrDCT) is optimal among the DFT-based (or DCT-based) MBR-safe transforms that directly convert a high-dimensional MBR into a low-dimensional MBR. Analytical and experimental results show that the proposed mbrDFT and mbrDCT drastically reduce the number of lower-dimensional transformations and significantly improve performance compared with the naïve transforms. These results indicate that our MBR-safe transforms provide a useful framework for a variety of applications that require the lower-dimensional transformation of high-dimensional MBRs.
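The MBR-safe property can be illustrated with a small Python sketch. It does not reproduce the paper's definitions of mbrDFT and mbrDCT, which the abstract does not give. Instead, it relies on the fact that every DFT real or imaginary part and every DCT coefficient is a linear function of the sequence, and it bounds each one over the MBR with interval arithmetic. The orthonormal DCT-II and 1/√n DFT scalings, the choice of the first k coefficients, the toy MBR, and all function names (`dct_rows`, `dft_rows`, `corner_mbr`, `safe_mbr`, `contains`) are hypothetical choices made for illustration. `corner_mbr` shows one plausible way of applying the plain transform to an MBR, by transforming only its lower and upper corners. It fails to be MBR-safe as soon as a coefficient's basis values take both signs.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Box = Tuple[Vector, Vector]  # (lower corner, upper corner) of an MBR


def dct_rows(k: int, n: int) -> List[Vector]:
    """First k rows of an orthonormal DCT-II matrix (normalization is an assumption)."""
    rows = []
    for f in range(k):
        scale = math.sqrt((1.0 if f == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * t + 1) * f / (2 * n))
                     for t in range(n)])
    return rows


def dft_rows(k: int, n: int) -> List[Vector]:
    """Real and imaginary parts of the first k DFT coefficients (1/sqrt(n) scaling),
    each written as a real linear row, giving 2k low-dimensional features."""
    rows = []
    for f in range(k):
        angles = [2 * math.pi * f * t / n for t in range(n)]
        rows.append([math.cos(a) / math.sqrt(n) for a in angles])   # real part
        rows.append([-math.sin(a) / math.sqrt(n) for a in angles])  # imaginary part
    return rows


def transform(rows: List[Vector], x: Sequence[float]) -> Vector:
    """Map one high-dimensional sequence to its low-dimensional features."""
    return [sum(a * v for a, v in zip(row, x)) for row in rows]


def corner_mbr(rows: List[Vector], box: Box) -> Box:
    """Transform only the two corner points and take per-feature min/max (NOT MBR-safe)."""
    p, q = transform(rows, box[0]), transform(rows, box[1])
    return [min(a, b) for a, b in zip(p, q)], [max(a, b) for a, b in zip(p, q)]


def safe_mbr(rows: List[Vector], box: Box) -> Box:
    """Interval-arithmetic bound: each feature is linear in the sequence, so its
    minimum (maximum) over the box is the sum of the per-term minima (maxima)."""
    lo, hi = box
    new_lo = [sum(min(a * l, a * h) for a, l, h in zip(row, lo, hi)) for row in rows]
    new_hi = [sum(max(a * l, a * h) for a, l, h in zip(row, lo, hi)) for row in rows]
    return new_lo, new_hi


def contains(box: Box, point: Sequence[float], eps: float = 1e-9) -> bool:
    """True if point lies inside box, allowing a small floating-point tolerance."""
    return all(l - eps <= p <= h + eps for l, p, h in zip(box[0], point, box[1]))


if __name__ == "__main__":
    n, k = 4, 2
    box: Box = ([0.0] * n, [1.0] * n)  # toy high-dimensional MBR [0, 1]^4
    # Sequences inside the MBR: a 3-level grid over every dimension (81 sequences).
    seqs = [list(p) for p in itertools.product([0.0, 0.5, 1.0], repeat=n)]
    for name, rows in (("DCT", dct_rows(k, n)), ("DFT", dft_rows(k, n))):
        naive, safe = corner_mbr(rows, box), safe_mbr(rows, box)
        feats = [transform(rows, s) for s in seqs]
        print(name,
              "corner MBR contains all:", all(contains(naive, y) for y in feats),
              "| safe MBR contains all:", all(contains(safe, y) for y in feats))
```

On the toy MBR [0, 1]^4, the corner-based box misses transformed sequences such as (1, 0, 0, 0) for both transforms, while the interval bound contains all 81 grid sequences. A linear function over a box attains its extremes at corners, so the interval bound is also the smallest box guaranteed to contain every transformed sequence in the MBR. It is computed once per MBR rather than once per sequence.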
