http://dx.doi.org/10.3745/KIPSTD.2008.15-D.1.31

Hybrid Lower-Dimensional Transformation for Similar Sequence Matching  

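Moon, Yang-Sae (Department of Computer Science, College of IT, Kangwon National University)
Kim, Jin-Ho (Department of Computer Science, College of IT, Kangwon National University)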
Abstract
Similar sequence matching generally relies on lower-dimensional transformations to convert high-dimensional sequences into low-dimensional points. The indexing performance of these traditional transformations, however, varies with the type of time-series data, so the choice of transformation significantly influences the indexing performance of similar sequence matching. To solve this problem, we propose a hybrid approach that integrates multiple transformations and uses them in a single multidimensional index. We first propose the notion of a hybrid lower-dimensional transformation, which applies different lower-dimensional transformations to a sequence. We next define the hybrid distance, which measures the distance between the transformed sequences. We then formally prove that the hybrid approach performs similar sequence matching correctly. We also present index-building and similar sequence matching algorithms that use the hybrid approach. Experimental results on various time-series data sets show that our hybrid approach outperforms the single transformation-based approach. These results indicate that the hybrid approach can be widely used for time-series data with different characteristics.
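The abstract does not give the formal definitions, so the following is only a minimal sketch of one way such a hybrid scheme could look, not the authors' construction. It assumes that the sequence is split into two halves, that the first half is reduced with a scaled piecewise aggregate approximation (PAA) and the second half with truncated orthonormal Haar wavelet coefficients, and that the hybrid distance is the Euclidean distance over the concatenated features. All function names (`paa`, `haar`, `hybrid_transform`, `hybrid_distance`, `range_query`) and the parameters `k1` and `k2` are hypothetical. Under these assumptions the hybrid distance never exceeds the true Euclidean distance, which is the property that lets an index filter candidates without false dismissals.

```python
import math

def euclid(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def paa(seq, k):
    """Scaled PAA: each feature is a segment sum divided by sqrt(segment length).

    With this scaling, Euclidean distance on the features lower-bounds
    Euclidean distance on the original values (Cauchy-Schwarz).
    """
    n = len(seq)
    assert n % k == 0, "sequence length must be a multiple of k"
    w = n // k
    return [sum(seq[i * w:(i + 1) * w]) / math.sqrt(w) for i in range(k)]

def haar(seq, k):
    """First k coefficients of the orthonormal Haar wavelet transform.

    Requires a power-of-two length. Dropping coefficients of an orthonormal
    transform can only shrink distances, so this also lower-bounds.
    """
    n = len(seq)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    approx, details = list(seq), []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = detail + details  # coarser detail coefficients go in front
    return (approx + details)[:k]

def hybrid_transform(seq, k1=2, k2=2):
    """Assumed hybrid transform: PAA on the first half, Haar on the second."""
    half = len(seq) // 2
    return paa(seq[:half], k1) + haar(seq[half:], k2)

def hybrid_distance(p, q):
    """Assumed hybrid distance: Euclidean distance over concatenated features."""
    return euclid(p, q)

def range_query(db, query, eps):
    """Filter with the hybrid distance, then verify with the true distance."""
    fq = hybrid_transform(query)
    candidates = [s for s in db if hybrid_distance(hybrid_transform(s), fq) <= eps]
    return [s for s in candidates if euclid(s, query) <= eps]
```

In a real system the transformed points would be stored in a multidimensional index such as an R*-tree; the linear scan in `range_query` stands in for that index lookup only to keep the sketch short.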
Keywords
Data Mining; Time-Series Databases; Hybrid Lower-Dimensional Transformation; Similar Sequence Matching;