
Similarity Search in Time Series Databases based on the Normalized Distance  

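Sangjun Lee (School of Electrical Engineering and Computer Science, Seoul National University)
Sukho Lee (School of Electrical Engineering and Computer Science, Seoul National University)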
Abstract
This paper proposes a similarity search method for time sequences that uses the normalized distance as its similarity measure. In applications where the shape of a time sequence is the main concern, the normalized distance is a more suitable measure than the plain Lp distance. To support normalized distance queries, most previous work relies on a preprocessing step, vertical shifting, which normalizes each sequence by its mean. The proposed method instead bases its feature extraction on a property of sequences: the variation between two adjacent elements of a time sequence is invariant under vertical shifting. The extracted features are indexed with a spatial access method such as the R-tree. The method matches time series of similar shape without vertical shifting and guarantees no false dismissals. Experiments on real data (stock price movements) verify its performance.
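The invariance property mentioned above can be checked with a small sketch. It assumes the normalized distance is the Euclidean distance between mean-subtracted sequences, which the abstract does not spell out, and the function names (`mean_shift`, `normalized_distance`, `adjacent_differences`) are hypothetical, chosen only for illustration. It is not the paper's feature extraction or index structure, only a demonstration that adjacent-element differences do not change when a sequence is shifted vertically.

```python
import math


def mean_shift(seq):
    """Subtract the mean from every element (the vertical-shifting step)."""
    mean = sum(seq) / len(seq)
    return [x - mean for x in seq]


def normalized_distance(a, b):
    """Euclidean distance between mean-subtracted sequences.

    Assumed definition for illustration; the abstract does not give the formula.
    """
    a0, b0 = mean_shift(a), mean_shift(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a0, b0)))


def adjacent_differences(seq):
    """Variation between each pair of adjacent elements."""
    return [seq[i + 1] - seq[i] for i in range(len(seq) - 1)]


# Made-up example values: the same shape at two different price levels.
prices = [10.0, 12.0, 11.0, 15.0]
shifted = [x + 100.0 for x in prices]

# Adjacent differences are identical, so no mean subtraction is needed
# before comparing shapes.
same_features = adjacent_differences(prices) == adjacent_differences(shifted)

# Under the assumed definition, the normalized distance between the
# sequence and its shifted copy is zero.
dist_after_shift = normalized_distance(prices, shifted)
```

Because the differences already ignore the vertical offset, a feature built from them can be computed once per sequence without first normalizing by the mean.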
Keywords
database; time series; similarity search; normalized distance