Similarity Search in Time Series Databases based on the Normalized Distance

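  • Sangjun Lee (School of Electrical Engineering and Computer Science, Seoul National University)
  • Sukho Lee (School of Electrical Engineering and Computer Science, Seoul National University)
  • Published: 2004.02.01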

Abstract

In this paper, we propose a search method for time sequences that supports the normalized distance as a similarity measure. In many applications where the shape of a time sequence is the main concern, the normalized distance is a more suitable similarity measure than the simple Lp distance. To support normalized distance queries, most previous work requires a preprocessing step of vertical shifting, which computes the mean of each sequence and shifts the sequence by that mean. The proposed method instead exploits a property of sequences for feature extraction: the variation between two adjacent elements of a time sequence is invariant under vertical shifting. A feature vector is therefore extracted without the vertical-shifting preprocessing step and indexed with a spatial access method such as the R-tree. The proposed method can match time series of similar shape without vertical shifting and guarantees no false dismissals. Experiments on real data (stock price movements) verify the performance of the proposed method.
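
The invariance property stated above can be illustrated with a minimal sketch. It assumes the Euclidean (L2) case of the normalized distance, and all function names are hypothetical, chosen only for this illustration. The factor-of-two lower bound used at the end follows from an elementary inequality and is shown only to suggest how a shift-invariant feature can filter candidates without false dismissals; it is not necessarily the feature or the bound used in the paper.

```python
import math

def normalized_distance(x, y):
    """Euclidean distance after shifting each sequence by its own mean.
    Assumption: the L2 case of the normalized distance."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return math.sqrt(sum(((a - mx) - (b - my)) ** 2 for a, b in zip(x, y)))

def adjacent_differences(x):
    """Variation between consecutive elements.
    Adding a constant to every element leaves this unchanged."""
    return [b - a for a, b in zip(x, x[1:])]

def difference_distance(x, y):
    """Euclidean distance between the adjacent-difference features."""
    dx, dy = adjacent_differences(x), adjacent_differences(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(dx, dy)))

# Hypothetical sample data: the second sequence is the first shifted up by 100.
prices = [10.0, 12.0, 11.0, 15.0]
shifted = [p + 100.0 for p in prices]
other = [3.0, 4.0, 6.0, 5.0]

# The feature is identical for a sequence and its vertically shifted copy,
# so no mean subtraction is needed before extracting it.
same_feature = adjacent_differences(prices) == adjacent_differences(shifted)

# Illustrative filter: since (a - b)^2 <= 2a^2 + 2b^2, half the feature
# distance never exceeds the normalized distance, so pruning on it
# cannot discard a true match.
lower_bound = difference_distance(prices, other) / 2.0
exact = normalized_distance(prices, other)
```

In a filter-and-refine setting, a candidate whose `lower_bound` already exceeds the query tolerance can be skipped, and only the remaining candidates need the exact `normalized_distance` computation.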
