Shape-Based Retrieval of Similar Subsequences in Time-Series Databases

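  • Jee-Hee Yoon (Division of Information and Communication Engineering, Hallym University)
  • Sang-Wook Kim (Division of Computer, Information, and Communication Engineering, Kangwon National University)
  • Tae-Hoon Kim ((주)클래러스)
  • Sang-Hyun Park (Department of Computer Science and Engineering, Pohang University of Science and Technology)
  • Published: 2002.10.01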

Abstract

This paper addresses the problem of shape-based retrieval in time-series databases. Shape-based retrieval is the operation that searches for (sub)sequences whose shapes are similar to that of a given query sequence, regardless of their actual element values. We propose a new approach to shape-based retrieval of subsequences. We first introduce a new similarity model for shape-based retrieval that supports various combinations of transformations such as shifting, scaling, moving average, and time warping. To process shape-based retrieval under this similarity model efficiently, we also propose indexing and query processing methods. To verify the usefulness of our approach, we perform extensive experiments with real-world S&P 500 stock data. The results show that our approach successfully finds the subsequences whose shapes are similar to that of the query sequence, and achieves a speedup of up to about 66 times over the sequential scan method.
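As a rough illustration of what the transformations named in the abstract do, the following Python sketch combines them into a naive sequential scan. It is not the similarity model or the indexing method of the paper: the function names, the order in which the transformations are applied (moving average, then shift/scale normalization, then time warping), the L1 base distance, and the tolerance `eps` are all assumptions made for this example.

```python
from math import inf, sqrt


def moving_average(seq, k):
    """Smooth a sequence with a k-point moving average (k=1 leaves values unchanged)."""
    return [sum(seq[i:i + k]) / k for i in range(len(seq) - k + 1)]


def normalize(seq):
    """Cancel shifting and scaling: subtract the mean, divide by the standard deviation."""
    n = len(seq)
    mean = sum(seq) / n
    std = sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n  # a flat sequence has no shape to scale
    return [(x - mean) / std for x in seq]


def dtw_distance(a, b):
    """Time warping distance with an L1 base distance (assumed; other bases are possible)."""
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend the cheapest of the three allowed warping steps.
            cur[j] = abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]


def shape_distance(s, q, k=1):
    """Distance between the shapes of s and q under the assumed transformation order."""
    return dtw_distance(normalize(moving_average(s, k)),
                        normalize(moving_average(q, k)))


def sequential_scan(data, query, window, eps, k=1):
    """Return start offsets of length-`window` subsequences within eps of the query shape."""
    return [i for i in range(len(data) - window + 1)
            if shape_distance(data[i:i + window], query, k) <= eps]


if __name__ == "__main__":
    query = [1, 2, 3, 2, 1]
    data = [10, 10, 10, 100, 200, 300, 200, 100, 10, 10]
    print(sequential_scan(data, query, window=5, eps=1e-6))
```

In this toy input the peak in `data` has the same shape as the query but values 100 times larger, so it matches after normalization. A scan like this compares every window against the query, which corresponds to the sequential scan baseline the abstract measures against, not to the proposed index-based method.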
