• Title/Summary/Keyword: Subsequence Matching

Effectiveness Evaluations of Subsequence Matching Methods Using KOSPI Data (한국 주식 데이터를 이용한 서브시퀀스 매칭 방법의 효과성 평가)

  • Yoo Seung Keun;Lee Sang Ho
    • The KIPS Transactions:PartD / v.12D no.3 s.99 / pp.355-364 / 2005
  • Previous research on subsequence matching has focused on how to build indexes that speed up matching, and has not taken the effectiveness of subsequence matching methods into account. This paper considers the effectiveness of subsequence matching methods and proposes two metrics for evaluating the effectiveness of subsequence matching algorithms. We applied the proposed metrics to Korean stock data and five known matching algorithms. The analysis of the empirical data shows that two methods (i.e., the method supporting normalization, and the method supporting scaling and shifting) outperform the others in terms of the effectiveness of subsequence matching.

Linear Detrending Subsequence Matching in Time-Series Databases (시계열 데이터베이스에서 선형 추세 제거 서브시퀀스 매칭)

  • Gil, Myeong-Seon;Kim, Bum-Soo;Moon, Yang-Sae;Kim, Jin-Ho
    • Journal of KIISE:Computing Practices and Letters / v.16 no.5 / pp.586-590 / 2010
  • In this paper we formally define linear detrending subsequence matching and propose an efficient index-based solution for it. To this end, we first present the notion of LD-windows. We eliminate the linear trend from a subsequence rather than from each window itself, and obtain LD-windows by dividing the detrended subsequence into windows. Using the LD-windows, we present a lower bounding theorem for the index-based solution and formally prove its correctness. Based on this lower bounding theorem, we then propose the index building and subsequence matching algorithms. Finally, we show the superiority of our index-based solution through experiments.
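
The LD-window idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: it assumes the linear trend is the least-squares line over positions 0..n-1 and that the subsequence is cut into disjoint windows; the names `detrend` and `ld_windows` are hypothetical.

```python
def detrend(seq):
    """Subtract the least-squares line fitted to seq (x = 0..n-1). Assumed definition."""
    n = len(seq)
    if n < 2:
        return [0.0] * n
    mean_x = (n - 1) / 2
    mean_y = sum(seq) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(seq))
    slope = sxy / sxx
    return [y - (mean_y + slope * (i - mean_x)) for i, y in enumerate(seq)]


def ld_windows(subseq, w):
    """Detrend the whole subsequence first, then cut it into disjoint windows of size w."""
    detrended = detrend(subseq)
    return [detrended[i:i + w] for i in range(0, len(detrended) - w + 1, w)]
```

The point of the ordering is that the trend is estimated once over the whole subsequence; detrending each window on its own would generally remove a different line per window.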

A Subsequence Matching Technique that Supports Time Warping Efficiently (타임 워핑을 지원하는 효율적인 서브시퀀스 매칭 기법)

  • Park, Sang-Hyun;Kim, Sang-Wook;Cho, June-Suh;Lee, Hoen-Gil
    • Journal of Industrial Technology / v.21 no.A / pp.167-179 / 2001
  • This paper discusses index-based subsequence matching that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. In earlier work, we suggested an efficient method for whole matching under time warping. This method constructs a multidimensional index on a set of feature vectors, which are invariant to time warping, extracted from data sequences. For filtering in feature space, it also applies a lower-bound function, which consistently underestimates the time warping distance and satisfies the triangle inequality. In this paper, we incorporate the prefix-querying approach based on sliding windows into the earlier approach. For indexing, we extract a feature vector from every subsequence inside a sliding window and construct a multidimensional index using the feature vectors as indexing attributes. For query processing, we perform a series of index searches using the feature vectors of qualifying query prefixes. Our approach provides effective and scalable subsequence matching even for a large database. We also prove that our approach does not incur false dismissals. To verify the superiority of our method, we perform extensive experiments. The results reveal that our method achieves significant speedup with real-world S&P 500 stock data and with very large synthetic data.
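
A rough sketch of the filtering idea described above: a time warping distance plus a cheap lower bound computed from warping-invariant features. The abstract does not list the features; the four values used here (first, last, greatest, and smallest element) and the $L_{\infty}$ base distance are assumptions made for illustration, and the function names are hypothetical.

```python
def dtw_linf(s, q):
    """Time warping distance with L_inf as base distance:
    the largest element gap along the best warping path."""
    inf = float("inf")
    n, m = len(s), len(q)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
            d[i][j] = max(abs(s[i - 1] - q[j - 1]), best_prev)
    return d[n][m]


def feature(seq):
    """Assumed 4-value feature; each value is unchanged by time warping."""
    return (seq[0], seq[-1], max(seq), min(seq))


def lower_bound(s, q):
    """L_inf distance between features; never exceeds dtw_linf(s, q)."""
    return max(abs(a - b) for a, b in zip(feature(s), feature(q)))
```

A filter step would discard any candidate whose `lower_bound` already exceeds the tolerance and compute `dtw_linf` only for the survivors; because the bound never overestimates, no true match is dismissed.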

On Extending the Prefix-Querying Method for Efficient Time-Series Subsequence Matching Under Time Warping (타임 워핑 하의 효율적인 시계열 서브시퀀스 매칭을 위한 접두어 질의 기법의 확장)

  • Chang Byoung-Chol;Kim Sang-Wook;Cha Jae-Hyuk
    • The KIPS Transactions:PartD / v.13D no.3 s.106 / pp.357-368 / 2006
  • This paper discusses how to process time-series subsequence matching under time warping. Time warping enables finding sequences with similar patterns even when they are of different lengths. The prefix-querying method is the first index-based approach that performs time-series subsequence matching under time warping without false dismissals. This method employs $L_{\infty}$ as the base distance function so that users can issue queries conveniently. In this paper, we extend the prefix-querying method to support $L_1$, the most widely used base distance function in time-series subsequence matching under time warping, instead of $L_{\infty}$. We also formally prove that the proposed method does not incur any false dismissals in subsequence matching. To show the superiority of our method, we conduct a performance evaluation via a variety of experiments. The results reveal that our method achieves performance improvements of orders of magnitude over previous methods.
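
To make the difference between the two base distance functions concrete, here is a small sketch of the standard time warping recurrence with a switchable base. It is not the prefix-querying method itself; the function name and the `base` parameter are hypothetical.

```python
def time_warping_distance(s, q, base="l1"):
    """Time warping distance between s and q.

    base="l1":   element gaps along the warping path are summed.
    base="linf": only the largest gap along the path counts.
    """
    if base not in ("l1", "linf"):
        raise ValueError("base must be 'l1' or 'linf'")
    inf = float("inf")
    n, m = len(s), len(q)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            gap = abs(s[i - 1] - q[j - 1])
            best_prev = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
            d[i][j] = gap + best_prev if base == "l1" else max(gap, best_prev)
    return d[n][m]
```

For example, with `s = [1, 5, 2]` and `q = [2, 2, 2]` the $L_1$ variant accumulates every gap on the path, while the $L_{\infty}$ variant reports only the worst one, so a tolerance chosen for one base does not carry over to the other.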

Performance Evaluation of Methods for Time-Series Subsequence Matching Under Time Warping (타임 워핑 하의 시계열 서브시퀀스 매칭 기법의 성능 평가)

  • 김만순;김상욱
    • Proceedings of the Korea Contents Association Conference / 2003.11a / pp.290-297 / 2003
  • A time-series database is a set of data sequences, each of which is a list of changing values corresponding to an object. Subsequence matching under time warping is defined as an operation that finds, from a time-series database, those subsequences whose time warping distance to a given query sequence is below a tolerance. In this paper, we first point out the characteristics of the previous methods for time-series sequence matching under time warping, and then discuss approaches for applying them to whole matching as well as subsequence matching. We also perform a quantitative performance evaluation via a series of experiments with real-life data. No previous study in the literature compares the performance of all the previous methods of subsequence matching under time warping. Thus, our results can serve as a good reference for their relative performance.

A Single Index Approach for Subsequence Matching that Supports Normalization Transform in Time-Series Databases (시계열 데이터베이스에서 단일 색인을 사용한 정규화 변환 지원 서브시퀀스 매칭)

  • Moon Yang-Sae;Kim Jin-Ho;Loh Woong-Kee
    • The KIPS Transactions:PartD / v.13D no.4 s.107 / pp.513-524 / 2006
  • Normalization transform is very useful for finding the overall trend of time-series data since it enables finding sequences with similar fluctuation patterns. The previous subsequence matching method with normalization transform, however, incurs index overhead both in storage space and in update maintenance since it must build multiple indexes to support query sequences of arbitrary length. To solve this problem, we propose a single index approach for normalization-transformed subsequence matching that supports query sequences of arbitrary length. For the single index approach, we first provide the notion of inclusion-normalization transform by generalizing the original definition of normalization transform. The inclusion-normalization transform normalizes a window by using the mean and the standard deviation of a subsequence that includes the window. Next, we formally prove the correctness of the proposed method, which uses the inclusion-normalization transform for normalization-transformed subsequence matching. We then propose subsequence matching and index building algorithms to implement the proposed method. Experimental results for real stock data show that our method improves performance by up to $2.5{\sim}2.8$ times over the previous method. Our approach has the additional advantage that it can be generalized to support many other kinds of transforms besides normalization transform. Therefore, we believe our work will be widely used in many kinds of transform-based subsequence matching methods.
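
The inclusion-normalization transform described above can be illustrated with a short sketch. It assumes the population standard deviation (the abstract does not say which variant is used), and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def normalize(seq):
    """Ordinary normalization: use the sequence's own mean and standard deviation."""
    mu, sigma = fmean(seq), pstdev(seq)
    if sigma == 0:
        raise ValueError("constant sequence cannot be normalized")
    return [(x - mu) / sigma for x in seq]


def inclusion_normalize(subseq, start, w):
    """Normalize the window subseq[start:start+w] using the mean and
    standard deviation of the enclosing subsequence, not of the window."""
    mu, sigma = fmean(subseq), pstdev(subseq)
    if sigma == 0:
        raise ValueError("constant subsequence cannot be normalized")
    return [(x - mu) / sigma for x in subseq[start:start + w]]
```

One consequence of this definition is that concatenating the inclusion-normalized windows of a subsequence reproduces the normalized subsequence, which is what lets window-level values stand in for a longer normalized sequence.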

Efficient Time-Series Subsequence Matching Using MBR-Safe Property of Piecewise Aggregation Approximation (부분 집계 근사법의 MBR-안전 성질을 이용한 효율적인 시계열 서브시퀀스 매칭)

  • Moon, Yang-Sae
    • Journal of KIISE:Databases / v.34 no.6 / pp.503-517 / 2007
  • In this paper we address the MBR-safe property of Piecewise Aggregation Approximation (PAA), and propose an efficient subsequence matching method based on the MBR-safe PAA. A transformation is said to be MBR-safe if the low-dimensional MBR to which it transforms a high-dimensional MBR contains every individual low-dimensional sequence to which a high-dimensional sequence in that MBR is transformed. Using an MBR-safe transformation, we can reduce the number of lower-dimensional transformations required in similar sequence matching, since it transforms a high-dimensional MBR itself to a low-dimensional MBR directly. Furthermore, PAA is known as an excellent lower-dimensional transformation since its computation is very simple and its performance is superior to other transformations. Thus, to integrate these advantages of PAA and MBR-safeness, we first formally confirm the MBR-safe property of PAA, and then improve subsequence matching performance using the MBR-safe PAA. The contributions of the paper can be summarized as follows. First, we propose a PAA-based MBR-safe transformation, called mbrPAA, and formally prove the MBR-safeness of mbrPAA. Second, we propose an mbrPAA-based subsequence matching method, and formally prove its correctness. Third, we present the notion of the entry reuse property, and by using the property, we propose an efficient method of constructing high-dimensional MBRs in subsequence matching. Fourth, we show the superiority of mbrPAA through extensive experiments. Experimental results show that, compared with the previous approach, our mbrPAA is 24.2 times faster in low-dimensional MBR construction and improves subsequence matching performance by up to 65.9%.
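
The MBR-safe property can be illustrated with a small sketch. It assumes PAA is the plain segment mean with no scaling factor and that the MBR is transformed by applying PAA to its low and high corners; the paper's mbrPAA may be defined differently, and all names here are hypothetical.

```python
def paa(seq, f):
    """Piecewise aggregate approximation: mean of each of f equal-length segments."""
    n = len(seq)
    if f <= 0 or n % f != 0:
        raise ValueError("len(seq) must be a positive multiple of f")
    seg = n // f
    return [sum(seq[i:i + seg]) / seg for i in range(0, n, seg)]


def mbr(seqs):
    """Minimum bounding rectangle of equal-length sequences as (low, high) corners."""
    low = [min(col) for col in zip(*seqs)]
    high = [max(col) for col in zip(*seqs)]
    return low, high


def mbr_paa(low, high, f):
    """Transform the MBR directly by applying PAA to its two corners."""
    return paa(low, f), paa(high, f)


def contains(low, high, point):
    """True if point lies inside the rectangle [low, high] in every dimension."""
    return all(lo <= p <= hi for lo, p, hi in zip(low, point, high))
```

Because a segment mean is monotone in each input value, the transformed rectangle contains the PAA of every sequence inside the original rectangle, so only two transformations per MBR are needed instead of one per sequence.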

An Optimal Way to Index Searching of Duality-Based Time-Series Subsequence Matching (이원성 기반 시계열 서브시퀀스 매칭의 인덱스 검색을 위한 최적의 기법)

  • Kim, Sang-Wook;Park, Dae-Hyun;Lee, Heon-Gil
    • The KIPS Transactions:PartD / v.11D no.5 / pp.1003-1010 / 2004
  • In this paper, we address efficient processing of subsequence matching in time-series databases. We first point out the performance problems occurring in the index searching of a prior method for subsequence matching. Then, we propose a new method that resolves these problems. Our method starts by viewing the index searching of subsequence matching from a new angle, regarding it as a kind of spatial join called a window-join. To speed up the window-join, our method builds an R*-tree in main memory for a query sequence at the start of subsequence matching. Our method also includes a novel algorithm for effectively joining one R*-tree on disk, which is for data sequences, and another R*-tree in main memory, which is for a query sequence. This algorithm accesses each R*-tree page built on data sequences exactly once without incurring any index-level false alarms. Therefore, in terms of the number of disk accesses, the proposed algorithm proves to be optimal. Also, performance evaluation through extensive experiments shows the superiority of our method quantitatively.

Optimal Construction of Multiple Indexes for Time-Series Subsequence Matching (시계열 서브시퀀스 매칭을 위한 최적의 다중 인덱스 구성 방안)

  • Lim, Seung-Hwan;Kim, Sang-Wook;Park, Hee-Jin
    • Journal of KIISE:Databases / v.33 no.2 / pp.201-213 / 2006
  • A time-series database is a set of time-series data sequences, each of which is a list of changing values of an object over a given period of time. Subsequence matching is an operation that searches a time-series database for data subsequences whose changing patterns are similar to a query sequence. This paper addresses a performance issue of time-series subsequence matching. First, we quantitatively examine the performance degradation caused by the window size effect, and then show that the performance of subsequence matching with a single index is not satisfactory in real applications. We argue that index interpolation is fairly useful for resolving this problem. Index interpolation performs subsequence matching by selecting the most appropriate index from multiple indexes built on windows of their inherent sizes. For index interpolation, we first decide the sizes of the windows for which the multiple indexes are to be built. In this paper, we solve the problem of selecting optimal window sizes from the perspective of physical database design. For this, given a set of query sequences to be performed on a target time-series database and a set of window sizes for building multiple indexes, we devise a formula that estimates the cost of all the subsequence matchings. Based on this formula, we propose an algorithm that determines the optimal window sizes for maximizing the performance of all the subsequence matchings. We formally prove the optimality as well as the effectiveness of the algorithm. Finally, we perform a series of extensive experiments with a real-life stock data set and a large synthetic data set. The results reveal that the proposed approach improves on the previous one by 1.5 to 7.8 times.
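
As a rough illustration of the index selection step in index interpolation, the sketch below picks, for a given query length, the index whose window size is the largest one that does not exceed the query length. This selection rule is an assumption made for illustration (the abstract only says the "most appropriate" index is chosen), and `pick_window_size` is a hypothetical helper, not the paper's cost formula or optimization algorithm.

```python
def pick_window_size(window_sizes, query_len):
    """Return the largest available window size that fits in the query.

    Assumed rule: a window-based index is usable only if its window size
    does not exceed the query length, and a larger usable window reduces
    the window size effect.
    """
    usable = [w for w in window_sizes if w <= query_len]
    if not usable:
        raise ValueError("query is shorter than every indexed window size")
    return max(usable)
```

Under this rule, a query of length 200 against indexes built for window sizes 64, 128, and 256 would be routed to the index for window size 128.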

Range Subsequence Matching under Dynamic Time Warping (DTW 거리를 지원하는 범위 서브시퀀스 매칭)

  • Han, Wook-Shin;Lee, Jin-Soo;Moon, Yang-Sae
    • Journal of KIISE:Computing Practices and Letters / v.14 no.6 / pp.559-566 / 2008
  • In this paper, we propose a range subsequence matching method under the dynamic time warping (DTW) distance. We exploit Dual Match, which divides data sequences into disjoint windows and the query sequence into sliding windows. However, Dual Match is known to work under Euclidean distance. We argue that Euclidean distance is a fragile distance, and thus DTW should be supported by Dual Match. For this purpose, we derive a new theorem showing the correctness of our approach and provide a detailed algorithm that uses the theorem. Extensive experimental results show that our range subsequence matching performs much better than the sequential scan algorithm.
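
The window construction that Dual Match relies on, as described above, can be sketched in a few lines. This shows only how the two kinds of windows are cut, not the index or the DTW-based matching; the function names are hypothetical.

```python
def disjoint_windows(data, w):
    """Cut a data sequence into non-overlapping windows of size w.
    Returns (start_offset, window) pairs; a trailing remainder shorter than w is dropped."""
    return [(i, data[i:i + w]) for i in range(0, len(data) - w + 1, w)]


def sliding_windows(query, w):
    """Cut a query sequence into every window of size w, shifting one position at a time.
    Returns (start_offset, window) pairs."""
    return [(i, query[i:i + w]) for i in range(len(query) - w + 1)]
```

Indexing only the disjoint data windows keeps the index small, while the sliding query windows ensure that every possible alignment between the query and a data window is still examined.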