• Title/Summary/Keyword: Time-series databases

Data Processing Method for Real-time Safety Supervision System in Railway (실시간 철도안전 관제를 위한 데이터 처리 방안 연구)

  • Shin, Kwang-Ho;Jung, Hye-Ran;Ahn, Jin
    • Journal of the Korean Society for Railway / v.19 no.4 / pp.445-455 / 2016
  • The goal of the real-time railway safety supervision system is to improve the efficiency of safety oversight and to prevent accidents by integrating the existing distributed monitoring systems for trains, signals, power, and facilities. The system therefore requires better real-time processing performance on big data. The disk-based databases used in existing railway control systems have problems with real-time processing; memory-based databases are limited in big-data processing; and time-series databases are limited in real-time processing. A new database architecture is therefore needed that supports real-time processing and big-data processing simultaneously. In this study, we review the existing railway monitoring systems and propose a new database architecture for a real-time railway safety supervision system.

An Efficient Algorithm for Streaming Time-Series Matching that Supports Normalization Transform (정규화 변환을 지원하는 스트리밍 시계열 매칭 알고리즘)

  • Loh, Woong-Kee;Moon, Yang-Sae;Kim, Young-Kuk
    • Journal of KIISE:Databases / v.33 no.6 / pp.600-619 / 2006
  • With recent technical advances in sensors and mobile devices, processing the data streams generated by these devices is becoming an important research issue. A data stream of real values obtained at continuous time points is called a streaming time-series. Because streaming time-series have unique features that differ from those of traditional time-series, the similarity matching problem on streaming time-series should be solved in a new way. In this paper, we propose an efficient algorithm for the streaming time-series matching problem that supports normalization transform. While existing algorithms compare streaming time-series without any transform, the algorithm proposed in this paper compares them after they are normalization-transformed. The normalization transform is useful for finding time-series that have similar fluctuation trends even though they consist of distant element values. The major contributions of this paper are as follows. (1) Using a theorem presented in the context of subsequence matching that supports normalization transform[4], we propose a simple algorithm for solving the problem. (2) To improve search performance, we extend the simple algorithm to use $k$ ($\geq 1$) indexes. (3) To achieve optimal search performance of the extended algorithm for a given k, we present an approximation method for choosing the k window sizes used to construct the k indexes. (4) Based on the notion of continuity[8] on streaming time-series, we further extend our algorithm so that it can simultaneously obtain the search results for $m$ ($\geq 1$) time points, from the present $t_0$ to a time point $(t_0+m-1)$ in the near future, by retrieving the index only once. (5) Through a series of experiments, we compare the search performance of the algorithms proposed in this paper and show their performance trends according to the values of k and m.
To the best of our knowledge, no existing algorithm solves the problem presented in this paper, so we compare the search performance of our algorithms with that of the sequential scan algorithm. The experimental results show that our algorithms outperform the sequential scan algorithm by up to 13.2 times. The performance of our algorithms is expected to improve further as k increases.
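The effect of a normalization transform on similarity matching can be illustrated with a minimal sketch. It assumes the common definition of normalization (subtract the mean, divide by the standard deviation) and a Euclidean distance; the abstract does not state either choice, and the function names are hypothetical. It is not the indexed algorithm proposed in the paper.

```python
import math

def normalize(seq):
    """Assumed normalization: subtract the mean, divide by the standard deviation."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n  # constant sequence: no fluctuation to compare
    return [(x - mean) / std for x in seq]

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalized_distance(a, b):
    """Distance between two sequences after both are normalized."""
    return euclidean(normalize(a), normalize(b))

# Two windows with the same fluctuation trend but distant element values.
low = [1.0, 2.0, 3.0, 2.0, 1.0]
high = [101.0, 102.0, 103.0, 102.0, 101.0]

raw = euclidean(low, high)                   # large: the values are far apart
normalized = normalized_distance(low, high)  # near zero: the trends coincide
```

Without the transform the two windows look very different; after it they are effectively identical, which is the behavior the abstract describes as finding similar fluctuation trends among distant element values.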

Subsequence Matching Under Time Warping in Time-Series Databases : Observation, Optimization, and Performance Results (시계열 데이터베이스에서 타임 워핑 하의 서브시퀀스 매칭 : 관찰, 최적화, 성능 결과)

  • Kim, Man-Soon;Kim, Sang-Wook
    • The KIPS Transactions:PartD / v.11D no.7 s.96 / pp.1385-1398 / 2004
  • This paper discusses effective processing of subsequence matching under time warping in time-series databases. Time warping is a transformation that enables finding sequences with similar patterns even when they are of different lengths. Through a preliminary experiment, we first point out that the performance bottleneck of Naive-Scan, a basic method for processing subsequence matching under time warping, lies in the CPU processing step. We then propose a novel method that optimizes the CPU processing step of Naive-Scan. The proposed method maximizes CPU performance by eliminating all the redundant calculations that occur in computing the time warping distance between the query sequence and data subsequences. We formally prove that the proposed method does not incur false dismissals and that it is optimal for processing Naive-Scan. We also discuss how to apply the proposed method to the post-processing step of LB-Scan and ST-Filter, the previous methods for processing subsequence matching under time warping. We then quantitatively verify the performance improvement obtained by the proposed method via extensive experiments. The results show that the performance of all three previous methods improves when the proposed method is employed. In particular, Naive-Scan, which is known to show the worst performance, performs much better than both LB-Scan and ST-Filter in all cases when it employs the proposed method for CPU processing. This result is meaningful in that a performance inversion among Naive-Scan, LB-Scan, and ST-Filter occurs once the CPU processing step, which is their performance bottleneck, is optimized.
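For readers unfamiliar with the distance involved, the following is a minimal sketch of the textbook dynamic-programming computation of a time warping distance. It assumes the absolute difference as the element-wise base distance, which the abstract does not specify, and the function name is hypothetical. It does not implement the paper's redundancy-eliminating optimization; it only shows the computation whose cost dominates a scan over many subsequences.

```python
def time_warping_distance(s, q):
    """Textbook time warping distance between sequences of possibly different lengths.

    Base distance between elements is assumed to be the absolute difference.
    """
    inf = float("inf")
    n, m = len(s), len(q)
    # d[i][j] = distance between the first i elements of s and the first j of q.
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            # An element may be matched again (stretching either sequence) or both advance.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# A repeated element is absorbed by warping, so the lengths may differ.
same_shape = time_warping_distance([1, 2, 2, 3], [1, 2, 3])
shifted = time_warping_distance([1, 2, 3], [2, 3, 4])
```

Each call fills a table of size len(s) by len(q), so repeating it for every data subsequence is CPU-heavy, which is consistent with the bottleneck the abstract reports for Naive-Scan.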

Optimization of Post-Processing for Subsequence Matching in Time-Series Databases (시계열 데이터베이스에서 서브시퀀스 매칭을 위한 후처리 과정의 최적화)

  • Kim, Sang-Uk
    • The KIPS Transactions:PartD / v.9D no.4 / pp.555-560 / 2002
  • Subsequence matching, which consists of index searching and post-processing steps, is an operation that finds, from a time-series database, those subsequences whose changing patterns are similar to that of a given query sequence. This paper discusses optimization of post-processing for subsequence matching. The common problem in the post-processing of previous methods is that each candidate subsequence is compared with the query sequence, in order to discard false alarms, as soon as it appears during index searching. As a result, a sequence containing candidate subsequences is accessed from disk multiple times, and a candidate subsequence is compared with the query sequence multiple times. These redundancies seriously degrade the performance of subsequence matching. In this paper, we propose a new optimal method for resolving the problem. The proposed method stores all the candidate subsequences returned by index searching in a binary search tree and performs post-processing in a batch fashion after index searching has finished. With this method, we are able to completely eliminate the redundancies mentioned above. To verify the performance improvement of the proposed method, we perform extensive experiments using a real-life stock data set. The results reveal that the proposed method achieves a speedup of 55 to 156 times over the previous methods.
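The batch idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: a sorted, de-duplicated list stands in for the binary search tree, candidates are assumed to be (sequence id, offset) pairs, the distance is assumed to be Euclidean, and `read_sequence`, `batch_post_process`, and the toy data are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def batch_post_process(candidates, read_sequence, query, eps):
    """Check all candidates after index searching has finished.

    candidates: iterable of (seq_id, offset) pairs, possibly with duplicates.
    read_sequence: callable returning the full sequence for a seq_id
                   (stands in for one disk access).
    Returns the matching (seq_id, offset) pairs and the number of reads.
    """
    # Sorting the unique candidates groups them by sequence; a sorted list is
    # used here in place of the binary search tree mentioned in the abstract.
    ordered = sorted(set(candidates))
    matches, reads = [], 0
    current_id, current_seq = None, None
    for seq_id, offset in ordered:
        if seq_id != current_id:
            current_seq = read_sequence(seq_id)  # each sequence is read once
            current_id = seq_id
            reads += 1
        window = current_seq[offset:offset + len(query)]
        # Each distinct candidate is compared with the query exactly once.
        if len(window) == len(query) and euclidean(window, query) <= eps:
            matches.append((seq_id, offset))
    return matches, reads

# Toy data: two stored sequences and a candidate list with repeats.
database = {0: [1, 2, 3, 4, 5], 1: [5, 4, 3, 2, 1]}
candidates = [(0, 1), (1, 2), (0, 1), (0, 0), (1, 2)]
matches, reads = batch_post_process(candidates, database.__getitem__, [2, 3, 4], 0.5)
```

In this toy run, five reported candidates collapse to three comparisons and two sequence reads, which mirrors the two redundancies the abstract says are eliminated.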

Efficient Processing of Subsequence Searching in Sequence Databases (시퀀스 데이터베이스를 위한 서브시퀀스 탐색의 효율적인 처리)

  • Park, Sang-Hyun;Kim, Sang-Wook;Park, Jeong-Il
    • Journal of Industrial Technology / v.21 no.A / pp.155-166 / 2001
  • This paper deals with the subsequence searching problem under time-warping. Our work is motivated by the observation that subsequence searches slow down quadratically as the average length of data sequences increases. To resolve this problem, the Segment-Based Approach for Subsequence Searches (SBASS) is proposed. The SBASS divides data and query sequences into a series of segments, and retrieves all data subsequences. Our segmentation scheme allows segments to have different lengths; thus we employ the time warping distance as a similarity measure for each segment pair. For efficient retrieval of similar subsequences, we extract feature vectors from all data segments exploiting their monotonically changing properties, and build a spatial index using feature vectors. The effectiveness of our approach is verified through extensive experiments.

Optimization of Subsequence Matching Under Time-Warping in Time-Series Databases (시계열 데이터베이스에서 타임 워핑 하의 서브시퀀스 매칭의 성능 최적화)

  • Kim, Man-Soon;Kim, Sang-Wook
    • Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.117-120 / 2004
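  • This paper discusses how to process subsequence matching under time warping effectively in time-series databases. Time warping makes it possible to find sequences with similar patterns even when the sequences in the database have different lengths. We propose a new technique that optimizes the CPU processing step of Naive-Scan, the existing basic method for subsequence matching under time warping. The proposed technique maximizes CPU performance by eliminating in advance the redundant work that arises when computing the time warping distances between the query sequence and the subsequences. We prove theoretically that the proposed technique incurs no false dismissals and that it is optimal for processing Naive-Scan. We also quantitatively verify the performance improvement brought by the proposed optimization through a performance evaluation based on various experiments. In addition, we show that the proposed technique can be applied successfully to the post-processing steps of LB-Scan and ST-Filter, existing methods that include a filtering step.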

An Index-Based Approach for Subsequence Matching Under Time Warping in Sequence Databases (시퀀스 데이터베이스에서 타임 워핑을 지원하는 효과적인 인덱스 기반 서브시퀀스 매칭)

  • Park, Sang-Hyeon;Kim, Sang-Uk;Jo, Jun-Seo;Lee, Heon-Gil
    • The KIPS Transactions:PartD / v.9D no.2 / pp.173-184 / 2002
  • This paper discusses index-based subsequence matching that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. In earlier work, Kim et al. suggested an efficient method for whole matching under time warping. This method constructs a multidimensional index on a set of feature vectors, which are invariant to time warping, extracted from data sequences. For filtering in the feature space, it also applies a lower-bound function, which consistently underestimates the time warping distance and satisfies the triangular inequality. In this paper, we incorporate the prefix-querying approach based on sliding windows into the earlier approach. For indexing, we extract a feature vector from every subsequence inside a sliding window and construct a multidimensional index using the feature vectors as indexing attributes. For query processing, we perform a series of index searches using the feature vectors of qualifying query prefixes. Our approach provides effective and scalable subsequence matching even for a large database. We also prove that our approach does not incur false dismissals. To verify the superiority of our approach, we perform extensive experiments. The results reveal that our approach achieves significant speedup with real-world S&P 500 stock data and with very large synthetic data.

A Subsequence Matching Technique that Supports Time Warping Efficiently (타임 워핑을 지원하는 효율적인 서브시퀀스 매칭 기법)

  • Park, Sang-Hyun;Kim, Sang-Wook;Cho, June-Suh;Lee, Hoen-Gil
    • Journal of Industrial Technology / v.21 no.A / pp.167-179 / 2001
  • This paper discusses index-based subsequence matching that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. In earlier work, we suggested an efficient method for whole matching under time warping. This method constructs a multidimensional index on a set of feature vectors, which are invariant to time warping, extracted from data sequences. For filtering in the feature space, it also applies a lower-bound function, which consistently underestimates the time warping distance and satisfies the triangular inequality. In this paper, we incorporate the prefix-querying approach based on sliding windows into the earlier approach. For indexing, we extract a feature vector from every subsequence inside a sliding window and construct a multidimensional index using the feature vectors as indexing attributes. For query processing, we perform a series of index searches using the feature vectors of qualifying query prefixes. Our approach provides effective and scalable subsequence matching even for a large database. We also prove that our approach does not incur false dismissals. To verify the superiority of our method, we perform extensive experiments. The results reveal that our method achieves significant speedup with real-world S&P 500 stock data and with very large synthetic data.

Privacy-Preserving Clustering on Time-Series Data Using Fourier Magnitudes (시계열 데이타 클러스터링에서 푸리에 진폭 기반의 프라이버시 보호)

  • Kim, Hea-Suk;Moon, Yang-Sae
    • Journal of KIISE:Databases / v.35 no.6 / pp.481-494 / 2008
  • In this paper, we propose privacy-preserving clustering on time-series data based on Fourier magnitudes. The previous privacy-preserving method, called the DFT coefficient method, has a critical privacy problem: the original time-series data may be reconstructed from the privacy-preserved data. In contrast, the proposed DFT magnitude method makes reconstructing the original data almost impossible, since it uses only the DFT magnitudes and discards the DFT phases. We first explain why reconstruction is easy in the DFT coefficient method and why it is difficult in the DFT magnitude method. We then propose the notion of distance-order preservation, which can be used both for estimating clustering accuracy and for selecting DFT magnitudes. The degree of distance-order preservation measures how many time-series preserve their relative distance orders before and after the privacy-preserving transformation. Using this degree, we present greedy strategies for selecting magnitudes in the DFT magnitude method. These greedy strategies select the DFT magnitudes that maximize the degree of distance-order preservation, which in turn yields relatively high clustering accuracy. Finally, we empirically show that the degree of distance-order preservation is a measure that reflects clustering accuracy well. In addition, experimental results show that our greedy strategies for the DFT magnitude method are comparable to the DFT coefficient method in clustering accuracy. These results indicate that, compared with the DFT coefficient method, our DFT magnitude method provides a much higher degree of privacy preservation with comparable clustering accuracy.
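Why magnitudes alone resist reconstruction can be illustrated with a small sketch. It relies on a standard property of the discrete Fourier transform: circularly shifting a series changes only the phases of its coefficients, so distinct series can share identical magnitudes. The functions and sample values below are hypothetical illustrations and do not reproduce the paper's method or its magnitude-selection strategies.

```python
import cmath

def dft(seq):
    """Plain O(n^2) discrete Fourier transform of a real-valued sequence."""
    n = len(seq)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(seq))
        for k in range(n)
    ]

def dft_magnitudes(seq):
    """Keep only the magnitude of each DFT coefficient; the phase is discarded."""
    return [abs(c) for c in dft(seq)]

series = [1.0, 3.0, 2.0, 5.0]
shifted = series[1:] + series[:1]  # circular shift: a different series

mags_original = dft_magnitudes(series)
mags_shifted = dft_magnitudes(shifted)

# The two series differ, yet their magnitudes agree up to rounding error,
# so the magnitudes cannot identify which series produced them.
max_gap = max(abs(a - b) for a, b in zip(mags_original, mags_shifted))
```

With full coefficients (magnitude and phase) an inverse transform recovers the series exactly, which matches the abstract's point that the DFT coefficient method is easy to reverse.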

Shape-Based Subsequence Retrieval Supporting Multiple Models in Time-Series Databases (시계열 데이터베이스에서 복수의 모델을 지원하는 모양 기반 서브시퀀스 검색)

  • Won, Jung-Im;Yoon, Jee-Hee;Kim, Sang-Wook;Park, Sang-Hyun
    • The KIPS Transactions:PartD / v.10D no.4 / pp.577-590 / 2003
  • Shape-based retrieval is defined as the operation that searches for the (sub)sequences whose shapes are similar to that of a query sequence regardless of their actual element values. In this paper, we propose a similarity model suitable for shape-based retrieval and present an indexing method for supporting the similarity model. The proposed similarity model enables accurate retrieval of similar shapes by combining various shape-preserving transformations such as normalization, moving average, and time warping. Our indexing method stores every distinct subsequence concisely in a disk-based suffix tree for efficient and adaptive query processing. We allow the user to dynamically choose a similarity model suitable for a given application. More specifically, we allow the user to determine the parameter p of the distance function $L_p$ when submitting a query. The results of extensive experiments reveal that our approach not only successfully finds the subsequences whose shapes are similar to a query shape but also significantly outperforms the sequence search.
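The role of the user-chosen parameter p can be shown with a minimal sketch of the standard $L_p$ distance between two equal-length sequences. The function name and sample values are hypothetical, and the sketch covers only the distance function, not the suffix-tree index or the shape-preserving transformations.

```python
import math

def lp_distance(a, b, p):
    """Standard L_p distance; p = math.inf gives the maximum element-wise gap."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

query = [0.0, 0.0]
data = [3.0, 4.0]

manhattan = lp_distance(query, data, 1)         # sums all gaps
euclid = lp_distance(query, data, 2)            # root of summed squares
chebyshev = lp_distance(query, data, math.inf)  # largest single gap
```

Small p weighs every element's deviation, while large p is dominated by the worst single deviation, so letting the user pick p at query time changes which subsequences count as similar.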