• Title/Summary/Keyword: time-series databases


Optimal Construction of Multiple Indexes for Time-Series Subsequence Matching (시계열 서브시퀀스 매칭을 위한 최적의 다중 인덱스 구성 방안)

  • Lim, Seung-Hwan;Kim, Sang-Wook;Park, Hee-Jin
    • Journal of KIISE:Databases / v.33 no.2 / pp.201-213 / 2006
  • A time-series database is a set of time-series data sequences, each of which is a list of changing values of an object over a given period of time. Subsequence matching is an operation that searches a time-series database for data subsequences whose changing patterns are similar to a query sequence. This paper addresses a performance issue of time-series subsequence matching. First, we quantitatively examine the performance degradation caused by the window size effect and show that the performance of subsequence matching with a single index is not satisfactory in real applications. We argue that index interpolation is fairly useful for resolving this problem. Index interpolation performs subsequence matching by selecting the most appropriate index from multiple indexes, each built on windows of its own size. For index interpolation, we must first decide the sizes of the windows for the multiple indexes to be built. In this paper, we solve the problem of selecting optimal window sizes from the perspective of physical database design. To this end, given a set of query sequences to be performed on a target time-series database and a set of window sizes for building multiple indexes, we devise a formula that estimates the cost of all the subsequence matchings. Based on this formula, we propose an algorithm that determines the optimal window sizes for maximizing the performance of the entire set of subsequence matchings. We formally prove the optimality as well as the effectiveness of the algorithm. Finally, we perform a series of extensive experiments with a real-life stock data set and a large synthetic data set. The results reveal that the proposed approach improves on the previous one by 1.5 to 7.8 times.
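As a rough illustration of index interpolation, the sketch below assumes that the "most appropriate" index for a query is the one built on the largest window size that does not exceed the query length. The abstract does not state the selection rule, and the function name `choose_window_size` is hypothetical; the paper's cost formula and optimal window-size algorithm are not reproduced here.

```python
from bisect import bisect_right


def choose_window_size(window_sizes, query_len):
    """Return the largest indexed window size that is <= query_len.

    Hypothetical selection rule (an assumption, not the paper's stated rule):
    the window size effect suggests that windows much shorter than the query
    degrade performance, so the largest usable window is preferred.
    """
    sizes = sorted(window_sizes)
    pos = bisect_right(sizes, query_len)  # number of sizes <= query_len
    if pos == 0:
        raise ValueError("query is shorter than every indexed window size")
    return sizes[pos - 1]


# Indexes are assumed to exist for these window sizes (illustrative values).
indexed_sizes = [64, 128, 256]
for q in (100, 200, 256, 1000):
    print(q, "->", choose_window_size(indexed_sizes, q))
```

Choosing which window sizes to build in the first place is the physical design problem the paper addresses; this sketch only shows how a query would be routed once those indexes exist.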

Selectivity Estimation for Multidimensional Sequence Data in Spatio-Temporal Databases (시공간 데이타베이스에서 다차원 시퀀스 데이타의 선택도추정)

  • Shin, Byoung-Cheol;Lee, Jong-Yun
    • Journal of KIISE:Databases / v.34 no.1 / pp.84-97 / 2007
  • Selectivity estimation techniques for query optimization are used in commercial databases, and histograms are a popular basis for selectivity estimation. Recently, selectivity estimation techniques for spatio-temporal databases have been limited to those developed for existing temporal and spatial databases. In addition, they have focused on time-series data such as moving objects, and they cannot estimate the selectivity of range queries with a time interval. Therefore, we construct two histograms, CMH (current multidimensional histogram) and PMH (past multidimensional histogram), to estimate the selectivity of multidimensional sequence data in spatio-temporal databases, and we propose effective selectivity estimation methods using these histograms. Furthermore, we use the proposed histograms to solve the problem of estimating selectivity for range queries. We evaluated the effectiveness of the histograms for range queries with a time interval through various experiments.
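The abstract does not describe CMH and PMH in enough detail to reproduce them, but the general idea of histogram-based selectivity estimation for a spatio-temporal range query can be sketched with a plain equi-width grid over (x, y, t), assuming that points are spread uniformly inside each cell. The class `GridHistogram` and its parameters are hypothetical and are not the paper's structures.

```python
class GridHistogram:
    """Equi-width histogram over [0, extent) in every dimension.

    Hypothetical sketch: not the paper's CMH or PMH structure.
    """

    def __init__(self, extent, cells):
        self.cells = cells
        self.width = extent / cells  # side length of one cell
        self.counts = {}             # cell index tuple -> point count
        self.total = 0

    def add(self, point):
        """Record one (x, y, t) observation in the cell that contains it."""
        idx = tuple(min(int(v / self.width), self.cells - 1) for v in point)
        self.counts[idx] = self.counts.get(idx, 0) + 1
        self.total += 1

    def selectivity(self, lo, hi):
        """Estimated fraction of points with lo[d] <= p[d] < hi[d] in every
        dimension, assuming a uniform spread of points within each cell."""
        if self.total == 0:
            return 0.0
        estimate = 0.0
        for idx, count in self.counts.items():
            fraction = 1.0
            for d, k in enumerate(idx):
                cell_lo = k * self.width
                overlap = min(hi[d], cell_lo + self.width) - max(lo[d], cell_lo)
                fraction *= max(0.0, overlap) / self.width
            estimate += count * fraction
        return estimate / self.total


# Usage: four (x, y, t) points; query x, y in [0, 5) over the time interval [0, 2.5).
h = GridHistogram(extent=10.0, cells=2)
for p in [(1, 1, 1), (6, 6, 6), (1, 6, 1), (6, 1, 6)]:
    h.add(p)
print(h.selectivity((0, 0, 0), (5, 5, 2.5)))
```

Treating time as just another grid axis is the simplest way to answer a range query with a time interval; a design that separates current and past data, as CMH and PMH suggest, would need more than this single grid.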

Similarity Search in Time Series Databases based on the Normalized Distance (정규 거리에 기반한 시계열 데이터베이스의 유사 검색 기법)

  • 이상준;이석호
    • Journal of KIISE:Databases / v.31 no.1 / pp.23-29 / 2004
  • In this paper, we propose a search method for time sequences that supports the normalized distance as a similarity measure. In many applications where the shape of a time sequence is the major consideration, the normalized distance is a more suitable similarity measure than the simple Lp distance. To support normalized distance queries, most previous work includes a preprocessing step for vertical shifting, which normalizes each sequence by its mean. The proposed method is motivated by a property of sequences that is useful for feature extraction: the difference between two adjacent elements of a time sequence is invariant under vertical shifting. The extracted features are indexed by a spatial access method such as the R-tree. The proposed method can match time series of similar shape without vertical shifting and guarantees no false dismissals. Experiments on real data (stock price movements) verify the performance of the proposed method.
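The invariance the method relies on can be checked directly: adding a constant to every element leaves the successive differences unchanged. The sketch below shows only that feature; the function name is hypothetical, and how the paper reduces these differences to low-dimensional points for the R-tree is not stated in the abstract and is not shown.

```python
def adjacent_differences(seq):
    """Feature sequence of successive differences x[i+1] - x[i]."""
    return [b - a for a, b in zip(seq, seq[1:])]


prices = [10.0, 12.0, 11.0, 15.0]
shifted = [p + 100.0 for p in prices]  # same shape, different vertical offset
print(adjacent_differences(prices))    # [2.0, -1.0, 4.0]
print(adjacent_differences(shifted) == adjacent_differences(prices))  # True
```

With general floating-point offsets the two difference lists may differ in the last bits, so a real implementation would compare them with a tolerance rather than exact equality.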

Efficient Similarity Search in Time Series Databases Based on the Minimum Distance (최단거리에 기반한 시계열 데이타의 효율적인 유사 검색)

  • 이상준;권동섭;이석호
    • Proceedings of the Korean Information Science Society Conference / 2003.04a / pp.533-535 / 2003
  • The Euclidean distance is sensitive to the absolute offsets of time sequences, so it is not a suitable similarity measure in terms of shape. In this paper, we propose an indexing scheme for efficient matching and retrieval of time sequences based on the minimum distance. The minimum distance gives a better estimate of the similarity in shape between two time sequences. Our indexing scheme can match time sequences of similar shape irrespective of their vertical positions and guarantees no false dismissals.
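Assuming the "minimum distance" means the smallest Euclidean distance between two sequences over all vertical offsets of one of them, it has a closed form: for sequences x and y of equal length, the sum of (x[i] - y[i] - b)^2 is minimized at b = mean(x[i] - y[i]). The sketch below computes the measure directly; it is not the paper's indexing scheme, and the function name is hypothetical.

```python
import math


def minimum_distance(x, y):
    """Euclidean distance between x and y + b, minimized over the offset b.

    The optimal offset is the mean of the pointwise differences (a standard
    least-squares result), so no search over b is needed.
    """
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = [a - b for a, b in zip(x, y)]
    offset = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - offset) ** 2 for d in diffs))


print(minimum_distance([1, 2, 3], [11, 12, 13]))  # 0.0: same shape, shifted by 10
print(math.dist([1, 2, 3], [11, 12, 13]))         # plain Euclidean distance is large
```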


Linear Detrending Subsequence Matching in Time-Series Databases (시계열 데이터베이스에서 선형 추세 제거 서브시퀀스 매칭)

  • Gil, Myeong-Seon;Kim, Bum-Soo;Moon, Yang-Sae;Kim, Jin-Ho
    • Journal of KIISE:Computing Practices and Letters / v.16 no.5 / pp.586-590 / 2010
  • In this paper, we formally define linear detrending subsequence matching and propose an efficient index-based solution for it. To this end, we first present the notion of LD-windows: we eliminate the linear trend from the whole subsequence rather than from each window, and then obtain LD-windows by dividing the detrended subsequence into windows. Using the LD-windows, we present a lower bounding theorem for the index-based solution and formally prove its correctness. Based on this theorem, we then propose the index building and subsequence matching algorithms. Finally, we show the superiority of our index-based solution through experiments.
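The LD-window construction can be sketched as: fit a least-squares line to the whole subsequence, subtract it, and only then cut the result into windows. The sketch assumes disjoint windows and ordinary least-squares detrending; the function names are hypothetical, and the paper's lower bounding theorem and index algorithms are not shown.

```python
def remove_linear_trend(seq):
    """Subtract the least-squares line fitted to seq (indexed by 0..n-1)."""
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    t_mean = (n - 1) / 2
    y_mean = sum(seq) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(seq))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(seq)]


def ld_windows(subseq, w):
    """Detrend the whole subsequence once, then split it into disjoint
    windows of length w (any trailing remainder shorter than w is dropped)."""
    detrended = remove_linear_trend(subseq)
    return [detrended[i:i + w] for i in range(0, len(detrended) - w + 1, w)]


# A purely linear subsequence detrends to (approximately) all zeros.
print(ld_windows([2 * t + 3 for t in range(8)], 4))
```

Detrending per subsequence rather than per window matters because each window alone would have its own fitted line, which would not match the trend removed from the query as a whole.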

Automatic Detection of Congestive Heart Failure and Atrial Fibrillation with Short RR Interval Time Series

  • Yoon, Kwon-Ha;Nam, Yunyoung;Thap, Tharoeun;Jeong, Changwon;Kim, Nam Ho;Ko, Joem Seok;Noh, Se-Eung;Lee, Jinseok
    • Journal of Electrical Engineering and Technology / v.12 no.1 / pp.346-355 / 2017
  • Atrial fibrillation (AF) and congestive heart failure (CHF) are increasingly widespread, costly, and deadly diseases associated with significant morbidity and mortality. In this study, we analyzed three statistical methods for automatic detection of AF and CHF based on the randomness, variability, and complexity of the heartbeat interval (RR interval, RRI) time series. Specifically, we used short RRI time series of 16 beats and employed the normalized root mean square of successive RR differences (RMSSD), the sample entropy, and the Shannon entropy. Detection performance was analyzed using four large, well-documented databases: the MIT-BIH Atrial Fibrillation (n=23), the MIT-BIH Normal Sinus Rhythm (n=18), the BIDMC Congestive Heart Failure (n=13), and the Congestive Heart Failure RRI (n=25) databases. Using thresholds derived from receiver operating characteristic (ROC) curves, we found that the normalized RMSSD provided the highest accuracy. The overall sensitivity, specificity, and accuracy for AF and CHF were 0.8649, 0.9331, and 0.9104, respectively. For CHF, the detection rate was 0.9113 for NYHA class III-IV and 0.7312 for NYHA class I-II, showing that more severe CHF is detected at a higher rate than less severe CHF. For the clinical 24-hour data (n=42), the overall sensitivity, specificity, and accuracy for AF and CHF using normalized RMSSD were 0.8809, 0.9406, and 0.9108, respectively.
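RMSSD is the square root of the mean squared difference between successive RR intervals. The abstract does not give the normalization, so the sketch below assumes the common choice of dividing by the mean RR interval of the segment; the threshold passed to `flag_segment` is a placeholder, since the study derives it from ROC analysis. Both function names are hypothetical.

```python
import math


def normalized_rmssd(rr):
    """RMSSD of successive RR-interval differences divided by the mean RR.

    Dividing by the mean RR interval is an assumed normalization; the
    abstract does not give the exact formula.
    """
    if len(rr) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return rmssd / (sum(rr) / len(rr))


def flag_segment(rr, threshold):
    """Flag a short RR segment (e.g. 16 beats) as irregular when its
    normalized RMSSD exceeds the given (placeholder) threshold."""
    return normalized_rmssd(rr) > threshold


steady = [0.8] * 16           # RR intervals in seconds, perfectly regular
irregular = [0.8, 1.0] * 8    # alternating intervals
print(flag_segment(steady, 0.1), flag_segment(irregular, 0.1))  # False True
```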

An Optimal Way to Index Searching of Duality-Based Time-Series Subsequence Matching (이원성 기반 시계열 서브시퀀스 매칭의 인덱스 검색을 위한 최적의 기법)

  • Kim, Sang-Wook;Park, Dae-Hyun;Lee, Heon-Gil
    • The KIPS Transactions:PartD / v.11D no.5 / pp.1003-1010 / 2004
  • In this paper, we address efficient processing of subsequence matching in time-series databases. We first point out the performance problems that occur in the index searching of a prior method for subsequence matching. Then, we propose a new method that resolves these problems. Our method views the index searching of subsequence matching from a new angle, regarding it as a kind of spatial join called a window-join. To speed up the window-join, our method builds an R*-tree in main memory for a query sequence at the start of subsequence matching. Our method also includes a novel algorithm for efficiently joining an R*-tree on disk, built for data sequences, with an R*-tree in main memory, built for a query sequence. This algorithm accesses each R*-tree page built on data sequences exactly once, without incurring any index-level false alarms. Therefore, in terms of the number of disk accesses, the proposed algorithm is optimal. Performance evaluation through extensive experiments also shows the superiority of our method quantitatively.
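To make the window-join idea concrete, the sketch below is a naive nested-loop version: every query-window feature point is compared with every data-window feature point, and pairs within a distance threshold become candidates. It assumes the window features are already extracted and uses a generic threshold `eps`; the paper's algorithm replaces this loop with a join between an on-disk and an in-memory R*-tree so that each data page is read only once. The function name is hypothetical.

```python
import math


def window_join(query_features, data_features, eps):
    """Naive nested-loop window-join.

    Returns (query window index, data window index) pairs whose feature
    vectors lie within eps of each other. Every data entry is revisited for
    every query window, which is the repeated access an index-based join avoids.
    """
    pairs = []
    for qi, q in enumerate(query_features):
        for di, d in enumerate(data_features):
            if math.dist(q, d) <= eps:
                pairs.append((qi, di))
    return pairs


# Illustrative 2-D feature points for two query windows and two data windows.
print(window_join([(0, 0), (5, 5)], [(0, 1), (10, 10)], eps=1.5))  # [(0, 0)]
```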

Index-based Boundary Matching Supporting Partial Denoising for Large Image Databases

  • Kim, Bum-Soo
    • Journal of the Korea Society of Computer and Information / v.24 no.10 / pp.91-99 / 2019
  • In this paper, we propose index-based partial denoising boundary matching for faster matching in very large image databases. Recent work has converted boundary images into time-series in order to solve the partial denoising problem in boundary matching. In this paper, we deal with the disk I/O overhead of boundary matching that supports partial denoising in a large image database. Although the solution superficially appears trivial, since it only applies indexing techniques to boundary matching, it is not, because multiple indexes are required for every possible denoising parameter. Our solution is an efficient index-based approach to partial denoising in boundary matching using the R*-tree. Experimental results show that our index-based matching methods improve search performance by orders of magnitude.
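One common way to turn a boundary image into a time-series is the centroid contour distance: walk the boundary and record each point's distance from the shape's centroid. The abstract does not say which conversion the paper uses, so this is an assumed, illustrative choice; the centroid is approximated here by the mean of the boundary points, and the function name is hypothetical.

```python
import math


def centroid_contour_distance(boundary):
    """Convert a closed boundary, given as (x, y) points in traversal order,
    into a time-series of distances from the centroid.

    The centroid is approximated by the mean of the boundary points.
    """
    n = len(boundary)
    if n == 0:
        raise ValueError("empty boundary")
    cx = sum(x for x, _ in boundary) / n
    cy = sum(y for _, y in boundary) / n
    return [math.hypot(x - cx, y - cy) for x, y in boundary]


square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
print(centroid_contour_distance(square))  # four equal values, each about 1.414
```

Once boundaries are time-series, the time-series indexing machinery discussed in the other entries applies; the difficulty this paper addresses is that each denoising parameter would otherwise need its own index.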

Partial Denoising Boundary Image Matching Based on Time-Series Data (시계열 데이터 기반의 부분 노이즈 제거 윤곽선 이미지 매칭)

  • Kim, Bum-Soo;Lee, Sanghoon;Moon, Yang-Sae
    • Journal of KIISE / v.41 no.11 / pp.943-957 / 2014
  • Removing noise, called denoising, is essential for obtaining more intuitive and more accurate results in boundary image matching. This paper deals with the partial denoising problem, which tries to allow a limited amount of partial noise embedded in boundary images. To solve this problem, we first define partial denoising time-series, which can be generated from an original image time-series by removing a variety of partial noises, and propose an efficient mechanism that quickly obtains these partial denoising time-series in the time-series domain rather than the image domain. We next present the partial denoising distance, which is the minimum distance from a query time-series to all possible partial denoising time-series generated from a data time-series, and we use this partial denoising distance as the similarity measure in boundary image matching. Using the partial denoising distance, however, incurs a severe computational overhead, since a large number of partial denoising time-series must be considered. To solve this problem, we derive a tight lower bound for the partial denoising distance and formally prove its correctness. We also propose range and k-NN search algorithms that exploit the partial denoising distance in boundary image matching. Through extensive experiments, we show that our lower bound-based approach improves search performance by up to an order of magnitude in partial denoising-based boundary image matching.
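The partial denoising distance has the structure "minimum distance from the query to every candidate denoised version of the data series." The sketch below keeps that structure but substitutes a simplified candidate set: circular moving averages of the whole series for each window length k in a given range. This stand-in is an assumption for illustration only; it is not the paper's definition of partial denoising, and it omits the lower bound that avoids generating every candidate. The function names are hypothetical.

```python
import math


def circular_moving_average(series, k):
    """Moving average of window length k that wraps around the end of the
    series, which keeps the length unchanged (boundaries are closed curves)."""
    n = len(series)
    return [sum(series[(i + j) % n] for j in range(k)) / k for i in range(n)]


def min_denoised_distance(query, data, k_values):
    """Minimum Euclidean distance from query to each denoised candidate of data.

    Candidates here are whole-series moving averages (a simplified stand-in
    for the paper's partial denoising time-series).
    """
    best = math.inf
    for k in k_values:
        candidate = circular_moving_average(data, k)
        best = min(best, math.dist(query, candidate))
    return best


noisy = [1.0, 3.0, 1.0, 3.0]
print(min_denoised_distance([2.0, 2.0, 2.0, 2.0], noisy, k_values=[1, 2]))  # 0.0
```

Because every k yields a full candidate series, the cost grows with the number of denoising options, which is the overhead a tight lower bound is meant to prune.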

Nonlinear Quality Indices Based on a Novel Lempel-Ziv Complexity for Assessing Quality of Multi-Lead ECGs Collected in Real Time

  • Zhang, Yatao;Ma, Zhenguo;Dong, Wentao
    • Journal of Information Processing Systems / v.16 no.2 / pp.508-521 / 2020
  • We compared a novel encoding Lempel-Ziv complexity (ELZC) with three common complexity algorithms, i.e., approximate entropy (ApEn), sample entropy (SampEn), and classic Lempel-Ziv complexity (CLZC), in order to determine a satisfactory complexity measure and corresponding quality indices for assessing the quality of multi-lead electrocardiograms (ECGs). First, we calculated these algorithms on six artificial time series to compare their ability to discern randomness and inherent irregularity within time series. Then, to analyze the sensitivity of the algorithms to the level of different noises within the ECG, we investigated how their values change in five artificially synthesized noisy ECGs containing different noises at several signal-to-noise ratios. Finally, three quality indices based on the ELZC of the multi-lead ECG were proposed to assess the quality of 862 real 12-lead ECGs from the MIT databases. The results showed that the ELZC could discern randomness and inherent irregularity within the six artificial time series and also reflect the level of different noises within the five artificial synthetic ECGs. The AUCs of the three ELZC-based quality indices were statistically significant (>0.500). The ELZC and its three corresponding indices were more suitable for multi-lead ECG quality assessment than the other three algorithms.
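For reference, the classic Lempel-Ziv complexity (CLZC) used as a baseline counts how many new phrases appear when a symbol sequence is parsed left to right. A common way to apply it to a signal is to binarize it around its median first; that preprocessing is an assumption here, and the sketch shows the classic measure only, not the paper's ELZC encoding or its quality indices. The implementation follows the widely used Kaspar-Schuster scheme; function names are hypothetical.

```python
from statistics import median


def binarize(series):
    """Map each sample to 1 if it is above the series median, else 0."""
    m = median(series)
    return [1 if v > m else 0 for v in series]


def lz76_complexity(s):
    """Classic Lempel-Ziv (1976) complexity: the number of phrases found when
    parsing the symbol sequence s (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1
    c, k_max = 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:        # reached the end while copying: last phrase
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:           # no earlier start extends the copy: new phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


print(lz76_complexity("0001101001000101"))  # 6 phrases: 0|001|10|100|1000|101
print(lz76_complexity(binarize([0.1, 0.9, 0.2, 0.8, 0.3, 0.7])))
```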