Search | Korea Science

An Index Interpolation-based Subsequence Matching Algorithm supporting Normalization Transform in Time-Series Databases (시계열 데이터베이스에서 인덱스 보간법을 기반으로 정규화 변환을 지원하는 서브시퀀스 매칭 알고리즘)

No, Ung-Gi;Kim, Sang-Uk;Hwang, Gyu-Yeong
- Journal of KIISE:Databases
- /
- v.28 no.2
- /
- pp.217-232
- /
- 2001
본 논문에서는 시계열 데이터베이스에서 정규화 변환을 지원하는 서브시퀀스 매칭 알고리즘을 제안한다. 정규화 변환을 시계열 데이터 간의 절대적인 유클리드 거리에 관계 없이, 구성하는 값들의 상대적인 변화 추이가 유사한 패턴을 갖는 시계열 데이터를 검색하는 데에 유용하다. 기존의 서브시퀀스 매칭 알고리즘을 확장 없이 정규화 변환 서브시퀀스 매칭에 단순히 응용할 경우, 질의 결과로 반환되어야 할 서부시퀀스를 모두 찾아내지 못하는 착오 기각이 발생한다. 또한, 정규화 변환을 지원하는 기존의 전체 매칭 알고리즘의 경우, 모든 가능한 질의 시퀀스 길이 각각에 대하여 하나씩의 인덱스를 생성하여야 하므로, 저장 공간 및 데이터 시퀀스 삽입/삭제의 부담이 매우 심각하다. 본 논문에서는 인덱스 보간법을 이용하여 문제를 해결한다. 인덱스 보간법은 인덱스가 요구되는 모든 경우 중에서 적당한 간격의 일부에 대해서만 생성된 인덱스를 이용하며, 인덱스가 필요한 모든 경우에 대한 탐색을 수행하는 기법이다. 제안된 알고리즘은 몇 개의 질의 시퀀스 길이에 대해서만 각각 인덱스를 생성한 후, 이를 이용하여 모든 가능한 길이의 질의 시퀀스에 대해서 탐색을 수행한다. 이때, 착오 기각이 발생하지 않음을 증명한다. 제안된 알고리즘은 질의 시에 주어진 질의 시퀀스의 길이에 따라 생성되어 있는 인덱스 중에서 가장 적절한 것을 선택하여 탐색을 수행한다. 이때, 생성되어 있는 인덱스의 개수가 많을수록 탐색 성능이 향상된다. 필요에 따라 인덱스의 개수를 변화함으로써 탐색 성능과 저장 공간 간의 비율을 유연하게 조정할 수 있다. 질의 시퀀스의 길이 256 ~ 512중 다섯 개의 길이에 대해 인덱스를 생성하여 실험한 결과, 탐색 결과 선택률이 $10^{-2}$일 때 제안된 알고리즘의 탐색 성능이 순차 검색에 비하여 평균 2.40배, 선택률이 $10^{-5}$일 때 평균 14.6배 개선되었다. 제안된 알고리즘의 탐색 성능은 탐색 결과 선택률이 작아질수록 더욱 향상되므로, 실제 데이터베이스 응용에서의 효용성이 높다고 판단된다.
PDF

A Hybrid System of Wavelet Transformations and Neural Networks Using Genetic Algorithms: Applying to Chaotic Financial Markets (유전자알고리즘을 이용한 웨이블릿분석 및 인공신경망기법의 통합모형구축)

Shin, Taeksoo;Han, Ingoo
- Proceedings of the Korea Database Society Conference
- /
- 1999.06a
- /
- pp.271-280
- /
- 1999
인공신경망을 시계열예측에 적용하는 경우에 고려되어야 할 문제중, 특히 모형에 적합한 입력변수의 생성이 중요시되고 있는데, 이러한 분야는 인공신경망의 모형생성과정에서 입력변수에 대한 전처리기법으로써 다양하게 제시되어 왔다. 가장 최근의 입력변수 전처리기법으로써 제시되고 있는 신호처리기법은 전통적 주기분할처리방법인 푸리에변환기법(Fourier transforms)을 비롯하여 이를 확장시킨 개념인 웨이블릿변환기법(wavelet transforms) 등으로 대별될 수 있다. 이는 기본적으로 시계열이 다수의 주기(cycle)들로 구성된 상이한 시계열들의 집합이라는 가정에서 출발하고 있다. 전통적으로 이러한 시계열은 전기 또는 전자공학에서 주파수영역분할, 즉 고주파 및 저주파수를 분할하기 위한 기법에 적용되어 왔다. 그러나, 최근에는 이러한 연구가 다양한 분야에 활발하게 응용되기 시작하였으며, 그 중의 대표적인 예가 바로 경영분야의 재무시계열에 대한 분석이다 전통적으로 재무시계열은 장, 단기의사결정을 가진 시장참여자들간의 거래특성이 시계열에 각기 달리 가격으로 반영되기 때문에 이러한 상이한 집단들의 고유한 거래움직임으로 말미암아 예를 들어, 주식시장이 프랙탈구조를 가지고 있다고 보기도 한다. 이처럼 재무시계열은 다양한 사회현상의 집합체라고 볼 수 있으며, 그만큼 예측모형을 구축하는데 어려움이 따른다. 본 연구는 이러한 시계열의 주기적 특성에 기반을 둔 신호처리분석으로서 기존의 시계열로부터 노이즈를 줄여 주면서 보다 의미 있는 정보로 변환시켜 줄 수 있는 웨이블릿분석 방법론을 새로운 필터링기법으로 사용하여 현재 많은 연구가 진행되고 있는 인공신경망과의 모형결합을 통해 기존연구와는 다른 새로운 통합예측방법론을 제시하고자 한다. 본 연구에서 제시하는 통합방법론은 크게 2단계 과정을 거쳐 예측모형으로 완성이 된다. 즉, 1차 모형단계에서 원시 재무시계열은 먼저 웨이블릿분석을 통해서 노이즈가 필터링 되는 동시에, 과거 재무시계열의 프랙탈 구조, 즉 비선형적인 움직임을 보다 잘 반영시켜 주는 다차원 주기요소를 가지는 시계열로 분해, 생성되며, 이렇게 주기에 따라 장단기로 분할된 시계열들은 2차 모형단계에서 신경망의 새로운 입력변수로서 사용되어 최종적인 인공 신경망모델을 구축하는 데 반영된다.
PDF

The Performance Bottleneck of Subsequence Matching in Time-Series Databases: Observation, Solution, and Performance Evaluation (시계열 데이타베이스에서 서브시퀀스 매칭의 성능 병목 : 관찰, 해결 방안, 성능 평가)

김상욱
- Journal of KIISE:Databases
- /
- v.30 no.4
- /
- pp.381-396
- /
- 2003
Subsequence matching is an operation that finds subsequences whose changing patterns are similar to a given query sequence from time-series databases. This paper points out the performance bottleneck in subsequence matching, and then proposes an effective method that improves the performance of entire subsequence matching significantly by resolving the performance bottleneck. First, we analyze the disk access and CPU processing times required during the index searching and post processing steps through preliminary experiments. Based on their results, we show that the post processing step is the main performance bottleneck in subsequence matching, and them claim that its optimization is a crucial issue overlooked in previous approaches. In order to resolve the performance bottleneck, we propose a simple but quite effective method that processes the post processing step in the optimal way. By rearranging the order of candidate subsequences to be compared with a query sequence, our method completely eliminates the redundancy of disk accesses and CPU processing occurred in the post processing step. We formally prove that our method is optimal and also does not incur any false dismissal. We show the effectiveness of our method by extensive experiments. The results show that our method achieves significant speed-up in the post processing step 3.91 to 9.42 times when using a data set of real-world stock sequences and 4.97 to 5.61 times when using data sets of a large volume of synthetic sequences. Also, the results show that our method reduces the weight of the post processing step in entire subsequence matching from about 90% to less than 70%. This implies that our method successfully resolves th performance bottleneck in subsequence matching. As a result, our method provides excellent performance in entire subsequence matching. The experimental results reveal that it is 3.05 to 5.60 times faster when using a data set of real-world stock sequences and 3.68 to 4.21 times faster when using data sets of a large volume of synthetic sequences compared with the previous one.
PDF KSCI

Privacy-Preserving Clustering on Time-Series Data Using Fourier Magnitudes (시계열 데이타 클러스터링에서 푸리에 진폭 기반의 프라이버시 보호)

Kim, Hea-Suk;Moon, Yang-Sae
- Journal of KIISE:Databases
- /
- v.35 no.6
- /
- pp.481-494
- /
- 2008
In this paper we propose Fourier magnitudes based privacy preserving clustering on time-series data. The previous privacy-preserving method, called DFT coefficient method, has a critical problem in privacy-preservation itself since the original time-series data may be reconstructed from privacy-preserved data. In contrast, the proposed DFT magnitude method has an excellent characteristic that reconstructing the original data is almost impossible since it uses only DFT magnitudes except DFT phases. In this paper, we first explain why the reconstruction is easy in the DFT coefficient method, and why it is difficult in the DFT magnitude method. We then propose a notion of distance-order preservation which can be used both in estimating clustering accuracy and in selecting DFT magnitudes. Degree of distance-order preservation means how many time-series preserve their relative distance orders before and after privacy-preserving. Using this degree of distance-order preservation we present greedy strategies for selecting magnitudes in the DFT magnitude method. That is, those greedy strategies select DFT magnitudes to maximize the degree of distance-order preservation, and eventually we can achieve the relatively high clustering accuracy in the DFT magnitude method. Finally, we empirically show that the degree of distance-order preservation is an excellent measure that well reflects the clustering accuracy. In addition, experimental results show that our greedy strategies of the DFT magnitude method are comparable with the DFT coefficient method in the clustering accuracy. These results indicate that, compared with the DFT coefficient method, our DFT magnitude method provides the excellent degree of privacy-preservation as well as the comparable clustering accuracy.
PDF KSCI

Noise Control Boundary Image Matching Using Time-Series Moving Average Transform (시계열 이동평균 변환을 이용한 노이즈 제어 윤곽선 이미지 매칭)

Kim, Bum-Soo;Moon, Yang-Sae;Kim, Jin-Ho
- Journal of KIISE:Databases
- /
- v.36 no.4
- /
- pp.327-340
- /
- 2009
To achieve the noise reduction effect in boundary image matching, we use the moving average transform of time-series matching. Our motivation is based on an intuition that using the moving average transform we may exploit the noise reduction effect in boundary image matching as in time-series matching. To confirm this simple intuition, we first propose $\kappa$-order image matching, which applies the moving average transform to boundary image matching. A boundary image can be represented as a sequence in the time-series domain, and our $\kappa$-order image matching identifies similar images in this time-series domain by comparing the $\kappa$-moving average transformed sequences. Next, we propose an index-based matching method that efficiently performs $\kappa$-order image matching on a large volume of image databases, and formally prove the correctness of the index-based method. Moreover, we formally analyze the relationship between an order $\kappa$ and its matching result, and present a systematic way of controlling the noise reduction effect by changing the order $\kappa$. Experimental results show that our $\kappa$-order image matching exploits the noise reduction effect, and our index-based matching method outperforms the sequential scan by one or two orders of magnitude.
PDF KSCI

Incremental Regression based on a Sliding Window for Stream Data Prediction (스트림 데이타 예측을 위한 슬라이딩 윈도우 기반 점진적 회귀분석)

Kim, Sung-Hyun;Jin, Long;Ryu, Keun-Ho
- Journal of KIISE:Databases
- /
- v.34 no.6
- /
- pp.483-492
- /
- 2007
Time series of conventional prediction techniques uses the model which is generated from the training step. This model is applied to new input data without any change. If this model is applied directly to stream data, the rate of prediction accuracy will be decreased. This paper proposes an stream data prediction technique using sliding window and regression. This technique considers the characteristic of time series which may be changed over time. It is composed of two steps. The first step executes a fractional process for applying input data to the regression model. The second step updates the model by using its information as new data. Additionally, the model is maintained by only recent data in a queue. This approach has the following two advantages. It maintains the minimum information of the model by using a matrix, so space complexity is reduced. Moreover, it prevents the increment of error rate by updating the model over time. Accuracy rate of the proposed method is measured by RME(Relative Mean Error) and RMSE(Root Mean Square Error). The results of stream data prediction experiment are performed by the proposed technique IMQR(Incremental Multiple Quadratic Regression) is more efficient than those of MLR(Multiple Linear Regression) and SVR(Support Vector Regression).
PDF KSCI

Similarity Search in Time Series Databases based on the Normalized Distance (정규 거리에 기반한 시계열 데이터베이스의 유사 검색 기법)

이상준;이석호
- Journal of KIISE:Databases
- /
- v.31 no.1
- /
- pp.23-29
- /
- 2004
In this paper, we propose a search method for time sequences which supports the normalized distance as a similarity measure. In many applications where the shape of the time sequence is a major consideration, the normalized distance is a more suitable similarity measure than the simple Lp distance. To support normalized distance queries, most of the previous work has the preprocessing step for vertical shifting which normalizes each sequence by its mean. The proposed method is motivated by the property of sequence for feature extraction. That is, the variation between two adjacent elements of a time sequence is invariant under vertical shifting. The extracted feature is indexed by the spatial access method such as R-tree. The proposed method can match time series of similar shape without vertical shifting and guarantees no false dismissals. The experiments are performed on real data(stock price movement) to verify the performance of the proposed method.
PDF KSCI

A Single Index Approach for Time-Series Subsequence Matching that Supports Moving Average Transform of Arbitrary Order (단일 색인을 사용한 임의 계수의 이동평균 변환 지원 시계열 서브시퀀스 매칭)

Moon Yang-Sae;Kim Jinho
- Journal of KIISE:Databases
- /
- v.33 no.1
- /
- pp.42-55
- /
- 2006
We propose a single Index approach for subsequence matching that supports moving average transform of arbitrary order in time-series databases. Using the single index approach, we can reduce both storage space overhead and index maintenance overhead. Moving average transform is known to reduce the effect of noise and has been used in many areas such as econometrics since it is useful in finding overall trends. However, the previous research results have a problem of occurring index overhead both in storage space and in update maintenance since tile methods build several indexes to support arbitrary orders. In this paper, we first propose the concept of poly-order moving average transform, which uses a set of order values rather than one order value, by extending the original definition of moving average transform. That is, the poly-order transform makes a set of transformed windows from each original window since it transforms each window not for just one order value but for a set of order values. We then present theorems to formally prove the correctness of the poly-order transform based subsequence matching methods. Moreover, we propose two different subsequence matching methods supporting moving average transform of arbitrary order by applying the poly-order transform to the previous subsequence matching methods. Experimental results show that, for all the cases, the proposed methods improve performance significantly over the sequential scan. For real stock data, the proposed methods improve average performance by 22.4${\~}$33.8 times over the sequential scan. And, when comparing with the cases of building each index for all moving average orders, the proposed methods reduce the storage space required for indexes significantly by sacrificing only a little performance degradation(when we use 7 orders, the methods reduce the space by up to 1/7.0 while the performance degradation is only $9\%{\~}42\%$ on the average). In addition to the superiority in performance, index space, and index maintenance, the proposed methods have an advantage of being generalized to many sorts of other transforms including moving average transform. Therefore, we believe that our work can be widely and practically used in many sort of transform based subsequence matching methods.
PDF KSCI

연구개발투자의 산업간 파급효과

김정우;이희경
- Proceedings of the Korean Operations and Management Science Society Conference
- /
- 1995.09a
- /
- pp.144-155
- /
- 1995
본 연구는 기술에 대한 대용개념으로 사용되고 있는 연구개발투자의 효과가 산업의 생산성 향상에 얼마만큼 기여하고 있는가에 관한 실증연구로, 그 효 과를 자체 연구개발효과와 파급효과로 나누어 측정하는 데 목적이 있다. 파 급효과의 경우, 중간재의 거래를 통한 체화된 파급효과와 산업간의 기술거리 로 인한 비체화된 파급효과로 나누어 한국 제조산업을 18개로 분류한 후 각 산업의 연구개발스톡을 측정하였으며, 연구개발투자의 체화된 파급효과 측정 을 위하여 산업 연관표를 이용하여 가중치를 계산하였다. 그리고, 비체화된 파급효과 측정을 위하여는 각 산업에서 고용하고 있는 전공별 연구원 수의 자료를 이용 기술거리를 구하였다. 본 연구에서는 각각의 가중치로 구한 연 구개발스톡, 체화된 연구개발 스톡, 그리고 비체화된 연구개발 스톡을 이용 하여 각 독립변수들에 대한 한계생산성을 구하였으며, 분석 방법으로는 단순 회귀분석과 함께 시계열의 효과와 산업간 효과를 고려하는 패널데이터 분석 을 시도하였다. 체화된 파급효과와 비체화된 파급효과 중 하나만을 변수로 포함하는 경우에는 추정치가 유의한 결과를 나타내고 있지만, 두 가지의 변 수를 모두 포함하는 경우에는 보호도 일정하지 않으며 비유의적인 결과를 보였다. 이러한 결과는 다중공선성에 의한 것으로 보인다. 두 가지 파급효과 에 대한 한계생산성 추정치는 기술과 연구개발투자가 외부성을 가지고 있으 며, 기술과 관련된 변수의 도입이 필요함을 시사한다. 또한 이러한 파급효과 의 추정치는 거시차원에서 연구개발 지원의 정당성에 대한 근거를 제시하고 있으며, 기술혁신을 위한 투자의 타당성을 실증적으로 보여주고 있다.사하였다. 이 사례 연구들의 결과는 각 계열사들의 상황에 따라 제시된 외주위탁 전략과 현재의 외주위 탁 전략이 일치할 때 정보 시스템에 대한 사용자 만족도가 보다 높은 것으 로 나타났다. 할 수 있는 효율적인 distributed system를 개발하는 것을 제시하였다. 본 논문은 데이타베이스론의 입장에서 아직 정립되어 있지 않은 분산 환경하에서의 관계형 데이타베이스의 데이타관리의 분류체계를 나름대로 정립하였다는데 그 의의가 있다. 또한 이것의 응용은 현재 분산데이타베이스 구축에 있어 나타나는 기술적인 문제점들을 어느정도 보완할 수 있다는 점에서 그 중요성이 있다.ence of a small(IxEpc),hot(Tex> SOK) core which contains two tempegatlue peaks at -15" east and north of MDS. The column density of HCaN is (1-3):n1014cm-2. Column density at distant position from MD5 is larger than that in the (:entral region. We have deduced that this hot-core has a mass of 10sR1 which i:s about an order of magnitude larger those obtained by previous studies.previous studies.업순서들의 상관관계를 고려하여 보다 개선된 해를 구하기 위한 연구가 요구된다. 또한, 준비작업비용을 발생시키는 작업장의 작업순서결정에 대해서도 연구를 행하여, 보완작업비용과 준비비용을
PDF

Efficient Time-Series Subsequence Matching Using MBR-Safe Property of Piecewise Aggregation Approximation (부분 집계 근사법의 MBR-안전 성질을 이용한 효율적인 시계열 서브시퀀스 매칭)

Moon, Yang-Sae
- Journal of KIISE:Databases
- /
- v.34 no.6
- /
- pp.503-517
- /
- 2007
In this paper we address the MBR-safe property of Piecewise Aggregation Approximation(PAA), and propose an of efficient subsequence matching method based on the MBR-safe PAA. A transformation is said to be MBR-safe if a low-dimensional MBR to which a high- dimensional MBR is transformed by the transformation contains every individual low-dimensional sequence to which a high-dimensional sequence is transformed. Using an MBR-safe transformation we can reduce the number of lower-dimensional transformations required in similar sequence matching, since it transforms a high-dimensional MBR itself to a low-dimensional MBR directly. Furthermore, PAA is known as an excellent lower-dimensional transformation single its computation is very simple, and its performance is superior to other transformations. Thus, to integrate these advantages of PAA and MBR-safeness, we first formally confirm the MBR-safe property of PAA, and then improve subsequence matching performance using the MBR-safe PAA. Contributions of the paper can be summarized as follows. First, we propose a PAA-based MBR-safe transformation, called mbrPAA, and formally prove the MBR-safeness of mbrPAA. Second, we propose an mbrPAA-based subsequence matching method, and formally prove its correctness of the proposed method. Third, we present the notion of entry reuse property, and by using the property, we propose an efficient method of constructing high-dimensional MBRs in subsequence matching. Fourth, we show the superiority of mbrPAA through extensive experiments. Experimental results show that, compared with the previous approach, our mbrPAA is 24.2 times faster in the low-dimensional MBR construction and improves subsequence matching performance by up to 65.9%.
PDF KSCI

Search Result 25, Processing Time 0.027 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)