• Title/Summary/Keyword: Time-Series clustering

Search Result 185, Processing Time 0.025 seconds

Clustering Analysis with Spring Discharge Data and Evaluation of Groundwater System in Jeju Island (용천수 유출량 클러스터링 해석을 이용한 제주도 지하수 순환 해석)

  • Kim Tae-Hui;Mun Deok-Cheol;Park Won-Bae;Park Gi-Hwa;Go Gi-Won
    • Proceedings of the Korean Society of Soil and Groundwater Environment Conference
    • /
    • 2005.04a
    • /
    • pp.296-299
    • /
    • 2005
  • Time series of spring discharge data in Jeju island can provide abundant information on the spatial groundwater system. In this study, the classification based on time series of spring discharge was performed with clustering analysis: discharge rate and EC. Peak discharges are mainly observed in august or september. However, double peaks and late peaks of discharge are also observed at a plenty of springs. Based on results of clustering analysis, it can be deduced that GH model is not appropriate for the conceptual model of Groundwater system in Jeju island. EC distributions in dry season are also support the conclusion.

  • PDF

3D Radar Objects Tracking and Reflectivity Profiling

  • Kim, Yong Hyun;Lee, Hansoo;Kim, Sungshin
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.12 no.4
    • /
    • pp.263-269
    • /
    • 2012
  • The ability to characterize feature objects from radar readings is often limited by simply looking at their still frame reflectivity, differential reflectivity and differential phase data. In many cases, time-series study of these objects' reflectivity profile is required to properly characterize features objects of interest. This paper introduces a novel technique to automatically track multiple 3D radar structures in C,S-band in real-time using Doppler radar and profile their characteristic reflectivity distribution in time series. The extraction of reflectivity profile from different radar cluster structures is done in three stages: 1. static frame (zone-linkage) clustering, 2. dynamic frame (evolution-linkage) clustering and 3. characterization of clusters through time series profile of reflectivity distribution. The two clustering schemes proposed here are applied on composite multi-layers CAPPI (Constant Altitude Plan Position Indicator) radar data which covers altitude range of 0.25 to 10 km and an area spanning over hundreds of thousands $km^2$. Discrete numerical simulations show the validity of the proposed technique and that fast and accurate profiling of time series reflectivity distribution for deformable 3D radar structures is achievable.

Unsupervised Clustering of Multivariate Time Series Microarray Experiments based on Incremental Non-Gaussian Analysis

  • Ng, Kam Swee;Yang, Hyung-Jeong;Kim, Soo-Hyung;Kim, Sun-Hee;Anh, Nguyen Thi Ngoc
    • International Journal of Contents
    • /
    • v.8 no.1
    • /
    • pp.23-29
    • /
    • 2012
  • Multiple expression levels of genes obtained using time series microarray experiments have been exploited effectively to enhance understanding of a wide range of biological phenomena. However, the unique nature of microarray data is usually in the form of large matrices of expression genes with high dimensions. Among the huge number of genes presented in microarrays, only a small number of genes are expected to be effective for performing a certain task. Hence, discounting the majority of unaffected genes is the crucial goal of gene selection to improve accuracy for disease diagnosis. In this paper, a non-Gaussian weight matrix obtained from an incremental model is proposed to extract useful features of multivariate time series microarrays. The proposed method can automatically identify a small number of significant features via discovering hidden variables from a huge number of features. An unsupervised hierarchical clustering representative is then taken to evaluate the effectiveness of the proposed methodology. The proposed method achieves promising results based on predictive accuracy of clustering compared to existing methods of analysis. Furthermore, the proposed method offers a robust approach with low memory and computation costs.

Determining on Model-based Clusters of Time Series Data (시계열데이터의 모델기반 클러스터 결정)

  • Jeon, Jin-Ho;Lee, Gye-Sung
    • The Journal of the Korea Contents Association
    • /
    • v.7 no.6
    • /
    • pp.22-30
    • /
    • 2007
  • Most real word systems such as world economy, stock market, and medical applications, contain a series of dynamic and complex phenomena. One of common methods to understand these systems is to build a model and analyze the behavior of the system. In this paper, we investigated methods for best clustering over time series data. As a first step for clustering, BIC (Bayesian Information Criterion) approximation is used to determine the number of clusters. A search technique to improve clustering efficiency is also suggested by analyzing the relationship between data size and BIC values. For clustering, two methods, model-based and similarity based methods, are analyzed and compared. A number of experiments have been performed to check its validity using real data(stock price). BIC approximation measure has been confirmed that it suggests best number of clusters through experiments provided that the number of data is relatively large. It is also confirmed that the model-based clustering produces more reliable clustering than similarity based ones.

Privacy-Preserving Clustering on Time-Series Data Using Fourier Magnitudes (시계열 데이타 클러스터링에서 푸리에 진폭 기반의 프라이버시 보호)

  • Kim, Hea-Suk;Moon, Yang-Sae
    • Journal of KIISE:Databases
    • /
    • v.35 no.6
    • /
    • pp.481-494
    • /
    • 2008
  • In this paper we propose Fourier magnitudes based privacy preserving clustering on time-series data. The previous privacy-preserving method, called DFT coefficient method, has a critical problem in privacy-preservation itself since the original time-series data may be reconstructed from privacy-preserved data. In contrast, the proposed DFT magnitude method has an excellent characteristic that reconstructing the original data is almost impossible since it uses only DFT magnitudes except DFT phases. In this paper, we first explain why the reconstruction is easy in the DFT coefficient method, and why it is difficult in the DFT magnitude method. We then propose a notion of distance-order preservation which can be used both in estimating clustering accuracy and in selecting DFT magnitudes. Degree of distance-order preservation means how many time-series preserve their relative distance orders before and after privacy-preserving. Using this degree of distance-order preservation we present greedy strategies for selecting magnitudes in the DFT magnitude method. That is, those greedy strategies select DFT magnitudes to maximize the degree of distance-order preservation, and eventually we can achieve the relatively high clustering accuracy in the DFT magnitude method. Finally, we empirically show that the degree of distance-order preservation is an excellent measure that well reflects the clustering accuracy. In addition, experimental results show that our greedy strategies of the DFT magnitude method are comparable with the DFT coefficient method in the clustering accuracy. These results indicate that, compared with the DFT coefficient method, our DFT magnitude method provides the excellent degree of privacy-preservation as well as the comparable clustering accuracy.

Categorical time series clustering: Case study of Korean pro-baseball data (범주형 시계열 자료의 군집화: 프로야구 자료의 사례 연구)

  • Pak, Ro Jin
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.3
    • /
    • pp.621-627
    • /
    • 2016
  • A certain professional baseball team tends to be very weak against another particular team. For example, S team, the strongest team in Korea, is relatively weak to H team. In this paper, we carried out clustering the Korean baseball teams based on the records against the team S to investigate whether the pattern of the record of the team H is different from those of the other teams. The technique we have employed is 'time series clustering', or more specifically 'categorical time series clustering'. Three methods have been considered in this paper: (i) distance based method, (ii) genetic sequencing method and (iii) periodogram method. Each method has its own advantages and disadvantages to handle categorical time series, so that it is recommended to draw conclusion by considering the results from the above three methods altogether in a comprehensive manner.

Noise Averaging Effect on Privacy-Preserving Clustering of Time-Series Data (시계열 데이터의 프라이버시 보호 클러스터링에서 노이즈 평준화 효과)

  • Moon, Yang-Sae;Kim, Hea-Suk
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.3
    • /
    • pp.356-360
    • /
    • 2010
  • Recently, there have been many research efforts on privacy-preserving data mining. In privacy-preserving data mining, accuracy preservation of mining results is as important as privacy preservation. Random perturbation privacy-preserving data mining technique is known to well preserve privacy. However, it has a problem that it destroys distance orders among time-series. In this paper, we propose a notion of the noise averaging effect of piecewise aggregate approximation(PAA), which can be preserved the clustering accuracy as high as possible in time-series data clustering. Based on the noise averaging effect, we define the PAA distance in computing distance. And, we show that our PAA distance can alleviate the problem of destroying distance orders in random perturbing time series.

A Pattern Consistency Index for Detecting Heterogeneous Time Series in Clustering Time Course Gene Expression Data (시간경로 유전자 발현자료의 군집분석에서 이질적인 시계열의 탐지를 위한 패턴일치지수)

  • Son, Young-Sook;Baek, Jang-Sun
    • The Korean Journal of Applied Statistics
    • /
    • v.18 no.2
    • /
    • pp.371-379
    • /
    • 2005
  • In this paper, we propose a pattern consistency index for detecting heterogeneous time series that deviate from the representative pattern of each cluster in clustering time course gene expression data using the Pearson correlation coefficient. We examine its usefulness by applying this index to serum time course gene expression data from microarrays.

Efficient Time-Series Similarity Measurement and Ranking Based on Anomaly Detection (이상탐지 기반의 효율적인 시계열 유사도 측정 및 순위화)

  • Ji-Hyun Choi;Hyun Ahn
    • Journal of Internet Computing and Services
    • /
    • v.25 no.2
    • /
    • pp.39-47
    • /
    • 2024
  • Time series analysis is widely employed by many organizations to solve business problems, as it extracts various information and insights from chronologically ordered data. Among its applications, measuring time series similarity is a step to identify time series with similar patterns, which is very important in time series analysis applications such as time series search and clustering. In this study, we propose an efficient method for measuring time series similarity that focuses on anomalies rather than the entire series. In this regard, we validate the proposed method by measuring and analyzing the rank correlation between the similarity measure for the set of subsets extracted by anomaly detection and the similarity measure for the whole time series. Experimental results, especially with stock time series data and an anomaly proportion of 10%, demonstrate a Spearman's rank correlation coefficient of up to 0.9. In conclusion, the proposed method can significantly reduce computation cost of measuring time series similarity, while providing reliable time series search and clustering results.

Radial basis function network design for chaotic time series prediction (혼돈 시계열의 예측을 위한 Radial Basis 함수 회로망 설계)

  • 신창용;김택수;최윤호;박상희
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • v.45 no.4
    • /
    • pp.602-611
    • /
    • 1996
  • In this paper, radial basis function networks with two hidden layers, which employ the K-means clustering method and the hierarchical training, are proposed for improving the short-term predictability of chaotic time series. Furthermore the recursive training method of radial basis function network using the recursive modified Gram-Schmidt algorithm is proposed for the purpose. In addition, the radial basis function networks trained by the proposed training methods are compared with the X.D. He A Lapedes's model and the radial basis function network by nonrecursive training method. Through this comparison, an improved radial basis function network for predicting chaotic time series is presented. (author). 17 refs., 8 figs., 3 tabs.

  • PDF