• Title/Summary/Keyword: Time-series clustering

A Reexamination on the Influence of Fine-particle between Districts in Seoul from the Perspective of Information Theory (정보이론 관점에서 본 서울시 지역구간의 미세먼지 영향력 재조명)

  • Lee, Jaekoo;Lee, Taehoon;Yoon, Sungroh
    • KIISE Transactions on Computing Practices / v.21 no.2 / pp.109-114 / 2015
  • This paper presents a computational model of the transfer of airborne fine particles that quantifies the time-series data collected from each of the 25 districts in Seoul in order to analyze the similarities and influences among them. The properties of each district are derived from a time-series model of its fine-particle concentrations, and the edge weights are calculated from the transfer entropies between all pairs of districts. We applied a modularity-based graph clustering technique to detect communities among the 25 districts. The results indicate that the discovered clusters correspond to groups of districts with high transfer entropy between them, and that these groups are geographically adjacent or have high traffic volumes between them. We believe that this approach can be further extended to discover significant flows of other indicators that cause environmental pollution.
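
The pairwise edge weight described above can be sketched as a plug-in estimate of transfer entropy. This is a minimal sketch under assumptions the abstract does not state: each district's concentration series is discretized into equal-width bins, a history length of one step is used, and the helper names `quantize`, `transfer_entropy`, and `te_matrix` are hypothetical. The weight for the ordered pair (A, B) is T_{A->B} in bits, i.e. how much A's past reduces uncertainty about B's next value beyond B's own past.

```python
import math
from collections import Counter


def quantize(series, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1] (assumed discretization)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in series]


def transfer_entropy(source, target, n_bins=4):
    """Estimate T_{source -> target} in bits with a history length of one step."""
    x = quantize(target, n_bins)
    y = quantize(source, n_bins)
    n = len(x) - 1  # number of (t, t+1) transitions
    triples = Counter(zip(x[1:], x[:-1], y[:-1]))  # (x_{t+1}, x_t, y_t)
    pairs_xy = Counter(zip(x[:-1], y[:-1]))        # (x_t, y_t)
    pairs_xx = Counter(zip(x[1:], x[:-1]))         # (x_{t+1}, x_t)
    singles = Counter(x[:-1])                      # x_t
    te = 0.0
    for (x1, x0, y0), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_xy[(x0, y0)]        # p(x_{t+1} | x_t, y_t)
        p_given_self = pairs_xx[(x1, x0)] / singles[x0]  # p(x_{t+1} | x_t)
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te


def te_matrix(series_by_district, n_bins=4):
    """Directed edge weights T_{a -> b} for every ordered pair of districts."""
    names = list(series_by_district)
    return {
        (a, b): transfer_entropy(series_by_district[a], series_by_district[b], n_bins)
        for a in names
        for b in names
        if a != b
    }
```

The resulting directed weights could then be handed to a modularity-based community detection routine (for example, the greedy modularity method in NetworkX, after symmetrizing the weights); which variant the authors used is not stated in the abstract.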

Classification of Wind Sector in Pohang Region Using Similarity of Time-Series Wind Vectors (시계열 풍속벡터의 유사성을 이용한 포항지역 바람권역 분류)

  • Kim, Hyun-Goo;Kim, Jinsol;Kang, Yong-Heack;Park, Hyeong-Dong
    • Journal of the Korean Solar Energy Society / v.36 no.1 / pp.11-18 / 2016
  • This study categorized the local wind systems of the Pohang region into wind sectors. A thorough understanding of these systems is required for wind resource assessment, wind environment analysis, and atmospheric environmental impact assessment, because the region has outstanding wind resources, lies on the path of typhoons, and contains large-scale sources of air pollution. To overcome the limited resolution of the meteorological datasets and the problems with the categorization criteria of preceding studies, the high-resolution wind resource map of the Korea Institute of Energy Research was used as time-series meteorological data, and a two-step method was adopted: the clustering coefficient was first determined through hierarchical clustering analysis, and the wind sectors were then categorized through non-hierarchical K-means clustering analysis. The Euclidean distance between normalized time-series wind vectors was proposed as the similarity measure. The meteorological and statistical characteristics of the mean vector wind distribution and of the meteorological variables of each wind sector were compared. The comparison confirmed significant differences among wind sectors in terrain elevation, mean wind speed, Weibull shape parameter, and other characteristics.
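
The two-step scheme above (hierarchical clustering first, then K-means) can be sketched as follows. This is a minimal sketch, not the authors' procedure: the normalization (scaling each site's (u, v) series to unit RMS wind speed), the average-linkage criterion, and the choice of k supplied by the caller (standing in for reading it off the dendrogram) are all assumptions, and every function name is hypothetical. The hierarchical clusters only seed the K-means centers; K-means then produces the final wind sectors.

```python
import math


def normalize(series):
    """Scale a wind-vector series [(u, v), ...] to unit RMS speed and flatten it (assumed scheme)."""
    rms = math.sqrt(sum(u * u + v * v for u, v in series) / len(series)) or 1.0
    return [c / rms for u, v in series for c in (u, v)]


def dist(a, b):
    """Euclidean distance between two flattened, normalized series."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def hierarchical(points, k):
    """Step 1: average-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist(points[a], points[b]) for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


def kmeans(points, centers, n_iter=100):
    """Step 2: Lloyd's K-means seeded with the hierarchical centroids; returns labels."""
    labels = None
    for _ in range(n_iter):
        new = [min(range(len(centers)), key=lambda c: dist(p, centers[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


def wind_sectors(series_by_site, k):
    points = [normalize(s) for s in series_by_site]
    seeds = [centroid([points[i] for i in cl]) for cl in hierarchical(points, k)]
    return kmeans(points, seeds)
```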

Detecting Variability in Astronomical Time Series Data: Applications of Clustering Methods in Cloud Computing Environments

  • Shin, Min-Su;Byun, Yong-Ik;Chang, Seo-Won;Kim, Dae-Won;Kim, Myung-Jin;Lee, Dong-Wook;Ham, Jae-Gyoon;Jung, Yong-Hwan;Yoon, Jun-Weon;Kwak, Jae-Hyuck;Kim, Joo-Hyun
    • The Bulletin of The Korean Astronomical Society / v.36 no.2 / pp.131.1-131.1 / 2011
  • We present applications of clustering methods to detect variability in massive astronomical time-series data. Focusing on the variability of bright stars, we use clustering methods to separate possible variable sources from other time series, which include intrinsically non-variable sources and data with common systematic patterns. We have already completed the analysis of the Northern Sky Variability Survey data, which include about 16 million light curves, and present candidate variable sources with their associations with data at other wavelengths. We also apply our clustering method to the light curves of bright objects in SuperWASP Data Release 1. For the analysis of the SuperWASP data, we exploit an elastically configurable cloud computing environment that the KISTI Supercomputing Center is deploying. Two quite different configurations are incorporated in our cloud computing test bed. One system uses Hadoop distributed processing with its distributed file system, exploiting data locality. The other adopts Condor and the Lustre network file system. We present test results on the performance of processing a large number of light curves and on finding clusters of variable and non-variable objects.
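
The abstract does not say which features or clustering algorithm separate variable from non-variable light curves, so the following is only an illustrative sketch under stated assumptions: each light curve is summarized by the von Neumann ratio (mean squared successive difference over variance, about 2 for uncorrelated noise and much smaller for smooth variability) and the log of its standard deviation, and the summaries are split with a two-cluster K-means. The function names are hypothetical, and the systematic-pattern removal mentioned above is not modeled.

```python
import math
import statistics


def von_neumann_ratio(mags):
    """eta = mean squared successive difference / variance; about 2 for white noise."""
    diffs = [(b - a) ** 2 for a, b in zip(mags, mags[1:])]
    var = statistics.pvariance(mags)
    return (sum(diffs) / len(diffs)) / var if var > 0 else 2.0


def features(mags):
    """Assumed per-light-curve summary: (eta, log10 of the scatter)."""
    return (von_neumann_ratio(mags), math.log10(statistics.pstdev(mags) + 1e-12))


def two_means(points, n_iter=50):
    """K-means with k=2, seeded with the lowest- and highest-eta points.

    Label 0 collects the low-eta (candidate variable) cluster.
    """
    centers = [min(points), max(points)]  # tuples compare by eta first
    labels = None
    for _ in range(n_iter):
        new = [0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1 for p in points]
        if new == labels:
            break
        labels = new
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels
```

Because each light curve is reduced to a small feature tuple independently, the feature step parallelizes naturally across nodes, which is the part a Hadoop or Condor setup would distribute.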

Design of the Optimal Fuzzy Prediction Systems using RCGKA (RCGKA를 이용한 최적 퍼지 예측 시스템 설계)

  • Bang, Young-Keun;Shim, Jae-Son;Lee, Chul-Heui
    • Journal of Industrial Technology / v.29 no.B / pp.9-15 / 2009
  • With the traditional binary encoding technique, convergence to the optimal solution takes a long time, and the encoding and decoding procedures add complexity to the system. RCGAs (real-coded genetic algorithms), however, do not require these procedures, and the k-means clustering algorithm avoids searching the entire global space. This paper therefore proposes a new approach that combines their advantages. The proposed method constructs multiple predictors from the optimal differences, which better reveal the patterns and properties concealed in non-stationary time series; the k-means clustering algorithm classifies the data for each predictor, and the best predictor is then selected. After the best predictor is selected, its cluster centers are fine-tuned via RCGKA in a secondary tuning procedure, which further enhances the predictor's performance. Finally, we verify the prediction performance of the proposed system by simulating typical time-series examples.
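
The secondary tuning step, in which a real-coded GA fine-tunes cluster centers found by k-means, can be sketched as below. This is a generic real-coded GA, not the paper's RCGKA: the fitness is the within-cluster squared error of one-dimensional data (a stand-in for the predictor's performance index, which the abstract does not define), and the operators (truncation selection with elitism, BLX-alpha crossover, Gaussian mutation) and all parameter values are assumptions. Genes are real numbers, so no binary encoding or decoding is needed.

```python
import random


def sse(centers, data):
    """Sum of squared distances from each 1-D point to its nearest center (stand-in fitness)."""
    return sum(min((x - c) ** 2 for c in centers) for x in data)


def rcga_tune(centers, data, pop_size=20, generations=60, sigma=0.1, alpha=0.5, seed=0):
    """Fine-tune initial (e.g. k-means) centers with a simple real-coded GA."""
    rng = random.Random(seed)
    # The untouched initial centers are kept in the population, so elitism
    # guarantees the result is never worse than the starting point.
    pop = [list(centers)] + [[c + rng.gauss(0, sigma) for c in centers] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda ind: sse(ind, data))
        elite = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = []
            for a, b in zip(p1, p2):  # BLX-alpha crossover, gene by gene
                lo, hi = min(a, b), max(a, b)
                span = hi - lo
                gene = rng.uniform(lo - alpha * span, hi + alpha * span)
                if rng.random() < 0.2:  # occasional Gaussian mutation
                    gene += rng.gauss(0, sigma)
                child.append(gene)
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda ind: sse(ind, data))
```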

Application of Symbolic Representation Method for Fault Detection and Clustering in Semiconductor Fabrication Processes (반도체공정 이상탐지 및 클러스터링을 위한 심볼릭 표현법의 적용)

  • Loh, Woong-Kee;Hong, Sang-Jeen
    • Journal of KIISE:Computing Practices and Letters / v.15 no.11 / pp.806-818 / 2009
  • Since the invention of the integrated circuit (IC) in the 1950s, semiconductor technology has developed dramatically. A finished semiconductor is manufactured through a wide variety of processes. To improve semiconductor productivity, fault detection and classification (FDC) has been studied extensively as a way to find faults even before the processes are completed. For FDC, various kinds of sensors are attached to many semiconductor manufacturing devices, and sensor values are collected periodically. The collected sensor values form sequences of real numbers and can therefore be regarded as time-series data. In this paper, we propose an algorithm for detecting and clustering faults in semiconductor processes. The proposed algorithm is a modification of an existing anomaly detection algorithm for symbolically represented time series. The contributions of this paper are (1) showing that a modification of an existing anomaly detection algorithm for general time series can be used for semiconductor process data and (2) presenting experimental results for improving the correctness of fault detection and clustering. In our experiments, the proposed algorithm produced neither false positives nor false negatives.
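
A symbolic representation of the kind referred to above can be sketched with SAX (Symbolic Aggregate approXimation): z-normalize a window, average it into equal segments (PAA), and map each segment mean to a letter using breakpoints that split the standard normal into equiprobable regions. The abstract does not name the representation or the anomaly detector, so treat SAX, the alphabet size of 4, and the rarity rule in `rare_windows` (flag windows whose word appears only once) as assumptions; the rarity rule is a naive stand-in, not the authors' modified algorithm.

```python
import statistics
from collections import Counter

# Breakpoints splitting N(0, 1) into four equiprobable regions (alphabet size 4).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def znormalize(series):
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in series]


def paa(series, n_segments):
    """Piecewise aggregate approximation; assumes len(series) % n_segments == 0."""
    size = len(series) // n_segments
    return [statistics.fmean(series[i * size:(i + 1) * size]) for i in range(n_segments)]


def sax(series, n_segments):
    """Convert one window of sensor readings into a SAX word."""
    word = []
    for value in paa(znormalize(series), n_segments):
        idx = sum(value > b for b in BREAKPOINTS)  # number of breakpoints below the value
        word.append(ALPHABET[idx])
    return "".join(word)


def rare_windows(series, window, n_segments):
    """Indices of non-overlapping windows whose SAX word occurs only once (naive anomaly flag)."""
    words = [sax(series[i:i + window], n_segments)
             for i in range(0, len(series) - window + 1, window)]
    counts = Counter(words)
    return [i for i, w in enumerate(words) if counts[w] == 1]
```

For example, a steadily rising sensor trace inside one window maps to `"abcd"`, so a run of normal process steps yields repeated identical words and a reversed trace stands out.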

Volatility clustering in data breach counts

  • Shim, Hyunoo;Kim, Changki;Choi, Yang Ho
    • Communications for Statistical Applications and Methods / v.27 no.4 / pp.487-500 / 2020
  • Insurers face increasing demand for cyber liability coverage, driven in part by a variety of new forms of data breach risk. As data breaches become more frequent, understanding the volatility of data breach counts has become as important as understanding their expected number. Volatility clustering, the tendency of large changes in a random variable to cluster together in time, is frequently observed in many financial asset prices and returns, which raises the question of whether the volatility of data breach occurrences is also clustered in time. We present a volatility analysis based on INGARCH (integer-valued generalized autoregressive conditional heteroskedasticity) time series models for the frequency of data breaches. Using the INGARCH(1, 1) model with data breach samples, we show evidence of temporal volatility clustering in data breaches. In addition, we show that firms' volatilities are correlated between some of the groups to which they belong, and that this clustering effect remains even after excluding the effect of financial covariates, such as the VIX and S&P 500 returns, that exhibit their own volatility clustering.
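
The INGARCH(1, 1) model mentioned above is usually written with a Poisson conditional distribution: X_t given the past is Poisson(lambda_t), with lambda_t = omega + alpha * X_{t-1} + beta * lambda_{t-1}, where omega > 0, alpha, beta >= 0, and alpha + beta < 1 for stationarity. A large count raises the next intensity, which in turn raises the chance of further large counts; this is how the model captures volatility clustering in counts. The sketch below computes the filtered intensities and the Poisson log-likelihood; starting the recursion at the stationary mean omega / (1 - alpha - beta) is an assumption, and the paper may use a different conditional distribution or initialization.

```python
import math


def ingarch11_intensities(counts, omega, alpha, beta, lambda0=None):
    """Filtered intensities lambda_t = omega + alpha * X_{t-1} + beta * lambda_{t-1}."""
    if lambda0 is None:
        if alpha + beta >= 1:
            raise ValueError("alpha + beta must be < 1 to start at the stationary mean")
        lambda0 = omega / (1 - alpha - beta)  # assumed initialization
    lams = [lambda0]
    for x in counts[:-1]:
        lams.append(omega + alpha * x + beta * lams[-1])
    return lams


def poisson_loglik(counts, omega, alpha, beta):
    """Poisson log-likelihood: sum of X_t log(lambda_t) - lambda_t - log(X_t!)."""
    lams = ingarch11_intensities(counts, omega, alpha, beta)
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x, lam in zip(counts, lams))
```

Parameters could then be estimated by maximizing this function numerically, for instance by minimizing its negative with a bounded optimizer such as `scipy.optimize.minimize`.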

Exploring COVID-19 in mainland China during the lockdown of Wuhan via functional data analysis

  • Li, Xing;Zhang, Panpan;Feng, Qunqiang
    • Communications for Statistical Applications and Methods / v.29 no.1 / pp.103-125 / 2022
  • In this paper, we analyze time-series data on the case and death counts of COVID-19, which broke out in China in December 2019. The study period covers the lockdown of Wuhan. We use functional data analysis methods to analyze the collected time-series data. The analysis is divided into three parts. First, functional principal component analysis is conducted to investigate the modes of variation. Second, we carry out functional canonical correlation analysis to explore the relationship between confirmed cases and deaths. Finally, we use a clustering method based on the Expectation-Maximization (EM) algorithm to run a cluster analysis on the counts of confirmed cases, where the number of clusters is determined via cross-validation. In addition, we compare the clustering results with publicly available migration data.
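
The first step above, functional principal component analysis, can be approximated on a common time grid by treating each region's curve as a vector, estimating the mean curve and the sample covariance, and extracting the leading eigenvector (the dominant mode of variation) by power iteration. This is a minimal sketch under assumptions: the curves are already smoothed and sampled on the same grid, no basis expansion or quadrature weights are used (so the eigenfunction is normalized as a vector, not as a function), and `fpca_first_component` is a hypothetical name.

```python
import math


def fpca_first_component(curves, n_iter=200):
    """Mean curve, leading eigenvector, its eigenvalue, and per-curve scores."""
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centered = [[c[j] - mean[j] for j in range(m)] for c in curves]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)] for i in range(m)]
    v = [1.0] * m
    for _ in range(n_iter):  # power iteration converges to the top eigenvector
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m)) for i in range(m))
    scores = [sum(r[j] * v[j] for j in range(m)) for r in centered]
    return mean, v, eigval, scores
```

The scores summarize how strongly each region's curve follows the leading mode, and could serve as inputs to a subsequent clustering step.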

Multiple Model Prediction System Based on Optimal TS Fuzzy Model and Its Applications to Time Series Forecasting (최적 TS 퍼지 모델 기반 다중 모델 예측 시스템의 구현과 시계열 예측 응용)

  • Bang, Young-Keun;Lee, Chul-Heui
    • Journal of Industrial Technology / v.28 no.B / pp.101-109 / 2008
  • In general, forecasting non-stationary or chaotic time series is very difficult because such series exhibit drift and/or nonlinearity. To overcome this difficulty, we suggest a new prediction method based on multiple-model TS fuzzy predictors combined with preprocessing of the time-series data, in which the differences of the time series, rather than the raw data, are applied to the predictors as input. In the preprocessing procedure, candidates for the optimal difference interval are determined by correlation analysis, and the corresponding difference data are generated. Then, for each candidate, a TS fuzzy predictor is constructed using the k-means clustering algorithm and the least squares method. Finally, the predictor that minimizes the performance index is selected and used for all subsequent predictions. Computer simulations are performed to show the effectiveness and usefulness of our method.
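
The preprocessing step, which generates difference data for several candidate intervals and ranks the intervals by correlation analysis, can be sketched as below. The abstract does not specify which correlation is used, so the ranking criterion here (absolute lag-1 autocorrelation of each differenced series, on the idea that a more self-correlated input is easier to predict) is an assumption, and `differences`, `pearson`, and `candidate_intervals` are hypothetical names. Each returned interval would then get its own TS fuzzy predictor.

```python
import math


def differences(series, d):
    """d-step differences x_t - x_{t-d}."""
    return [series[t] - series[t - d] for t in range(d, len(series))]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def candidate_intervals(series, max_d, n_candidates):
    """Rank intervals 1..max_d by |lag-1 autocorrelation| of the differenced series (assumed criterion)."""
    scores = {}
    for d in range(1, max_d + 1):
        diff = differences(series, d)
        scores[d] = abs(pearson(diff[:-1], diff[1:]))
    return sorted(scores, key=scores.get, reverse=True)[:n_candidates]
```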

Identifying Temporal Pattern Clusters to Predict Events in Time Series

  • Hwang, Heesoo
    • KIEE International Transaction on Systems and Control / v.2D no.2 / pp.125-134 / 2002
  • This paper proposes a method for identifying temporal pattern clusters to predict events in time series. Instead of predicting future values of the time series, the proposed method forecasts specific events that may be arbitrarily defined by the user. The prediction target is defined by an event characterization function. Events are predicted when the time series belongs to a temporal pattern cluster. To identify the optimal temporal pattern clusters, a fuzzy goal programming problem is formulated to combine multiple objectives and is solved by an adaptive differential evolution technique that overcomes the sensitivity of conventional differential evolution to its control parameters. Five test examples are considered to evaluate the prediction method, and the adaptive differential evolution is also tested on twelve optimization problems.
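
The prediction rule described above can be sketched by embedding the series into delay vectors and treating a temporal pattern cluster as a ball (a center and a radius) in that space: an event is predicted at time t whenever the current delay vector falls inside the ball. In this minimal sketch the ball shape, the embedding dimension q, the delay tau, and the example event characterization function g(x, t) = x[t+1] - x[t] are assumptions, and the search for the best center and radius (fuzzy goal programming solved by adaptive differential evolution in the paper) is not reproduced; all names are hypothetical.

```python
import math


def embed(series, q, tau=1):
    """Delay vectors (x_{t-(q-1)tau}, ..., x_t), oldest first, paired with their time index t."""
    start = (q - 1) * tau
    return [(t, tuple(series[t - k * tau] for k in reversed(range(q))))
            for t in range(start, len(series))]


def in_cluster(vector, center, radius):
    """Assumed cluster shape: a Euclidean ball around the center."""
    return math.dist(vector, center) <= radius


def predict_events(series, center, radius, q, tau=1):
    """Times t at which an event is predicted because the delay vector lies in the cluster."""
    return [t for t, vec in embed(series, q, tau) if in_cluster(vec, center, radius)]


def mean_event_value(series, times, g):
    """Average of an event characterization function g(series, t) over the given times."""
    vals = [g(series, t) for t in times if t + 1 < len(series)]
    return sum(vals) / len(vals) if vals else float("nan")
```

An optimizer would then search over the center and radius to make `mean_event_value` inside the cluster high while keeping enough members in it, which is the kind of multi-objective trade-off the fuzzy goal programming formulation combines.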
