• Title/Abstract/Keywords: Time Series Clustering

185 search results (processing time: 0.028 s)

A Reexamination on the Influence of Fine-particle between Districts in Seoul from the Perspective of Information Theory

  • 이재구;이태훈;윤성로
    • 정보과학회 컴퓨팅의 실제 논문지, Vol. 21, No. 2, pp. 109-114, 2015
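  • This paper proposes a method that quantifies the correlation among fine-particle time series measured in the 25 districts of Seoul using entropy from information theory, and builds a graph-based fine-particle transfer model of the districts to analyze regional similarity and influence. First, transfer entropy is computed for every pair of districts, each with its own fine-particle concentration time series, to obtain the connection strength between graph nodes. A modularity-based clustering algorithm, a traditional community detection technique, is then applied to this graph to detect the communities formed among the districts. The results confirm that fine-particle transfer is high between geographically close districts and between districts with heavy vehicle traffic. The proposed method is also expected to serve as a new information-theoretic approach to fine-particle analysis, distinct from existing meteorological-model analyses, and to help produce improved fine-particle analysis data.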
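
The transfer entropy step described in the abstract above can be sketched as follows. This is a minimal plug-in estimator with history length 1 on discretized series; the equal-width binning, the history length, and the function names `discretize` and `transfer_entropy` are assumptions made for illustration, not details taken from the paper. The modularity-based community detection step is not shown.

```python
from collections import Counter
from math import log2

def discretize(series, n_bins=3):
    """Map real values to equal-width integer bins (assumed preprocessing)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # constant series falls into bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in series]

def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    TE = sum p(x_next, x_now, y_now)
             * log2( p(x_next | x_now, y_now) / p(x_next | x_now) )
    where x is the target series and y is the source series.
    """
    n = len(target) - 1
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    pairs_xy = Counter(zip(target[:-1], source[:-1]))   # (x_now, y_now)
    pairs_xx = Counter(zip(target[1:], target[:-1]))    # (x_next, x_now)
    singles = Counter(target[:-1])                      # x_now
    te = 0.0
    for (x_next, x_now, y_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_xy[(x_now, y_now)]
        p_given_self = pairs_xx[(x_next, x_now)] / singles[x_now]
        te += p_joint * log2(p_given_both / p_given_self)
    return te

# Hypothetical example: district B repeats district A's level one step later,
# so information should flow mainly from A to B.
district_a = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
district_b = [0] + district_a[:-1]
te_a_to_b = transfer_entropy(district_a, district_b)
te_b_to_a = transfer_entropy(district_b, district_a)
```

Computing this value for every ordered pair of districts would give the weighted, directed edges of the graph that the abstract describes.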

Classification of Wind Sector in Pohang Region Using Similarity of Time-Series Wind Vectors

  • 김현구;김진솔;강용혁;박형동
    • 한국태양에너지학회 논문집, Vol. 36, No. 1, pp. 11-18, 2016
  • The local wind systems in the Pohang region were categorized into wind sectors. Thorough knowledge of wind resource assessment, wind environment analysis, and atmospheric environmental impact assessment is required for this region because it has outstanding wind resources, lies on the path of typhoons, and has large-scale atmospheric pollution sources. To overcome the resolution limitation of meteorological datasets and the problems with the categorization criteria of preceding studies, the high-resolution wind resource map of the Korea Institute of Energy Research was used as time-series meteorological data, and a 2-step method was adopted: the clustering coefficient was determined through hierarchical clustering analysis, and the wind sectors were then categorized through non-hierarchical K-means clustering analysis. The similarity of normalized time-series wind vectors was measured by the Euclidean distance. The meteorological-statistical characteristics of the mean vector wind distribution and the meteorological variables of each wind sector were compared. The comparison confirmed significant differences among wind sectors according to terrain elevation, mean wind speed, Weibull shape parameter, etc.
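
As a rough illustration of the second step in the abstract above, the sketch below z-normalizes each series, measures similarity with the Euclidean distance, and runs a plain Lloyd-style K-means. It assumes each grid point's wind vectors have been flattened into one list of numbers and that the number of clusters `k` is already known; the hierarchical first step is not reproduced, and the naive initialization from the first `k` points is a simplification, not the paper's procedure.

```python
from math import sqrt

def normalize(series):
    """Z-normalize a series (zero mean, unit standard deviation)."""
    mean = sum(series) / len(series)
    std = sqrt(sum((v - mean) ** 2 for v in series) / len(series)) or 1.0
    return [(v - mean) / std for v in series]

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm with a naive init: the first k points are the centers."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: euclidean(p, centers[j]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical wind-speed series at four grid points: two rising, two falling.
raw = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 6, 8], [8, 6, 4, 2]]
points = [normalize(s) for s in raw]
labels, centers = kmeans(points, k=2)
```

Normalizing first means that series with the same shape but different magnitudes, such as the first and third above, end up in the same sector.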

DETECTING VARIABILITY IN ASTRONOMICAL TIME SERIES DATA: APPLICATIONS OF CLUSTERING METHODS IN CLOUD COMPUTING ENVIRONMENTS

  • 신민수;변용익;장서원;김대원;김명진;이동욱;함재균;정용환;윤준연;곽재혁;김주현
    • 천문학회보, Vol. 36, No. 2, pp. 131.1-131.1, 2011
  • We present applications of clustering methods to detect variability in massive astronomical time series data. Focusing on variability of bright stars, we use clustering methods to separate possible variable sources from other time series data, which include intrinsically non-variable sources and data with common systematic patterns. We have already finished the analysis of the Northern Sky Variability Survey data, which includes about 16 million light curves, and present candidate variable sources with their association to other data at different wavelengths. We also apply our clustering method to the light curves of bright objects in the SuperWASP Data Release 1. For the analysis of the SuperWASP data, we exploit an elastically configurable Cloud computing environment that the KISTI Supercomputing Center is deploying. Two quite different configurations are incorporated in our Cloud computing test bed. One system uses Hadoop distributed processing with its distributed file system, exploiting data locality. The other adopts Condor and the Lustre network file system. We present test results, considering the performance of processing a large number of light curves and finding clusters of variable and non-variable objects.

Design of the Optimal Fuzzy Prediction Systems using RCGKA

  • 방영근;심재선;이철희
    • 산업기술연구, Vol. 29, No. B, pp. 9-15, 2009
  • With the traditional binary encoding technique, it takes a long time to converge to the optimal solutions, and the encoding and decoding procedures add complexity to the system. However, RCGAs (real-coded genetic algorithms) do not require these procedures, and the k-means clustering algorithm can avoid searching the global space. Thus, this paper proposes a new approach that uses their advantages. The proposed method constructs multiple predictors using the optimal differences, which can better reveal the patterns and properties concealed in non-stationary time series, where the k-means clustering algorithm is used for data classification for each predictor, and then selects the best predictor. After the best predictor is selected, its cluster centers are finely tuned via RCGKA in a secondary tuning procedure, which can further enhance the performance of the predictor. Finally, we verify the prediction performance of the proposed system by simulating typical time series examples.

Application of Symbolic Representation Method for Fault Detection and Clustering in Semiconductor Fabrication Processes

  • 노웅기;홍상진
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터, Vol. 15, No. 11, pp. 806-818, 2009
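  • Semiconductor technology has advanced rapidly since the invention of the integrated circuit (IC) in the 1950s. Manufacturing a complete semiconductor requires a long and highly varied sequence of processes. To raise manufacturing productivity, much research is under way on fault detection and classification (FDC), which aims to find faults before the processes finish. To this end, various kinds of sensors are attached to semiconductor equipment and measure the desired values at regular time intervals. Because these measurements are sequences of real values, they are a kind of time-series data. This paper proposes an algorithm that performs fault detection and clustering in semiconductor processes. The proposed algorithm modifies an existing algorithm that detects faults by converting time-series data into a symbolic representation. The contributions of this paper are to show that an existing fault detection algorithm for general time-series data can be modified for use on semiconductor process data, and to present experimental results that improve the accuracy of fault detection and clustering. In the experiments, the proposed algorithm produced neither false positives nor false negatives.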
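
The abstract above does not say which symbolic representation is used. As one well-known possibility, the sketch below converts a series into a SAX-style word (z-normalization, piecewise aggregate approximation, then Gaussian breakpoints). Treating the paper's representation as SAX is an assumption, and the function name `sax_word` and its parameters are hypothetical. The code also assumes `n_segments` is no larger than the series length.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev

def sax_word(series, n_segments, alphabet="abcd"):
    """Convert a real-valued series to a SAX-style symbolic word."""
    # 1. Z-normalize so the breakpoints of a standard normal apply.
    mu, sigma = mean(series), pstdev(series) or 1.0
    z = [(v - mu) / sigma for v in series]
    # 2. Piecewise aggregate approximation: the mean of each segment.
    seg_len = len(z) / n_segments
    paa = [mean(z[int(i * seg_len):int((i + 1) * seg_len)])
           for i in range(n_segments)]
    # 3. Breakpoints that split N(0, 1) into equiprobable regions.
    cuts = [NormalDist().inv_cdf(i / len(alphabet))
            for i in range(1, len(alphabet))]
    # 4. Map each segment mean to the symbol of its region.
    return "".join(alphabet[bisect_right(cuts, v)] for v in paa)

# Hypothetical sensor trace: a steady ramp.
word = sax_word([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)
```

Once sensor traces are reduced to short words like this, traces can be compared or grouped by their symbols rather than by raw real values.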

Volatility clustering in data breach counts

  • Shim, Hyunoo;Kim, Changki;Choi, Yang Ho
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 27, No. 4
    • /
    • pp.487-500
    • /
    • 2020
  • Insurers face increasing demands for cyber liability, entailed in part by a variety of new forms of data breach risk. As data breach occurrences develop, understanding the volatility in data breach counts has become as important as understanding their expected occurrences. Volatility clustering, the tendency of large changes in a random variable to cluster together in time, is frequently observed in many financial asset prices and asset returns, and it is questioned whether the volatility of data breach occurrences is also clustered in time. We present a volatility analysis based on INGARCH models, i.e., integer-valued generalized autoregressive conditional heteroskedasticity time series models for frequency counts due to data breaches. Using the INGARCH(1, 1) model with data breach samples, we show evidence of temporal volatility clustering for data breaches. In addition, we present that the firms' volatilities are correlated between some they belong to and that such a clustering effect remains even after excluding the effect of financial covariates such as the VIX and the stock return of S&P500 that have their own volatility clustering.
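
To make the INGARCH(1, 1) model in the abstract above concrete, here is a minimal sketch of its commonly used Poisson form, in which the count X_t given the past is Poisson with intensity lambda_t = omega + alpha * X_{t-1} + beta * lambda_{t-1}. The Poisson assumption, the parameter names, and the starting intensity `lam0` are illustrative choices; the paper's exact specification and covariates are not shown here.

```python
from math import lgamma, log

def ingarch_intensities(counts, omega, alpha, beta, lam0):
    """Conditional intensities of a Poisson INGARCH(1, 1) model.

    lambda_t = omega + alpha * X_{t-1} + beta * lambda_{t-1},
    starting from an assumed initial intensity lam0.
    """
    lams = [lam0]
    for x_prev in counts[:-1]:
        lams.append(omega + alpha * x_prev + beta * lams[-1])
    return lams

def poisson_loglik(counts, lams):
    """Log-likelihood of the counts given their conditional intensities."""
    return sum(x * log(lam) - lam - lgamma(x + 1)
               for x, lam in zip(counts, lams))

# Hypothetical monthly breach counts and parameter values.
counts = [2, 0, 5]
lams = ingarch_intensities(counts, omega=1.0, alpha=0.5, beta=0.25, lam0=2.0)
loglik = poisson_loglik(counts, lams)
```

A large count raises the next intensity through `alpha`, and `beta` carries that elevated level forward, which is how the model produces clusters of high-count periods. In practice the parameters would be estimated by maximizing `poisson_loglik` over `omega`, `alpha`, and `beta`.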

Exploring COVID-19 in mainland China during the lockdown of Wuhan via functional data analysis

  • Li, Xing;Zhang, Panpan;Feng, Qunqiang
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 29, No. 1
    • /
    • pp.103-125
    • /
    • 2022
  • In this paper, we analyze the time series data of the case and death counts of COVID-19 that broke out in China in December, 2019. The study period is during the lockdown of Wuhan. We exploit functional data analysis methods to analyze the collected time series data. The analysis is divided into three parts. First, the functional principal component analysis is conducted to investigate the modes of variation. Second, we carry out the functional canonical correlation analysis to explore the relationship between confirmed and death cases. Finally, we utilize a clustering method based on the Expectation-Maximization (EM) algorithm to run the cluster analysis on the counts of confirmed cases, where the number of clusters is determined via a cross-validation approach. Besides, we compare the clustering results with some migration data available to the public.

A Study on Technology Forecasting based on Co-occurrence Network of Keyword in Multidisciplinary Journals

  • 김현욱;안상진;정우성
    • 한국경영과학회지, Vol. 40, No. 4, pp. 49-63, 2015
  • Keywords indexed in multidisciplinary journals show trends in science and technology innovation. Nature and Science were selected as the multidisciplinary journals for our analysis. To reduce the effect of plural forms of keywords, a stemming algorithm was applied. After this process, we fitted the growth curve of each keyword (stem) with the Bass model, a well-known model of diffusion processes. The Bass model is useful for expressing growth patterns by assuming innovative and imitative activities in the spread of innovation. In addition, we constructed a keyword co-occurrence network and calculated network measures such as centrality indices and the local clustering coefficient. Based on the network metrics and the yearly frequency of keywords, time series analysis was conducted to obtain the statistical causality between these measures. In some cases, the local clustering coefficient seems to Granger-cause the yearly frequency of a keyword. We expect that the local clustering coefficient could be a supportive indicator of emerging science and technology.
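
The local clustering coefficient mentioned in the abstract above is the fraction of a node's neighbor pairs that are themselves connected. The sketch below computes it for an undirected, unweighted graph stored as a dictionary of neighbor sets; the toy keyword network and the function name `local_clustering` are hypothetical and not taken from the paper.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    adj maps each node to the set of its neighbors.
    """
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to connect
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

# Hypothetical co-occurrence network of keyword stems.
network = {
    "graphen": {"nanotub", "transistor", "battery"},
    "nanotub": {"graphen", "transistor"},
    "transistor": {"graphen", "nanotub"},
    "battery": {"graphen"},
}
coefficient = local_clustering(network, "graphen")
```

Evaluating this coefficient for one keyword in each year's network would give the yearly series that the abstract relates to keyword frequency.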

Multiple Model Prediction System Based on Optimal TS Fuzzy Model and Its Applications to Time Series Forecasting

  • 방영근;이철희
    • 산업기술연구, Vol. 28, No. B, pp. 101-109, 2008
  • In general, forecasting non-stationary or chaotic time series is very difficult because they contain drift and/or nonlinearities. To overcome this, we suggest a new prediction method based on multiple-model TS fuzzy predictors combined with preprocessing of the time series data, where the differences of the time series, rather than the time series itself, are applied to the predictors as input. In the preprocessing procedure, the candidates for the optimal difference interval are determined by correlation analysis, and the corresponding difference data are generated. Then, for each candidate, a TS fuzzy predictor is constructed using the k-means clustering algorithm and the least squares method. Finally, the best predictor, which minimizes the performance index, is selected and used for prediction thereafter. Computer simulation is performed to show the effectiveness and usefulness of our method.
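
The preprocessing step in the abstract above can be sketched roughly as follows. The abstract does not state the exact correlation criterion, so this example assumes that candidate difference intervals are the lags with the highest sample autocorrelation; the function names and that selection rule are illustrative assumptions. The TS fuzzy predictors themselves are not shown.

```python
def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var if var else 0.0

def candidate_lags(series, max_lag, n_candidates=2):
    """Lags 1..max_lag ranked by autocorrelation, highest first (assumed rule)."""
    lags = sorted(range(1, max_lag + 1),
                  key=lambda m: autocorr(series, m), reverse=True)
    return lags[:n_candidates]

def difference(series, lag):
    """Difference data: x_t - x_{t-lag} for each valid t."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series with period 4.
series = [0, 1, 0, -1] * 5
best_lag = candidate_lags(series, max_lag=4, n_candidates=1)[0]
diffs = difference(series, best_lag)
```

Each candidate lag would yield its own difference series, and one predictor would be built per candidate before the best is chosen by the performance index.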
