• Title/Summary/Keyword: 시계열 군집분석

Search Result 85, Processing Time 0.034 seconds

Clustering and classification to characterize daily electricity demand (시간단위 전력사용량 시계열 패턴의 군집 및 분류분석)

  • Park, Dain;Yoon, Sanghoo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.2
    • /
    • pp.395-406
    • /
    • 2017
  • The purpose of this study is to identify the pattern of daily electricity demand through clustering and classification. The hourly data was collected by KPS (Korea Power Exchange) between 2008 and 2012. The time trend was eliminated for conducting the pattern of daily electricity demand because electricity demand data is times series data. We have considered k-means clustering, Gaussian mixture model clustering, and functional clustering in order to find the optimal clustering method. The classification analysis was conducted to understand the relationship between external factors, day of the week, holiday, and weather. Data was divided into training data and test data. Training data consisted of external factors and clustered number between 2008 and 2011. Test data was daily data of external factors in 2012. Decision tree, random forest, Support vector machine, and Naive Bayes were used. As a result, Gaussian model based clustering and random forest showed the best prediction performance when the number of cluster was 8.

A study on electricity demand forecasting based on time series clustering in smart grid (스마트 그리드에서의 시계열 군집분석을 통한 전력수요 예측 연구)

  • Sohn, Hueng-Goo;Jung, Sang-Wook;Kim, Sahm
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.1
    • /
    • pp.193-203
    • /
    • 2016
  • This paper forecasts electricity demand as a critical element of a demand management system in Smart Grid environment. We present a prediction method of using a combination of predictive values by time series clustering. Periodogram-based normalized clustering, predictive analysis clustering and dynamic time warping (DTW) clustering are proposed for time series clustering methods. Double Seasonal Holt-Winters (DSHW), Trigonometric, Box-Cox transform, ARMA errors, Trend and Seasonal components (TBATS), Fractional ARIMA (FARIMA) are used for demand forecasting based on clustering. Results show that the time series clustering method provides a better performances than the method using total amount of electricity demand in terms of the Mean Absolute Percentage Error (MAPE).

A Study on the Response Plan by Station Area Cluster through Time Series Analysis of Urban Rail Riders Before and After COVID-19 (COVID-19 전후 도시철도 승차인원 시계열 군집분석을 통한 역세권 군집별 대응방안 고찰)

  • Li, Cheng Xi;Jung, Hun Young
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.43 no.3
    • /
    • pp.363-370
    • /
    • 2023
  • Due to the spread of COVID-19, the use of public transportation such as urban railroads has changed significantly since the beginning of 2020. Therefore, in this study, daily time series data for each urban railway station were collected for three years before COVID-19 and after the spread of COVID-19, and the similarity of time series analysis was evaluated through DTW (Dynamic Time Warping) distance method to derive regression centers for each cluster, and the effect of various external events such as COVID-19 on changes in the number of users was diagnosed as a time series impact detection function. In addition, the characteristics of use by cluster of urban railway stations were analyzed, and the change in passenger volume due to external shocks was identified. The purpose was to review measures for the maintenance and recovery of usage in the event of re-proliferation of COVID-19.

Comparison Study of Time Series Clustering Methods (시계열자료 눈집방법의 비교연구)

  • Hong, Han-Woom;Park, Min-Jeong;Cho, Sin-Sup
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.6
    • /
    • pp.1203-1214
    • /
    • 2009
  • In this paper we introduce the time series clustering methods in the time and frequency domains and discuss the merits or demerits of each method. We analyze 15 daily stock prices of KOSPI 200, and the nonparametric method using the wavelet shows the best clustering results. For the clustering of nonstationary time series using the spectral density, the EMD method remove the trend more effectively than the differencing.

Clustering Performance Analysis for Time Series Data: Wavelet vs. Autoencoder (시계열 데이터에 대한 클러스터링 성능 분석: Wavelet과 Autoencoder 비교)

  • Hwang, Woosung;Lim, Hyo-Sang
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2018.10a
    • /
    • pp.585-588
    • /
    • 2018
  • 시계열 데이터의 특징을 추출하여 분석하는 과정에서 시게열 데이터가 가지는 고차원성은 차원의 저주(Course of Dimensionality)로 인해 데이터내의 유효한 정보를 찾는데 어려움을 만든다. 이러한 문제를 해결하기 위해 차원 축소 기법(dimensionality reduction)이 널리 사용되고 있지만, 축소 과정에서 발생하는 정보의 희석으로 인하여 시계열 데이터에 대한 군집화(clustering)등을 수행하는데 있어서 성능의 변화를 가져온다. 본 논문은 이러한 현상을 관찰하기 위해 이산 웨이블릿 변환(Discrete Wavelet Transform:DWT)과 오토 인코더(AutoEncoder)를 차원 축소 기법으로 활용하여 시계열 데이터의 차원을 압축 한 뒤, 압축된 데이터를 K-평균(K-means) 알고리즘에 적용하여 군집화의 효율성을 비교하였다. 성능 비교 결과, DWT는 압축된 차원수 그리고 오토인코더는 시계열 데이터에 대한 충분한 학습이 각각 보장된다면 좋은 군집화 성능을 보이는 것을 확인하였다.

Categorical time series clustering: Case study of Korean pro-baseball data (범주형 시계열 자료의 군집화: 프로야구 자료의 사례 연구)

  • Pak, Ro Jin
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.3
    • /
    • pp.621-627
    • /
    • 2016
  • A certain professional baseball team tends to be very weak against another particular team. For example, S team, the strongest team in Korea, is relatively weak to H team. In this paper, we carried out clustering the Korean baseball teams based on the records against the team S to investigate whether the pattern of the record of the team H is different from those of the other teams. The technique we have employed is 'time series clustering', or more specifically 'categorical time series clustering'. Three methods have been considered in this paper: (i) distance based method, (ii) genetic sequencing method and (iii) periodogram method. Each method has its own advantages and disadvantages to handle categorical time series, so that it is recommended to draw conclusion by considering the results from the above three methods altogether in a comprehensive manner.

Nonparametric clustering of functional time series electricity consumption data (전기 사용량 시계열 함수 데이터에 대한 비모수적 군집화)

  • Kim, Jaehee
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.1
    • /
    • pp.149-160
    • /
    • 2019
  • The electricity consumption time series data of 'A' University from July 2016 to June 2017 is analyzed via nonparametric functional data clustering since the time series data can be regarded as realization of continuous functions with dependency structure. We use a Bouveyron and Jacques (Advances in Data Analysis and Classification, 5, 4, 281-300, 2011) method based on model-based functional clustering with an FEM algorithm that assumes a Gaussian distribution on functional principal components. Clusterwise analysis is provided with cluster mean functions, densities and cluster profiles.

Evolutionary Computation-based Hybird Clustring Technique for Manufacuring Time Series Data (제조 시계열 데이터를 위한 진화 연산 기반의 하이브리드 클러스터링 기법)

  • Oh, Sanghoun;Ahn, Chang Wook
    • Smart Media Journal
    • /
    • v.10 no.3
    • /
    • pp.23-30
    • /
    • 2021
  • Although the manufacturing time series data clustering technique is an important grouping solution in the field of detecting and improving manufacturing large data-based equipment and process defects, it has a disadvantage of low accuracy when applying the existing static data target clustering technique to time series data. In this paper, an evolutionary computation-based time series cluster analysis approach is presented to improve the coherence of existing clustering techniques. To this end, first, the image shape resulting from the manufacturing process is converted into one-dimensional time series data using linear scanning, and the optimal sub-clusters for hierarchical cluster analysis and split cluster analysis are derived based on the Pearson distance metric as the target of the transformation data. Finally, by using a genetic algorithm, an optimal cluster combination with minimal similarity is derived for the two cluster analysis results. And the performance superiority of the proposed clustering is verified by comparing the performance with the existing clustering technique for the actual manufacturing process image.

A Pattern Consistency Index for Detecting Heterogeneous Time Series in Clustering Time Course Gene Expression Data (시간경로 유전자 발현자료의 군집분석에서 이질적인 시계열의 탐지를 위한 패턴일치지수)

  • Son, Young-Sook;Baek, Jang-Sun
    • The Korean Journal of Applied Statistics
    • /
    • v.18 no.2
    • /
    • pp.371-379
    • /
    • 2005
  • In this paper, we propose a pattern consistency index for detecting heterogeneous time series that deviate from the representative pattern of each cluster in clustering time course gene expression data using the Pearson correlation coefficient. We examine its usefulness by applying this index to serum time course gene expression data from microarrays.

Efficient Time-Series Similarity Measurement and Ranking Based on Anomaly Detection (이상탐지 기반의 효율적인 시계열 유사도 측정 및 순위화)

  • Ji-Hyun Choi;Hyun Ahn
    • Journal of Internet Computing and Services
    • /
    • v.25 no.2
    • /
    • pp.39-47
    • /
    • 2024
  • Time series analysis is widely employed by many organizations to solve business problems, as it extracts various information and insights from chronologically ordered data. Among its applications, measuring time series similarity is a step to identify time series with similar patterns, which is very important in time series analysis applications such as time series search and clustering. In this study, we propose an efficient method for measuring time series similarity that focuses on anomalies rather than the entire series. In this regard, we validate the proposed method by measuring and analyzing the rank correlation between the similarity measure for the set of subsets extracted by anomaly detection and the similarity measure for the whole time series. Experimental results, especially with stock time series data and an anomaly proportion of 10%, demonstrate a Spearman's rank correlation coefficient of up to 0.9. In conclusion, the proposed method can significantly reduce computation cost of measuring time series similarity, while providing reliable time series search and clustering results.