http://dx.doi.org/10.29220/CSAM.2020.27.6.589

Comparison of time series clustering methods and application to power consumption pattern clustering

Kim, Jaehwi (Korea Rural Economic Institute)
Kim, Jaehee (Department of Statistics, Duksung Women's University)

Publication Information

Communications for Statistical Applications and Methods / v.27, no.6, 2020, pp. 589-602

Abstract

The development of smart grids has made it easy to collect large amounts of power data. Power consumption data share common patterns, which makes clustering a useful tool when analyzing large-scale power data. In this paper, we combine distance functions for time series with clustering algorithms to discover patterns in power consumption data. We use 10 distance measures that account for the characteristics of time series data, and a simulation study compares how well each distance measure performs in clustering. Cluster validity measures, including the error rate, similarity index, Dunn index, and silhouette values, are also calculated and compared. Real power consumption data are then clustered using the five distance measures that performed best in the simulation.
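The workflow in the abstract (compute a time series distance, partition the series, then score the partition with validity measures) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the 10 distance measures, so the sketch assumes a complexity-invariant distance as one plausible example, and the function names, the zero-complexity guard, the toy series, and the labels are all hypothetical. The Dunn index and silhouette follow their standard textbook definitions.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def complexity(x):
    """Complexity estimate: root of the summed squared first differences."""
    return math.sqrt(sum((x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1)))


def cid(x, y):
    """Complexity-invariant distance: Euclidean distance scaled by the
    ratio of the larger to the smaller complexity estimate."""
    cx, cy = complexity(x), complexity(y)
    lo = min(cx, cy)
    # Assumption: fall back to a factor of 1 when a series is constant,
    # to avoid dividing by zero.
    factor = max(cx, cy) / lo if lo > 0 else 1.0
    return euclidean(x, y) * factor


def distance_matrix(series, dist):
    """Symmetric pairwise distance matrix with a zero diagonal."""
    n = len(series)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dist(series[i], series[j])
    return d


def dunn_index(d, labels):
    """Smallest between-cluster distance over largest within-cluster distance.
    Requires at least two clusters and one cluster with two distinct members."""
    n = len(labels)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    inter = min(d[i][j] for i, j in pairs if labels[i] != labels[j])
    intra = max(d[i][j] for i, j in pairs if labels[i] == labels[j])
    return inter / intra


def mean_silhouette(d, labels):
    """Average silhouette width; singleton clusters contribute 0."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        own = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue
        a = sum(own) / len(own)  # mean distance to own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(d[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in set(labels) if c != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / n


# Hypothetical toy load profiles: two evening-peak and two morning-peak series.
series = [
    [1, 1, 1, 5, 5, 5],
    [1, 1, 2, 5, 5, 4],
    [5, 5, 5, 1, 1, 1],
    [5, 4, 5, 1, 1, 2],
]
labels = [0, 0, 1, 1]  # assumed partition to be validated

d = distance_matrix(series, cid)
print("Dunn index:", dunn_index(d, labels))
print("Mean silhouette:", mean_silhouette(d, labels))
```

Larger values of both indices indicate compact, well-separated clusters, so the same two functions can be used to compare partitions obtained from different distance measures on the same data.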

Keywords

complexity distance; model-free distance; model-based distance; power consumption; time series clustering; silhouette
