Clustering Algorithm for Time Series with Similar Shapes

  • Ahn, Jungyu (Dept. of Computer Engineering, Inha University)
  • Lee, Ju-Hong (Dept. of Computer Engineering, Inha University)
  • Received : 2017.09.25
  • Accepted : 2018.01.23
  • Published : 2018.07.31

Abstract

Time series clustering requires no prior information, so it is used for exploratory data analysis. In particular, clusters of time series with similar shapes are useful in fields such as business, medicine, finance, and communications. However, existing time series clustering algorithms often produce clusters that contain time series with different shapes. This happens because those algorithms place no limit on the size of the generated clusters and rely on dimension reduction methods that lose a large amount of information. In this paper, we propose a method that alleviates these disadvantages and finds higher-quality clusters containing similarly shaped time series. In the data preprocessing step, we normalize each time series using the z-transformation and then reduce its dimension using piecewise aggregate approximation (PAA). In the clustering step, we use density-based spatial clustering of applications with noise (DBSCAN) to create preclusters. We then use a modified K-means algorithm to refine preclusters that contain differently shaped time series into subclusters that contain only similarly shaped time series. In our experiments, the proposed method produced better results than the existing method.
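
The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `z_normalize` and `paa` are hypothetical, the population standard deviation is an assumed choice, and the series length is assumed to be divisible by the number of segments. The DBSCAN preclustering and the modified K-means refinement are not sketched here, because the abstract gives neither their parameters nor the details of the modification.

```python
from math import sqrt


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # A constant series carries no shape information; map it to zeros.
        return [0.0] * n
    return [(x - mean) / std for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-width frame.

    Assumption for this sketch: len(series) is divisible by `segments`.
    """
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("series length must be a positive multiple of segments")
    width = n // segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


# Two series with the same shape but different scales.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]

# After z-normalization the scale difference disappears, and PAA
# reduces each 8-point series to 4 frame means.
reduced_a = paa(z_normalize(a), 4)
reduced_b = paa(z_normalize(b), 4)
```

Because `b` is a scaled copy of `a`, the two reduced representations coincide up to floating-point error, which is the property a shape-based clustering step relies on.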
