Time series representation for clustering using unbalanced Haar wavelet transformation

  • Lee, Sehun (Department of Statistics, Sungkyunkwan University)
  • Baek, Changryong (Department of Statistics, Sungkyunkwan University)
  • Received : 2018.08.08
  • Accepted : 2018.11.15
  • Published : 2018.12.31

Abstract

Various time series representation methods have been proposed for efficient time series clustering and classification. Lin et al. (DMKD, 15, 107-144, 2007) proposed the symbolic aggregate approximation (SAX) method, which converts a time series into a symbolic representation after approximating it by piecewise local means. The performance of SAX therefore depends heavily on how well the piecewise local means capture the features of the original series. SAX divides the entire series into an arbitrary number of equal-length segments; however, this is not sufficient to capture key features of complex, large-scale time series data. This paper therefore considers a data-adaptive local constant approximation of the time series using the unbalanced Haar wavelet transformation. The proposed method is shown to outperform SAX in many real-world data applications.
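
To make the SAX pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' code. It z-normalizes a series, takes equal-width piecewise means, and maps each mean to a letter using breakpoints that cut the standard Gaussian into equiprobable regions. The function names `paa` and `sax` and the rounding rule for segment boundaries are assumptions made for illustration.

```python
from statistics import NormalDist, mean, pstdev


def paa(series, n_segments):
    """Piecewise local means over n_segments (nearly) equal-width segments.

    Assumes 1 <= n_segments <= len(series) so that no segment is empty.
    """
    n = len(series)
    # Segment i covers indices bounds[i] .. bounds[i + 1] - 1.
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    return [mean(series[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]


def sax(series, n_segments, alphabet_size):
    """Return the SAX word of a non-constant series (illustrative sketch)."""
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma for x in series]  # z-normalize first
    # Breakpoints split N(0, 1) into alphabet_size equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"
    word = ""
    for level in paa(z, n_segments):
        # The symbol index is the number of breakpoints at or below the mean.
        word += letters[sum(level >= b for b in breakpoints)]
    return word


print(sax([1, 1, 2, 2, 8, 8, 9, 9], n_segments=4, alphabet_size=4))  # aadd
```

Because the segment boundaries are fixed in advance, a level shift that falls inside a segment is averaged away, which is the limitation the abstract points to.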

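Various time series representation methods have been proposed to perform time series classification and clustering efficiently. This study investigates an improvement of the symbolic aggregate approximation (SAX) method of Lin et al. (2007), which reduces the dimension of a time series by piecewise local mean approximation and then discretizes it into symbolic data. Because SAX computes the local means over an arbitrary number of equal-width segments, its performance depends heavily on the number of segments. This paper therefore proposes using the unbalanced Haar wavelet transformation to choose the local mean levels in a data-dependent way that reflects the features of the data, rather than at equal intervals, which reduces the dimension of the time series effectively while also reducing information loss. Analysis of real data confirms that the proposed method improves on SAX.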
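
The idea of data-adaptive segment boundaries can be illustrated with a simplified top-down sketch. It is not the authors' exact procedure. For a segment with a left part of length n_L and a right part of length n_R, the unbalanced Haar coefficient equals sqrt(n_L n_R / (n_L + n_R)) times the difference of the two local means, and the breakpoint is placed where its absolute value is largest. The recursive splitting, the function names, and the fixed `threshold` stopping rule are assumptions made for illustration.

```python
from math import sqrt


def best_split(x, s, e):
    """Breakpoint b in s..e-1 maximizing the |unbalanced Haar coefficient|.

    The left part is x[s..b], the right part is x[b+1..e] (inclusive indices).
    """
    n = e - s + 1
    total = sum(x[s:e + 1])
    best_b, best_coef = None, 0.0
    left_sum = 0.0
    for b in range(s, e):
        left_sum += x[b]
        n_left = b - s + 1
        n_right = n - n_left
        coef = sqrt(n_left * n_right / n) * (
            left_sum / n_left - (total - left_sum) / n_right)
        if best_b is None or abs(coef) > abs(best_coef):
            best_b, best_coef = b, coef
    return best_b, best_coef


def uh_segments(x, threshold):
    """Greedy segmentation into (start, end, local mean) triples.

    A segment is split only while its best coefficient exceeds `threshold`
    in absolute value (a hypothetical stopping rule for this sketch).
    """
    segments = []
    stack = [(0, len(x) - 1)]
    while stack:
        s, e = stack.pop()
        if e > s:
            b, coef = best_split(x, s, e)
            if abs(coef) > threshold:
                stack.append((b + 1, e))
                stack.append((s, b))
                continue
        segments.append((s, e, sum(x[s:e + 1]) / (e - s + 1)))
    return sorted(segments)


print(uh_segments([0, 0, 0, 0, 0, 5, 5, 5], threshold=1.0))
```

On this toy series the single breakpoint lands exactly at the jump between indices 4 and 5, whereas two equal-width segments would cut between indices 3 and 4 and blur the level shift.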

Keywords

Figure 2.1. Example of symbolic aggregate approximation for a time series.

Figure 3.1. Example of unbalanced Haar wavelet transformation for a time series.

Figure 4.1. Comparison of 1-NN classification error rate on 28 datasets. 1-NN = 1-nearest neighbor classification.

Figure 4.2. Comparison of compression ratio on 28 datasets.

Figure 4.3. Hierarchical clustering of the control chart dataset. SAX = symbolic aggregate approximation.

Figure 4.4. Normal and cyclic class converted by the piecewise aggregate approximation.

Figure 4.5. Normal and cyclic class converted by the unbalanced Haar wavelet transformation.

Table 2.1. Notation for SAX

Table 2.2. Lookup table of the breakpoints that divide a Gaussian distribution into an arbitrary number (the alphabet size a) of equiprobable regions

Table 2.3. A lookup table for the MINDIST function when a = 4
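
Since the lookup tables are available here only as captions, the sketch below shows one way such tables could be computed. It assumes the usual SAX convention from Lin et al. (2007): breakpoints are standard Gaussian quantiles, and the distance between two symbols is zero when they are identical or adjacent, and otherwise the gap between the breakpoints that separate them. The function names are hypothetical, and the printed values come from this sketch, not from the paper's tables.

```python
from statistics import NormalDist


def breakpoints(a):
    """The a - 1 cut points dividing N(0, 1) into a equiprobable regions."""
    return [NormalDist().inv_cdf(k / a) for k in range(1, a)]


def mindist_table(a):
    """Symbol-to-symbol distance table for alphabet size a.

    Symbols are indexed 0 .. a-1. Identical or adjacent symbols get
    distance 0; otherwise the distance is the lower breakpoint of the
    larger symbol minus the upper breakpoint of the smaller symbol.
    """
    bp = breakpoints(a)
    return [[0.0 if abs(i - j) <= 1 else bp[max(i, j) - 1] - bp[min(i, j)]
             for j in range(a)]
            for i in range(a)]


for row in mindist_table(4):
    print(["%.2f" % d for d in row])
```

The table is symmetric with zeros on and next to the diagonal, so for a = 4 only the pairs (a, c), (a, d), and (b, d) carry a nonzero distance.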

Table 3.1. Notation for the proposed method

Table 4.1. Comparison of 1-NN classification error rate on 28 datasets

References

  1. Aghabozorgi, S., Shirkhorshidi, A. S., and Wah, T. H. (2015). Time-series clustering - a decade review, Information Systems, 53, 16-38. https://doi.org/10.1016/j.is.2015.04.007
  2. Baek, C. and Pipiras, V. (2009). Long range dependence, unbalanced Haar wavelet transformation and changes in local mean level, International Journal of Wavelets, Multiresolution and Information Processing, 7, 23-58. https://doi.org/10.1142/S0219691309002763
  3. Chan, K. and Fu, W. (1999). Efficient time series matching by wavelets, ICDE, 15, 126-133.
  4. Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage, Biometrika, 81, 425-455. https://doi.org/10.1093/biomet/81.3.425
  5. Faloutsos, C., Ranganathan, M., and Manolopoulos, Y. (1994). Fast subsequence matching in time-series databases, SIGMOD Record, 23, 419-429. https://doi.org/10.1145/191843.191925
  6. Fryzlewicz, P. (2007). Unbalanced Haar technique for nonparametric function estimation, Journal of the American Statistical Association, 102, 1310-1327.
  7. Keogh, E., Chakrabarti, K., Pazzani, M., and Mehrotra, S. (2001). Dimensionality reduction for fast similarity search in large time series databases, Knowledge and Information Systems, 3, 263-286. https://doi.org/10.1007/PL00011669
  8. Lin, J., Keogh, E., Wei, L., and Lonardi, S. (2007). Experiencing SAX: a novel symbolic representation of time series, Data Mining and Knowledge Discovery, 15, 107-144.