Time Series Representation Combining PIPs Detection and Persist Discretization Techniques for Time Series Classification

  • 박상호 (School of Computer and Information Engineering, Inha University) ;
  • 이주홍 (School of Computer and Information Engineering, Inha University)
  • Received : 2010.08.31
  • Accepted : 2010.09.16
  • Published : 2010.09.28

Abstract

Various time series representation methods have been proposed to process time series data efficiently and effectively. SAX (Symbolic Aggregate approXimation) is a representative method that combines segmentation and discretization techniques, and it has been applied successfully to time series classification. However, because SAX smooths the movement of a time series and thereby loses its dynamic properties, it requires a large number of segments to represent the meaningful dynamic patterns accurately. This paper therefore proposes a new time series representation method that combines PIPs (Perceptually Important Points) detection with the Persist discretization technique. The proposed method represents the dynamic movement of a high-dimensional time series in a lower-dimensional space by detecting PIPs, which mark the important inflection points of the series. It then determines the optimal discretization ranges by applying the self-transition and marginal probability distributions of the series to the KL divergence measure. By minimizing the information loss incurred during dimensionality reduction, the proposed method improves time series classification performance.
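
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not give the exact formulation. The sketch assumes the commonly used vertical-distance criterion for PIP detection and a persistence score built from the symmetric KL divergence between each symbol's self-transition probability and its marginal probability. The function names `detect_pips`, `persistence_score`, and `best_breakpoint`, the clipping constant `eps`, and the restriction to a single breakpoint are all hypothetical choices made for illustration.

```python
import numpy as np


def detect_pips(series, n_pips):
    """Return sorted indices of n_pips perceptually important points.

    Assumption: importance is the vertical distance from a point to the
    line joining its two neighbouring PIPs.
    """
    y = np.asarray(series, dtype=float)
    n = len(y)
    if n_pips < 2 or n_pips > n:
        raise ValueError("n_pips must be between 2 and len(series)")
    pips = [0, n - 1]  # the first and last points are always PIPs
    while len(pips) < n_pips:
        best_idx, best_dist = None, -1.0
        for left, right in zip(pips[:-1], pips[1:]):
            if right - left < 2:
                continue  # no interior point between these two PIPs
            xs = np.arange(left + 1, right)
            line = y[left] + (y[right] - y[left]) * (xs - left) / (right - left)
            dist = np.abs(y[xs] - line)
            k = int(np.argmax(dist))
            if dist[k] > best_dist:
                best_idx, best_dist = int(xs[k]), float(dist[k])
        pips.append(best_idx)
        pips.sort()
    return pips


def _sym_kl(p, q, eps=1e-10):
    """Symmetric KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    q = min(max(q, eps), 1.0 - eps)
    kl_pq = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    kl_qp = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
    return 0.5 * (kl_pq + kl_qp)


def persistence_score(symbols, n_symbols):
    """Average signed divergence between self-transition and marginal
    probabilities; higher values mean the symbols tend to persist."""
    s = np.asarray(symbols, dtype=int)
    total = 0.0
    for j in range(n_symbols):
        marginal = np.mean(s == j)
        from_j = s[:-1] == j
        if not from_j.any():
            continue  # symbol never occurs as a source state
        self_trans = np.mean(s[1:][from_j] == j)
        total += np.sign(self_trans - marginal) * _sym_kl(self_trans, marginal)
    return total / n_symbols


def best_breakpoint(values, candidates):
    """Simplified two-symbol case: pick the candidate breakpoint whose
    discretization has the highest persistence score."""
    v = np.asarray(values, dtype=float)
    return max(candidates,
               key=lambda b: persistence_score(np.digitize(v, [b]), 2))


# Usage: reduce a series to its PIPs, then discretize the PIP values.
series = [0, 1, 0, 5, 0, 1, 0]
pip_idx = detect_pips(series, 3)
pip_values = [series[i] for i in pip_idx]
symbols = np.digitize(pip_values, [2.5])
```

In this sketch the dimensionality reduction comes from keeping only the PIP values, and the choice of discretization ranges comes from comparing persistence scores across candidate breakpoints. Extending the search to several breakpoints would need a greedy or exhaustive loop over candidate sets, which is omitted here.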
