A Fusion of the Period Characterized and Hierarchical Bayesian Techniques for Efficient Cluster Analysis of Time Series Data

  • Jung, Young-Ae (Dept. of Information Technology Education, Sun-Moon University) ;
  • Jeon, Jin-Ho (Dept. of Business Administration, Catholic Kwan-Dong University)
  • Received : 2015.04.16
  • Reviewed : 2015.07.20
  • Published : 2015.07.28

Abstract

An effective way to understand time series data that are dynamic and evolve over time, such as stock price indices, is to determine a model for the given data. Rather than examining an entire large collection of time series at once, it is more efficient to cluster the data into a few important subgroups and to understand the whole through a model determined for each cluster. Of the two steps involved, efficient clustering of the given time series into subgroups and model determination for each cluster, this study addresses the first. We propose a technique that fuses period characterization of the data with a heuristic Bayesian technique to reduce the time and computational cost of clustering into subgroups, and we confirm its effectiveness through experiments on actual stock price indices.
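The two-stage idea in the abstract (summarize each series over intervals, then merge similar series bottom-up) can be illustrated with a minimal sketch. The abstract does not say how intervals are characterized or which Bayesian criterion drives the merging, so everything here is an assumption: `characterize` uses the mean of equal-width intervals, and `agglomerate` substitutes plain average-linkage clustering on Euclidean distance for the paper's Bayesian step. The function names and the toy data are hypothetical.

```python
from statistics import mean

def characterize(series, n_segments):
    """Summarize a series by the mean of each of n_segments equal-width intervals.

    Assumed characterization (segment means); requires n_segments <= len(series).
    """
    n = len(series)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    return [mean(series[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]

def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerate(features, n_clusters):
    """Average-linkage agglomerative clustering (a stand-in for the Bayesian step).

    Returns a list of clusters, each a list of indices into `features`.
    """
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > n_clusters:
        best = None  # (linkage distance, index i, index j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(distance(features[p], features[q])
                         for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy data made up for illustration: two rising and two falling series.
series = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 2, 2, 4, 5, 7, 7, 9],
    [8, 7, 6, 5, 4, 3, 2, 1],
    [9, 7, 7, 5, 4, 2, 2, 1],
]
features = [characterize(s, 4) for s in series]  # 8 points -> 4 features each
clusters = agglomerate(features, 2)
```

Clustering runs on the shortened feature vectors rather than on the raw series, which is where a saving in time and computation would come from under these assumptions: each pairwise distance touches 4 values instead of 8 in this toy case.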
