
Learning Multidimensional Sequential Patterns Using Hellinger Entropy Function

A Method for Generating Multidimensional Sequential Patterns Using Hellinger Entropy

  • Published: 2004.08.01

Abstract

Sequential pattern mining is the task of discovering inter-transaction patterns hidden in time-dependent data. This paper proposes a new method for generating sequential patterns using the Hellinger measure. Whereas existing methods generate single-dimensional sequential patterns within a single attribute, the proposed method can detect multidimensional patterns across different attributes. Two heuristics based on the characteristics of the Hellinger measure are proposed to reduce the computational complexity of sequential pattern generation, and experimental results are presented.
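The abstract does not give the scoring formula, so the following Python sketch only illustrates the general idea under stated assumptions. It scores one multidimensional sequential pattern, a condition on one attribute at time t followed by a condition on a different attribute at time t + lag, by weighting a Hellinger-style distance between the consequent's conditional distribution p(x|y) and its prior p(x) by sqrt(p(y)). The function names `hellinger_rule_score` and `score_lagged_pattern`, the list-of-dicts record layout, the attributes `price` and `volume`, and the exact formula are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def hellinger_rule_score(p_y: float, p_x: float, p_x_given_y: float) -> float:
    # Hypothetical score (an assumption, not the paper's definition):
    # sqrt(p(y)) times the Hellinger-style distance between the posterior
    # (p(x|y), 1 - p(x|y)) and the prior (p(x), 1 - p(x)) of the consequent.
    # No 1/sqrt(2) normalisation is applied.
    d2 = (sqrt(p_x_given_y) - sqrt(p_x)) ** 2 + (sqrt(1 - p_x_given_y) - sqrt(1 - p_x)) ** 2
    return sqrt(p_y) * sqrt(d2)


def score_lagged_pattern(records, attr_y, val_y, attr_x, val_x, lag=1):
    # Scores the pattern "(attr_y == val_y at time t) -> (attr_x == val_x at t + lag)".
    # attr_y and attr_x may differ, which is what makes the pattern multidimensional.
    # `records` is a list of dicts ordered by time (hypothetical layout).
    pairs = [(records[t], records[t + lag]) for t in range(len(records) - lag)]
    n = len(pairs)
    n_y = sum(1 for a, _ in pairs if a[attr_y] == val_y)
    if n == 0 or n_y == 0:
        return 0.0  # no evidence for the antecedent
    n_x = sum(1 for _, b in pairs if b[attr_x] == val_x)
    n_xy = sum(1 for a, b in pairs if a[attr_y] == val_y and b[attr_x] == val_x)
    return hellinger_rule_score(n_y / n, n_x / n, n_xy / n_y)


# Toy time-ordered records with two attributes (made-up illustration data).
records = [
    {"price": "up", "volume": "low"},
    {"price": "up", "volume": "high"},
    {"price": "down", "volume": "high"},
    {"price": "up", "volume": "low"},
    {"price": "down", "volume": "high"},
]
score = score_lagged_pattern(records, "price", "up", "volume", "high", lag=1)
print(f"{score:.4f}")  # → 0.4483
```

The sqrt(p(y)) factor rewards patterns whose antecedent occurs often, while the distance term rewards antecedents that shift the consequent's distribution away from its prior; a pattern whose antecedent is statistically independent of its consequent scores exactly zero.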

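In data mining, sequential pattern generation refers to discovering latent patterns among events that occur at different points in time. This study presents a method that uses information theory to discover sequential patterns automatically from databases. Whereas existing methods generate one-dimensional sequential patterns, detecting only patterns within a single attribute, the method presented here generates multidimensional sequential patterns that capture sequential relationships among all attributes in the database. The Hellinger measure is used to generate the sequential patterns, and it also makes it possible to measure the importance of each discovered pattern. In addition, two rules for reducing the complexity of sequential pattern extraction are proposed on the basis of an analysis of the functional properties of the Hellinger measure, and experiments on a number of datasets show that the method can generate multidimensional sequential patterns.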
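The abstract mentions two complexity-reducing rules derived from the functional properties of the Hellinger measure but does not describe them. As a hedged illustration of how such a property can become a pruning rule, the Python sketch below assumes a score of the form sqrt(p(y)) times a Hellinger-style distance between p(x|y) and p(x) (redefined here so the block stands alone) and derives a generic upper bound on it. This bound is constructed for the illustration and is not one of the paper's two rules; `score_upper_bound` and `can_prune` are hypothetical names.

```python
from math import sqrt


def hellinger_rule_score(p_y, p_x, p_x_given_y):
    # Assumed score (hypothetical): sqrt(p(y)) * Hellinger-style distance
    # between posterior and prior of the consequent, no 1/sqrt(2) factor.
    d2 = (sqrt(p_x_given_y) - sqrt(p_x)) ** 2 + (sqrt(1 - p_x_given_y) - sqrt(1 - p_x)) ** 2
    return sqrt(p_y) * sqrt(d2)


def score_upper_bound(p_y, p_x):
    # d2(q) = 2 - 2 * (sqrt(p_x * q) + sqrt((1 - p_x) * (1 - q))) is convex in q,
    # so its maximum over q in [0, 1] lies at q = 0 or q = 1:
    # max d2 = 2 - 2 * sqrt(min(p_x, 1 - p_x)).
    return sqrt(p_y) * sqrt(2 - 2 * sqrt(min(p_x, 1 - p_x)))


def can_prune(p_y, p_x, best_so_far):
    # Adding conditions to the antecedent can only lower p(y) while p(x) stays
    # fixed, and the bound grows with p(y); so if the bound at the current p(y)
    # does not exceed best_so_far, no specialisation of this antecedent can.
    return score_upper_bound(p_y, p_x) <= best_so_far


print(can_prune(0.1, 0.5, 0.5))    # → True
print(can_prune(0.75, 0.75, 0.5))  # → False
```

Because the bound depends only on the antecedent's support and the consequent's prior, it can be checked before any conditional counts are gathered for longer candidate patterns, which is where a search over many attribute combinations saves the most work.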

