A New Importance Measure of Association Rules Using Information Theory

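Korean title: 정보이론에 기반한 연관 규칙들의 새로운 중요도 측정 방법 (A New Method for Measuring the Importance of Association Rules Based on Information Theory)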

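  • 이창환 (Department of Information and Communication, Dongguk University)
  • 배주현 (Department of Information and Communication, Dongguk University)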
  • Received : 2013.08.09
  • Accepted : 2013.11.05
  • Published : 2014.01.31

Abstract

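Classification learning based on association rules has recently become an active research area. In such classification, it is important to compute a numerical importance value for each association rule. This paper proposes a new rule importance measure, called the H measure, that is based on information theory. Specifically, the importance of an association rule is computed using the Hellinger divergence. We analyze various properties of the proposed H measure and compare the performance of classification learning that uses the H measure with that of classification learning that uses other rule measures.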
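
To make the idea of a Hellinger-based rule score concrete, here is a minimal sketch. It is not the paper's H measure, whose exact definition is not given in this abstract. It assumes one plausible reading: a rule "antecedent → class" is scored by the Hellinger distance between the prior class distribution P(C) and the class distribution P(C | antecedent) among the records the rule covers. The function names, the `label` key, and the toy records are all hypothetical.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions over the same outcomes.

    Returns a value in [0, 1]: 0 for identical distributions, 1 for disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of outcomes")
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared / 2.0)


def class_distribution(records, classes, class_key):
    """Relative frequency of each class value in a non-empty list of records."""
    total = len(records)
    return [sum(1 for r in records if r[class_key] == c) / total for c in classes]


def rule_importance(records, antecedent, class_key="label"):
    """Illustrative rule score (an assumption, not the paper's formula).

    Compares P(C) over all records with P(C | antecedent) over the covered records.
    """
    classes = sorted({r[class_key] for r in records})
    covered = [
        r for r in records
        if all(r.get(attr) == value for attr, value in antecedent.items())
    ]
    if not covered:
        return 0.0  # a rule that covers nothing carries no information
    prior = class_distribution(records, classes, class_key)
    posterior = class_distribution(covered, classes, class_key)
    return hellinger_distance(prior, posterior)


# Hypothetical toy data: "outlook" is an attribute, "label" is the class.
records = [
    {"outlook": "sunny", "label": "yes"},
    {"outlook": "sunny", "label": "yes"},
    {"outlook": "rainy", "label": "no"},
    {"outlook": "rainy", "label": "no"},
]
score = rule_importance(records, {"outlook": "sunny"})
```

In this sketch, a rule whose covered records have the same class mix as the whole data set scores 0, and the score grows as the rule shifts the class distribution further from the prior. A practical measure would likely also account for how many records the rule covers, which this sketch ignores.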
