Signed Hellinger measure for directional association

Korean title: Proposal of a signed Hellinger measure that accounts for the direction of association

  • Received : 2016.02.16
  • Accepted : 2016.03.21
  • Published : 2016.03.31

Abstract

According to Wikipedia, data mining is the process of discovering patterns in large data sets, involving methods at the intersection of association rules, decision trees, clustering, artificial intelligence, machine learning, and database systems. Association rule mining is a method for discovering interesting relations between items in large transaction databases by means of interestingness measures. These measures play a major role in the knowledge discovery process in databases, and many researchers have developed them. Among them, the Hellinger measure is a good association threshold because it considers both the information content and the generality of a rule. Its drawback is that it cannot determine the direction of the association. In this paper we proposed a signed Hellinger measure that can be interpreted operationally, and we checked three conditions for an association threshold. Furthermore, we investigated some of its properties through a few examples. The results showed that the signed Hellinger measure was better than the Hellinger measure because the signed one was able to estimate the correct direction of association.

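Data mining seeks to discover new rules or latent knowledge embedded in big data and to use them as a basis for decision making. According to Wikipedia, association rule mining, one of the data mining techniques, finds relationships among items of interest by means of association evaluation criteria, and many researchers have developed interestingness measures for evaluating association. Among these, the Hellinger measure has many advantages over other interestingness measures, but it makes the direction of association difficult to judge. To address this problem, this paper proposes a signed Hellinger measure and examines its usefulness through several examples. The results show that the proposed signed Hellinger measure takes positive values under positive association and negative values under negative association. In addition, as the co-occurrence frequency, the co-non-occurrence frequency, and the mismatch frequency increase, the signed Hellinger measure increases or decreases in the same direction as the basic association evaluation criteria.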
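
The abstract does not state the formula, so the following is only a minimal sketch of the idea under stated assumptions, not the paper's definition. It assumes the unsigned measure for a rule X -> Y is a Hellinger-type distance between the conditional distribution (P(Y|X), 1 - P(Y|X)) and the prior (P(Y), 1 - P(Y)), weighted by the square root of P(X), and that the sign is taken from P(XY) - P(X)P(Y). The function names and the count-based interface are hypothetical.

```python
import math


def hellinger_measure(n_xy, n_x, n_y, n):
    """Unsigned Hellinger-type interestingness of X -> Y (assumed form).

    n_xy: transactions containing both X and Y
    n_x, n_y: transactions containing X, containing Y
    n: total number of transactions
    """
    p_x = n_x / n
    p_y = n_y / n
    p_y_given_x = n_xy / n_x
    # Squared distance between posterior and prior, on the square-root scale.
    d2 = (math.sqrt(p_y_given_x) - math.sqrt(p_y)) ** 2
    d2 += (math.sqrt(max(0.0, 1 - p_y_given_x)) - math.sqrt(max(0.0, 1 - p_y))) ** 2
    # Weight by sqrt(P(X)) so that more general rules score higher (assumption).
    return math.sqrt(p_x) * math.sqrt(d2)


def signed_hellinger_measure(n_xy, n_x, n_y, n):
    """Attach the direction of association to the unsigned measure (assumed rule)."""
    # Sign of P(XY) - P(X)P(Y), computed on integer counts to avoid rounding.
    diff = n_xy * n - n_x * n_y
    sign = (diff > 0) - (diff < 0)
    return sign * hellinger_measure(n_xy, n_x, n_y, n)


# Hypothetical counts: 100 transactions, X in 50, Y in 40.
positive = signed_hellinger_measure(35, 50, 40, 100)  # X and Y co-occur often
negative = signed_hellinger_measure(5, 50, 40, 100)   # X and Y rarely co-occur
print(positive > 0, negative < 0)
```

With these hypothetical counts the unsigned measure is positive in both cases, while the signed version separates the rule with frequent co-occurrence from the one with rare co-occurrence, which is the behavior the abstract describes.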
