Proposition of causal association rule thresholds

Korean title: 인과적 연관성 규칙 평가 기준의 제안 (Proposal of evaluation criteria for causal association rules)

  • Received : 2013.07.15
  • Accepted : 2013.08.18
  • Published : 2013.11.30

Abstract

Data mining is the process of analyzing a large database from different perspectives and summarizing it into useful information. One of the well-studied problems in data mining is association rule generation. Association rule mining finds relationships among items in a large database using interestingness measures such as support, confidence, and lift. Typical applications include retail market basket analysis, item recommendation systems, cross-selling, and customer relationship management. However, these interestingness measures cannot establish a causal relationship between the antecedent and consequent item sets. This paper proposes causal association thresholds to compensate for this problem and checks whether they satisfy the three conditions for interestingness measures. A numerical example compares the basic and causal association thresholds. In that example the causal thresholds perform better than the basic ones: rules rejected under basic support and confidence can be accepted under causal support and causal confidence.
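For orientation, the basic measures named in the abstract are usually defined as below. These are the standard textbook forms for a rule X => Y, written in assumed notation (supp, conf, lift, P); they are not the causal thresholds proposed in the paper, whose formulas are not given on this page.

```latex
\begin{align*}
\mathrm{supp}(X \Rightarrow Y) &= P(X \cap Y), \\
\mathrm{conf}(X \Rightarrow Y) &= \frac{P(X \cap Y)}{P(X)}, \\
\mathrm{lift}(X \Rightarrow Y) &= \frac{P(X \cap Y)}{P(X)\,P(Y)}.
\end{align*}
```

Under independence, P(X ∩ Y) = P(X)P(Y), so lift equals 1, while support and confidence reduce to P(X)P(Y) and P(Y) respectively.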

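Association rule mining is a technique that finds relationships among the items in a large database, based on interestingness measures such as support, confidence, and lift. It is widely used in practice, for example in corporate decision making, cross-selling in retail, and customer management, but these basic association thresholds alone cannot explain a causal relationship between two items. To address this problem, this paper proposes causal association rules and checks whether the thresholds under consideration satisfy the conditions for an interestingness measure. The proposed causal lift was shown to satisfy all three conditions. Causal support and causal confidence satisfied two of them: they increase monotonically with the joint occurrence probability and decrease monotonically with the marginal probability of each item. However, like conventional support and confidence, they did not satisfy the condition that the measure equals 1 when the two items are independent. In addition, a comparison of the conventional and causal association thresholds on an example showed that rules rejected when judged by conventional support and confidence can be accepted as association rules when judged by causal support and causal confidence.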
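The independence condition mentioned above can be checked numerically for the basic measures. The following is a minimal sketch, assuming the standard definitions of support, confidence, and lift; the function name `basic_measures` and the toy transactions are hypothetical, and the paper's causal measures are not implemented because their formulas do not appear on this page.

```python
from fractions import Fraction


def basic_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent => consequent.

    Standard textbook definitions; not the paper's causal thresholds.
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))         # contains antecedent
    n_c = sum(1 for t in transactions if c <= set(t))         # contains consequent
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))  # contains both
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("measures are undefined when a marginal count is zero")
    support = Fraction(n_ac, n)
    confidence = Fraction(n_ac, n_a)
    lift = confidence / Fraction(n_c, n)
    return support, confidence, lift


# Hypothetical toy data in which items A and B are exactly independent:
# P(A) = 1/2, P(B) = 1/2, P(A and B) = 1/4.
independent = [{"A", "B"}, {"A"}, {"B"}, set()]
support, confidence, lift = basic_measures(independent, {"A"}, {"B"})
print(support, confidence, lift)
```

On this toy data only lift takes the value 1 under independence; support and confidence do not, which is consistent with the abstract's remark about conventional support and confidence.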
