Effect of Market Basket Size on the Accuracy of Association Rule Measures

장바구니 크기가 연관규칙 척도의 정확성에 미치는 영향

  • 김남규 (School of Business IT, College of Economics and Business, Kookmin University)
  • Published : 2008.06.30

Abstract

Recent interest in data mining stems from the rapid growth of business data and from the increasing need to extract valuable knowledge from that data and use it in decision making. In particular, recent advances in association rule mining make it possible to discover sales patterns among individual items from voluminous transaction data. One of the major purposes of association rule mining is to apply the acquired knowledge to marketing strategies such as cross-selling, sales promotion, and shelf-space allocation. Despite this potential, a marketing mix derived from data mining often fails to yield realized profit. The main difficulty is that association rule mining discovers a tremendous number of patterns. Because there are so many patterns, data mining experts must mine the results of the initial mining again to extract only actionable and profitable knowledge, which costs considerable time and money.

A number of interestingness measures have been devised in the literature for evaluating discovered patterns. Most of them can be calculated directly from a contingency table, which summarizes the sales frequencies of mutually exclusive combinations of the items or itemsets of concern. A contingency table gives a concise view of the relationship between two or more itemsets. However, some useful information about sales transactions is lost when a contingency table is constructed. For instance, the size of each market basket (i.e., the number of items in each transaction) cannot be represented in a contingency table. A larger basket naturally tends to contain more sales patterns. Therefore, if two itemsets are sold together in a very large basket, the basket can be expected to contain two or more patterns, and the two itemsets are likely to belong to different patterns. Frequent itemsets should therefore be classified into two categories, inter-pattern co-occurrence and intra-pattern co-occurrence, and the effect of market basket size on each category should be investigated. This implies that an interestingness measure for association rules should consider not only the total frequency of the target itemsets but also the size of each basket.

There have been many attempts in the literature to analyze interestingness measures. Most of them compare the measures qualitatively: they propose desirable properties of interestingness measures and then survey how many of those properties each measure satisfies. However, relatively little attention has been paid to evaluating how valuable the patterns discovered by each measure actually are in the real world.

This paper proposes two notions regarding association rule measures. First, a quantitative criterion for estimating the accuracy of association rule measures is presented. According to this criterion, a measure is accurate if it assigns high scores to meaningful patterns that actually exist and low scores to arbitrary patterns that co-occur by coincidence. Second, complementary measures are presented to improve the accuracy of traditional association rule measures. By incorporating market basket size, the devised measures attempt to discriminate the co-occurrence of itemsets in a small basket from their co-occurrence in a large basket. Intensive computer simulations under various workloads were performed to analyze the accuracy of various interestingness measures, including both traditional measures and the proposed ones.
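
As a rough illustration of the point that a contingency table discards basket size, the following Python sketch builds the 2x2 table for two itemsets and derives the standard support, confidence, and lift from it. The function `size_weighted_support` is a hypothetical weighting invented for this example only; the abstract does not define the proposed measures, so it should not be read as the paper's method. The sample baskets are likewise made up.

```python
def contingency_table(baskets, a, b):
    """Count baskets containing both itemsets, only a, only b, or neither."""
    a, b = frozenset(a), frozenset(b)
    both = only_a = only_b = neither = 0
    for basket in baskets:
        has_a, has_b = a <= basket, b <= basket
        if has_a and has_b:
            both += 1
        elif has_a:
            only_a += 1
        elif has_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def measures(table):
    """Standard support, confidence, and lift of the rule a -> b."""
    both, only_a, only_b, neither = table
    n = both + only_a + only_b + neither
    support = both / n
    confidence = both / (both + only_a) if both + only_a else 0.0
    prob_b = (both + only_b) / n
    lift = confidence / prob_b if prob_b else 0.0
    return support, confidence, lift


def size_weighted_support(baskets, a, b):
    """Hypothetical measure (an assumption, not taken from the paper):
    each co-occurrence counts len(a | b) / len(basket), so a basket that
    holds nothing but the two itemsets counts fully and a large basket
    counts less."""
    target = frozenset(a) | frozenset(b)
    total = sum(len(target) / len(basket) for basket in baskets if target <= basket)
    return total / len(baskets)


# Made-up data: the same co-occurrences, in small and in large baskets.
small = [{"milk", "bread"}, {"milk", "bread"}, {"milk"}, {"eggs"}]
large = [
    {"milk", "bread", "eggs", "tea", "soap", "rice"},
    {"milk", "bread", "jam", "salt", "pens", "cola"},
    {"milk"},
    {"eggs"},
]

table_small = contingency_table(small, {"milk"}, {"bread"})
table_large = contingency_table(large, {"milk"}, {"bread"})
weighted_small = size_weighted_support(small, {"milk"}, {"bread"})
weighted_large = size_weighted_support(large, {"milk"}, {"bread"})
```

Both samples produce the same table, (2, 1, 0, 1), so every measure computed from the table alone is identical for the two. The illustrative size-weighted score differs (0.5 for the small baskets, about 0.167 for the large ones), which is the kind of distinction the abstract argues a measure should be able to make.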
