Effect of Market Basket Size on the Accuracy of Association Rule Measures

Kim, Nam-Gyu

Asia Pacific Journal of Information Systems

Volume 18, Issue 2, pp. 95-114, 2008; pISSN 2288-5404; eISSN 2288-6818

The Korea Society of Management Information Systems (한국경영정보학회)

Kim, Nam-Gyu (School of Business IT, College of Economics and Commerce, Kookmin University)

Published: 2008.06.30

Abstract

Recent interest in data mining stems from the rapid growth of business data and from the increasing need to extract valuable knowledge from that data and use it in decision making. In particular, recent advances in association rule mining make it possible to learn sales patterns among individual items from voluminous transactional data. One of the major purposes of association rule mining is to apply the acquired knowledge to marketing strategies such as cross-selling, sales promotion, and shelf-space allocation. Despite this potential, a marketing mix derived from data mining often fails to produce realized profit. The main difficulty is that association rule mining discovers a tremendous number of patterns. Data mining experts must therefore mine the results of the initial mining again in order to extract only actionable and profitable knowledge, which costs considerable time and money. A number of interestingness measures have been devised in the literature for evaluating discovered patterns. Most of these measures can be calculated directly from a contingency table, which summarizes the sales frequencies of exclusive items or itemsets. A contingency table gives a brief insight into the relationship between two or more itemsets of concern. However, some useful information about sales transactions may be lost when a contingency table is constructed. For instance, the size of each market basket (i.e., the number of items in each transaction) cannot be represented in a contingency table. A larger basket naturally tends to contain more sales patterns.
Therefore, if two itemsets are sold together in a very large basket, the basket can be expected to contain two or more patterns, with the two itemsets belonging to different patterns. Frequent itemsets should accordingly be classified into two categories, inter-pattern co-occurrence and intra-pattern co-occurrence, and the effect of market basket size on each category should be investigated. This implies that any interestingness measure for association rules should consider not only the total frequency of the target itemsets but also the size of each basket. Many studies have analyzed interestingness measures, but most of them compared the measures qualitatively: they proposed desirable properties of interestingness measures and then surveyed how many of those properties each measure satisfies. Relatively little attention has been paid to evaluating how valuable the patterns discovered by each measure actually are in the real world. This paper proposes two notions regarding association rule measures. First, a quantitative criterion for estimating the accuracy of association rule measures is presented. Under this criterion, a measure is considered accurate if it assigns high scores to meaningful patterns that actually exist and low scores to arbitrary patterns that co-occur by coincidence. Second, complementary measures are presented to improve the accuracy of traditional association rule measures. By incorporating market basket size, the devised measures attempt to distinguish the co-occurrence of itemsets in a small basket from co-occurrence in a large basket. Intensive computer simulations under various workloads were performed to analyze the accuracy of various interestingness measures, including both traditional measures and the proposed ones.
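
The loss of basket-size information described in the abstract can be illustrated with a minimal Python sketch. It builds the 2x2 contingency table for two items, computes lift (a standard measure derivable from that table), and contrasts it with a size-discounted support. The discount weight `1 / (basket size - 1)`, the function names, and the toy baskets are assumptions made purely for illustration; they are not the complementary measures proposed in the paper, which the abstract does not define.

```python
def contingency_table(baskets, a, b):
    """Count baskets containing (both a and b, only a, only b, neither)."""
    both = only_a = only_b = neither = 0
    for basket in baskets:
        has_a, has_b = a in basket, b in basket
        if has_a and has_b:
            both += 1
        elif has_a:
            only_a += 1
        elif has_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def lift(table):
    """Lift = P(a and b) / (P(a) * P(b)), computed from the 2x2 counts only."""
    both, only_a, only_b, neither = table
    n = both + only_a + only_b + neither
    p_a = (both + only_a) / n
    p_b = (both + only_b) / n
    return (both / n) / (p_a * p_b)


def size_discounted_support(baskets, a, b):
    """Hypothetical measure (an assumption, not the paper's definition):
    each co-occurrence counts 1 / (basket size - 1), so a co-occurrence in a
    two-item basket counts fully and one in a large basket counts less."""
    total = 0.0
    for basket in baskets:
        if a in basket and b in basket:
            total += 1.0 / (len(basket) - 1)
    return total / len(baskets)


# Toy data (invented): the same items co-occur in small versus large baskets.
small_baskets = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread"},
    {"milk"},
]
large_baskets = [
    {"bread", "butter", "milk", "eggs", "soap", "pens"},
    {"bread", "butter", "tea", "rice", "salt", "gum"},
    {"bread"},
    {"milk"},
]

for name, baskets in [("small", small_baskets), ("large", large_baskets)]:
    table = contingency_table(baskets, "bread", "butter")
    print(name, table, lift(table),
          size_discounted_support(baskets, "bread", "butter"))
```

Both datasets yield the same contingency table and therefore the same lift, while the size-discounted value is lower for the large baskets. This is the distinction the abstract argues a contingency table alone cannot express.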
