http://dx.doi.org/10.13088/jiis.2012.18.1.023

An Investigation on Expanding Co-occurrence Criteria in Association Rule Mining  

Kim, Mi-Sung (Graduate School of BIT, Kookmin University)
Kim, Nam-Gyu (School of MIS, Kookmin University)
Ahn, Jae-Hyeon (Graduate School of Information and Media Management, KAIST)
Publication Information
Journal of Intelligence and Information Systems / v.18, no.1, 2012, pp. 23-38
Abstract
Purchasing patterns in an online shopping mall differ considerably from those in an offline market, mainly because the two differ in accessibility. In an online shopping mall, the interval between the initial purchasing decision and its realization tends to be short, because a customer can place an order immediately. As a result, an online transaction usually contains fewer items than an offline one. In an offline market, customers usually keep some items in mind and buy them all at once a few days after deciding to buy them, instead of buying each item individually and immediately. By contrast, more than 70% of online shopping mall transactions contain only one item. This statistic implies that traditional data mining techniques cannot be applied directly to online market analysis: with so many null transactions, hardly any association rule reaches an acceptable level of Support. Most market basket analyses of online shopping mall transactions have therefore been performed by expanding the co-occurrence criterion of traditional association rule mining. While the traditional co-occurrence criterion defines items purchased in one transaction as concurrently purchased items, an expanded co-occurrence criterion regards items purchased by a customer during some predefined period (e.g., a day) as concurrently purchased items. In studies using expanded co-occurrence criteria, however, the criterion has been chosen arbitrarily by researchers without any theoretical grounds or agreement. The lack of clear grounds for adopting a particular co-occurrence criterion degrades the reliability of the analytical results. Moreover, it makes it hard to derive new meaningful findings by combining the outcomes of previous individual studies.
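The difference between the traditional and the expanded co-occurrence criterion can be illustrated with a minimal Python sketch. The order log, the item names, and the helper functions `build_baskets` and `support` are hypothetical and are not taken from the paper; the sketch only assumes that an expanded criterion merges all orders placed by the same customer within the same calendar day or calendar month into one basket.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order log: (customer_id, order_date, items in that order).
orders = [
    ("c1", date(2011, 3, 1), {"shampoo"}),
    ("c1", date(2011, 3, 1), {"conditioner"}),
    ("c1", date(2011, 3, 20), {"soap"}),
    ("c2", date(2011, 3, 2), {"shampoo"}),
    ("c2", date(2011, 3, 9), {"conditioner"}),
    ("c3", date(2011, 3, 5), {"soap"}),
]

# Assumed period definitions: calendar day and calendar month.
PERIOD_KEYS = {
    "day": lambda d: (d.year, d.month, d.day),
    "month": lambda d: (d.year, d.month),
}


def build_baskets(orders, period=None):
    """Return a list of baskets (sets of items).

    period=None applies the traditional criterion (one basket per order).
    period="day" or "month" applies an expanded criterion: all orders by
    the same customer within the same period are merged into one basket.
    """
    if period is None:
        return [set(items) for _, _, items in orders]
    key = PERIOD_KEYS[period]
    baskets = defaultdict(set)
    for customer, day, items in orders:
        baskets[(customer, key(day))] |= items
    return list(baskets.values())


def support(baskets, itemset):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


pair = {"shampoo", "conditioner"}
for period in (None, "day", "month"):
    print(period, round(support(build_baskets(orders, period), pair), 3))
```

In this toy log every single order contains one item, so the pair has zero Support under the traditional criterion, while its Support rises as the period is widened from a day to a month.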
In this paper, we compare expanded co-occurrence criteria and propose a guideline for selecting an appropriate one. First, we compare the accuracy of association rules discovered under various co-occurrence criteria, with the aim of providing a guideline for selecting the co-occurrence criterion that fits the purpose of the analysis. We then perform similar experiments on several groups of customers segmented by each customer's average duration between orders, in order to discover the relationship between the optimal co-occurrence criterion and the customer's average duration between orders. Finally, through this series of experiments, we expect to provide basic guidelines for developing customized recommendation systems. Our experiments use a real dataset acquired from one of the largest internet shopping malls in Korea: 66,278 transactions made by 3,847 customers over the last two years. Overall, the results show that the accuracy of association rules for frequent shoppers (whose average duration between orders is relatively short) is higher than that for casual shoppers. In addition, we find that for frequent shoppers the accuracy of association rules is very high when the co-occurrence criterion of the training set matches that of the validation set (i.e., the target set). This implies that the co-occurrence criterion for frequent shoppers should be set according to the period of the intended application. For example, an analyst who wants to offer potential customers a coupon valid for only a day should use a day as the co-occurrence criterion. By contrast, an analyst who wants to publish a coupon book that can be used for a month should use a month as the co-occurrence criterion.
For casual shoppers, the accuracy of association rules does not appear to be affected by the period of the intended application. Instead, the accuracy of casual shoppers' association rules increases as a longer co-occurrence criterion is adopted. This implies that, for casual shoppers, an analyst should set the co-occurrence criterion to be as long as possible, regardless of the application period.
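Segmenting customers by their average duration between orders could be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the cut-off used to separate frequent shoppers from the others, so `threshold_days`, the two segment labels, the treatment of single-order customers, and the sample order log are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def average_days_between_orders(order_dates):
    """Mean gap in days between consecutive orders, or None if fewer than two."""
    days = sorted(order_dates)
    if len(days) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    return sum(gaps) / len(gaps)


def segment_customers(order_log, threshold_days):
    """Label each customer "frequent" or "casual".

    threshold_days is a hypothetical cut-off, not a value from the paper.
    Customers with a single order have no measurable gap; treating them
    as "casual" is an assumption of this sketch.
    """
    dates_by_customer = defaultdict(list)
    for customer, day in order_log:
        dates_by_customer[customer].append(day)

    segments = {}
    for customer, days in dates_by_customer.items():
        average_gap = average_days_between_orders(days)
        if average_gap is not None and average_gap <= threshold_days:
            segments[customer] = "frequent"
        else:
            segments[customer] = "casual"
    return segments


# Hypothetical order log: (customer_id, order_date).
order_log = [
    ("c1", date(2011, 1, 1)), ("c1", date(2011, 1, 8)), ("c1", date(2011, 1, 15)),
    ("c2", date(2011, 1, 1)), ("c2", date(2011, 4, 1)),
    ("c3", date(2011, 2, 1)),
]
segments = segment_customers(order_log, threshold_days=30)
print(segments)
```

Once customers are labelled this way, each segment's orders can be mined separately under different co-occurrence periods, which is the kind of comparison the experiments describe.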
Keywords
Data Mining; Online Market Analysis; Market Basket Analysis; Association Rule Mining