A Study on Market Basket Criteria for Online Association Analysis

An Investigation on Expanding Co-occurrence Criteria in Association Rule Mining

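  • Misung Kim (김미성), Graduate School of Business IT, Kookmin University
  • Namgyu Kim (김남규), School of Management Information Systems, Kookmin University
  • Submitted: 2011.08.10
  • Reviewed: 2011.11.05
  • Published: 2011.12.31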


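Because online shopping malls can be accessed much more quickly than offline stores, the time between forming an initial purchase intention and completing the actual purchase is far shorter online than offline. In offline stores, customers typically gather several items and buy them together rather than buying each desired item immediately. In Internet shopping malls, by contrast, orders containing only a single item account for more than half of all orders. This difference implies that analyzing online shopping mall transaction data requires extending the definition of the market basket that has traditionally been used in data mining. To date, however, market basket studies on online data have been conducted with differing criteria chosen by individual researchers, without clear grounds or consensus on what constitutes a basket, that is, on the criterion for co-purchase. This study therefore examines the criteria for deciding which items count as purchased together in online shopping mall analysis and proposes a research model.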

There is a large difference between purchasing patterns in online shopping malls and in offline markets, caused mainly by the difference in their accessibility. Because a customer can place an order immediately, the interval between an initial purchasing decision and its realization is relatively short in an online shopping mall, and an online transaction therefore usually contains fewer items than an offline one. In an offline market, customers usually keep several items in mind and buy them all at once a few days after deciding to buy them, instead of buying each item individually and immediately. By contrast, more than 70% of online shopping mall transactions contain only one item. This statistic implies that traditional data mining techniques cannot be applied directly to online market analysis: because there are so many null transactions (single-item transactions that cannot contain any multi-item pattern), hardly any association rule reaches an acceptable level of Support. Most market basket analyses of online shopping mall transactions have therefore expanded the co-occurrence criterion of traditional association rule mining. Whereas the traditional criterion treats items purchased in one transaction as purchased together, an expanded criterion treats items purchased by a customer within a predefined period (e.g., one day) as purchased together. In studies using expanded co-occurrence criteria, however, the criterion has been chosen arbitrarily by each researcher, without theoretical grounds or consensus. This lack of clear grounds for adopting a particular criterion degrades the reliability of the analytical results, and it makes it hard to derive new findings by combining the outcomes of previous individual studies.
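The difference between the traditional and the expanded co-occurrence criterion can be illustrated with a minimal Python sketch. The order log, customer IDs, item names, and the helper functions `baskets_per_order`, `baskets_per_window`, and `pair_support` are all hypothetical. The expanded criterion is assumed here to merge a customer's orders that fall within the same fixed, non-overlapping window of days, which is only one possible way to define the "predefined period".

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Hypothetical order log: (customer_id, order_date, items in that order).
orders = [
    ("c1", date(2011, 5, 2), {"A"}),
    ("c1", date(2011, 5, 2), {"B"}),
    ("c2", date(2011, 5, 3), {"A", "B"}),
    ("c3", date(2011, 5, 3), {"C"}),
    ("c3", date(2011, 5, 9), {"A"}),
]

def baskets_per_order(orders):
    """Traditional criterion: each order is one basket."""
    return [frozenset(items) for _customer, _day, items in orders]

def baskets_per_window(orders, window_days):
    """Expanded criterion (assumed form): a customer's items bought within the
    same fixed, non-overlapping window of `window_days` days form one basket."""
    merged = defaultdict(set)
    for customer, day, items in orders:
        window = day.toordinal() // window_days  # index of the window containing `day`
        merged[(customer, window)] |= items
    return [frozenset(items) for items in merged.values()]

def pair_support(baskets):
    """Support of each item pair: fraction of all baskets containing both items.
    Single-item baskets add to the denominator but never to a pair count."""
    counts = defaultdict(int)
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

print(pair_support(baskets_per_order(orders)))
print(pair_support(baskets_per_window(orders, window_days=1)))
```

With these made-up orders, the pair (A, B) has Support 0.2 under the per-order criterion (1 of 5 baskets) and 0.5 under a one-day criterion (2 of 4 baskets), because customer c1's two single-item orders on the same day are merged into one basket. The numbers only illustrate the mechanism and are not results from the study.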
In this paper, we compare expanded co-occurrence criteria and propose a guideline for selecting an appropriate one. First, we compare the accuracy of the association rules discovered under various co-occurrence criteria; through this experiment, we aim to provide a guideline for selecting the co-occurrence criterion that suits the purpose of an analysis. We also perform similar experiments on several customer groups segmented by each customer's average interval between orders, in order to discover the relationship between the optimal co-occurrence criterion and that average interval. Through this series of experiments, we expect to provide basic guidelines for developing customized recommendation systems.
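The customer segmentation step can be sketched in the same spirit. This sketch assumes that a customer's average interval between orders is the mean gap, in days, between consecutive distinct order dates, and that segments are formed with hypothetical thresholds of 7 and 30 days; the order log and the functions `average_interorder_days` and `segment_customers` are illustrative, not the study's actual procedure or thresholds.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical order log: (customer_id, order_date).
orders = [
    ("c1", date(2011, 5, 1)),
    ("c1", date(2011, 5, 4)),
    ("c1", date(2011, 5, 10)),
    ("c2", date(2011, 5, 1)),
    ("c2", date(2011, 5, 31)),
    ("c3", date(2011, 5, 2)),
]

def average_interorder_days(orders):
    """Mean gap in days between consecutive distinct order dates, per customer.
    Customers with fewer than two distinct order dates are omitted."""
    dates = defaultdict(set)
    for customer, day in orders:
        dates[customer].add(day)
    averages = {}
    for customer, days in dates.items():
        ordered = sorted(days)
        if len(ordered) < 2:
            continue  # no interval can be measured
        gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
        averages[customer] = mean(gaps)
    return averages

def segment_customers(averages, thresholds=(7, 30)):
    """Map each customer to a segment index using hypothetical day thresholds:
    segment 0 if avg < 7, 1 if 7 <= avg < 30, 2 if avg >= 30 (with the defaults)."""
    return {customer: bisect_right(thresholds, avg) for customer, avg in averages.items()}

segments = segment_customers(average_interorder_days(orders))
print(segments)
```

Here c1 (gaps of 3 and 6 days, mean 4.5) falls into segment 0 and c2 (a single 30-day gap) into segment 2, while c3, with only one order date, has no defined interval and is left out. The co-occurrence experiments could then be repeated separately on each segment's orders.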
