http://dx.doi.org/10.3745/KTSDE.2019.8.9.363

Purchase Transaction Similarity Measure Considering Product Taxonomy  

Yang, Yu-Jeong (Department of Computer Science, Sookmyung Women's University)
Lee, Ki Yong (Division of Software, Sookmyung Women's University)
Publication Information
KIPS Transactions on Software and Data Engineering / v.8, no.9, 2019, pp. 363-372
Abstract
A sequence is data in which the items are ordered, and purchase transaction data, which lists the products purchased by a customer, is a representative example of sequence data. In general, every product belongs to a product taxonomy, such as category / sub-category / sub-sub-category, and products with similar characteristics are classified into the same category. Therefore, in this paper, when comparing two purchase transaction sequences, we consider not only the purchase order of the products but also the product taxonomy, assigning a higher similarity score to two different products when they belong to the same category. In particular, because the choice of underlying similarity measure directly affects how well purchase transaction sequences are compared, we compare three representative similarity measures: the Levenshtein distance, the dynamic time warping distance, and the Needleman-Wunsch similarity, each extended to take the product taxonomy into account. Conventional similarity measures compare the products in two sequences by simply assigning a value of 0 or 1 according to whether the products match. In contrast, the proposed method uses the product taxonomy tree to assign a value between 0 and 1, so that two different products can still have different degrees of relatedness. Through experiments, we confirmed that the proposed method measures similarity more accurately than the existing methods. Furthermore, we confirmed that the dynamic time warping distance is the most suitable measure because it reflects the degree of relatedness of the products in the sequences and performs well on two sequences of different lengths.
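The idea of replacing the 0/1 match cost with a taxonomy-based value can be illustrated with a minimal sketch. The abstract does not give the paper's exact scoring formula, so the cost below is an assumption: each product is represented by its category path from the taxonomy root, and the cost is one minus the fraction of the path the two products share. The names `taxonomy_cost`, `binary_cost`, and `dtw_distance`, as well as the sample products, are hypothetical and used only for illustration. The dynamic time warping recurrence itself is the standard one.

```python
from typing import Callable, Sequence, Tuple

# Assumption: a product is identified by its path in the taxonomy tree,
# e.g. ("food", "dairy", "milk") for category / sub-category / product.
Path = Tuple[str, ...]


def binary_cost(a: Path, b: Path) -> float:
    """Conventional cost: 0 if the products match, 1 otherwise."""
    return 0.0 if a == b else 1.0


def taxonomy_cost(a: Path, b: Path) -> float:
    """Hypothetical taxonomy-aware cost in [0, 1].

    0 for identical products, 1 when no category is shared, and an
    intermediate value that shrinks as the shared path prefix grows.
    """
    if a == b:
        return 0.0
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return 1.0 - shared / max(len(a), len(b))


def dtw_distance(
    s: Sequence[Path],
    t: Sequence[Path],
    cost: Callable[[Path, Path], float] = taxonomy_cost,
) -> float:
    """Standard dynamic time warping distance with a pluggable item cost."""
    n, m = len(s), len(t)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(s[i - 1], t[j - 1])
            d[i][j] = c + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Sample products (made up for this sketch).
milk = ("food", "dairy", "milk")
cheese = ("food", "dairy", "cheese")
bread = ("food", "bakery", "bread")
soap = ("household", "bath", "soap")

s = [milk, bread]
t = [cheese, bread]  # differs from s by a product in the same sub-category
u = [soap, bread]    # differs from s by a product in an unrelated category

taxonomy_st = dtw_distance(s, t, taxonomy_cost)
taxonomy_su = dtw_distance(s, u, taxonomy_cost)
binary_st = dtw_distance(s, t, binary_cost)
binary_su = dtw_distance(s, u, binary_cost)
```

With the binary cost, `t` and `u` are equally far from `s`, because each differs from `s` in exactly one product. With the taxonomy-aware cost, `t` comes out closer to `s` than `u` does, since milk and cheese share two levels of the category path while milk and soap share none. The same pluggable cost could be substituted into the Levenshtein distance or the Needleman-Wunsch score in a similar way.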
Keywords
Sequence Similarity Measure; Transaction Data Analysis; Product Taxonomy; Levenshtein Distance; Dynamic Time Warping