
Performance Evaluation of the FP-tree and the DHP Algorithms for Association Rule Mining  

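Lee, Hyung-Bong (Department of Computer Engineering, Kangnung National University)
Kim, Jin-Ho (Department of Computer Engineering, Kangwon National University)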
Abstract
The FP-tree (Frequent Pattern Tree) association rule mining algorithm was proposed to improve mining performance by dramatically reducing DB scan overhead, and it is generally regarded as outperforming algorithms based on other approaches. However, the FP-tree algorithm needs more memory because it has to store every transaction of the DB that contains frequent items. This paper implements an FP-tree algorithm on a general-purpose UNIX system and compares it with the DHP (Direct Hashing and Pruning) algorithm, which uses a hash tree and a direct hash table, in terms of memory usage and execution time. Surprisingly, the results show that the FP-tree algorithm performs worse than the DHP algorithm in some cases, even when the system memory is sufficient for the FP-tree. The characteristics of the test data are as follows: the size of the DB is 100K, the total number of items is $1K{\sim}7K$, the average transaction length is $5{\sim}10$, and the average size of maximal frequent itemsets is $2{\sim}12$ (these are typical attributes of data for large-scale convenience stores).
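The memory trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `FPNode`, `build_fp_tree`, and `dhp_pair_buckets`, the integer item IDs, and the hash function `(a * 10 + b) % num_buckets` are all assumptions chosen for illustration. The first function builds an FP-tree in two scans and reports its node count as a rough proxy for memory use; the second fills a DHP-style direct hash table whose size is fixed in advance, regardless of how many transactions are scanned.

```python
from collections import Counter
from itertools import combinations


class FPNode:
    """One FP-tree node: an item, its count, and links to child nodes."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree in two DB scans; return (root, node_count)."""
    # Scan 1: count the support of every single item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in support.items() if c >= min_support}

    root, node_count = FPNode(), 0
    # Scan 2: insert the frequent items of each transaction, ordered by
    # descending support (ties broken by item ID) so prefixes are shared.
    for t in transactions:
        path = sorted((i for i in set(t) if i in frequent),
                      key=lambda i: (-support[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = FPNode(item)
                node_count += 1  # the tree grows with unshared suffixes
            node = node.children[item]
            node.count += 1
    return root, node_count


def dhp_pair_buckets(transactions, num_buckets):
    """Fill a DHP-style direct hash table with 2-itemset bucket counts."""
    buckets = [0] * num_buckets  # fixed size, independent of DB size
    for t in transactions:
        for a, b in combinations(sorted(set(t)), 2):
            # Illustrative hash on integer item IDs (an assumption).
            buckets[(a * 10 + b) % num_buckets] += 1
    return buckets


# Hypothetical toy data: five transactions over integer item IDs.
transactions = [[1, 2, 3], [1, 2], [1, 3], [2, 4], [1, 2, 3]]
root, node_count = build_fp_tree(transactions, min_support=2)
buckets = dhp_pair_buckets(transactions, num_buckets=7)
```

In this sketch a 2-itemset would be kept as a candidate only if its bucket count reaches the minimum support, which is the pruning step DHP relies on. The FP-tree, by contrast, keeps a path for every transaction's frequent items, so its size depends on how well those paths share prefixes.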
Keywords
FP-tree; DHP; hash tree; direct hash table