IRFP-tree: Intersection Rule Based FP-tree

Lee, Jung-Hun

doi:10.3745/KTSDE.2016.5.3.155

KIPS Transactions on Software and Data Engineering (정보처리학회논문지:소프트웨어 및 데이터공학)

Volume 5 Issue 3 / Pages 155-164 / 2016 / 2287-5905 (pISSN) / 2734-0503 (eISSN)

Korea Information Processing Society (한국정보처리학회)

IRFP-tree (Intersection Rule Based FP-tree): An FP-tree Applying an Intersection-Rule-Based Paradigm to Improve Memory Efficiency

Lee, Jung-Hun (Computer Hacking and Security Major, Computer Education Center, Dongguk University)

Received : 2015.11.19
Accepted : 2016.02.29
Published : 2016.03.31

Abstract

For frequent pattern analysis of large databases, new tree-based frequent pattern mining algorithms that compensate for the drawbacks of the Apriori method have been widely studied. In a frequent pattern tree, the number of nodes determines not only the memory allocated to the tree but also the memory consumption and processing speed of the subsequent growth phase. Reducing the number of tree nodes is therefore very important in frequent pattern mining. However, the absolute criterion used to order transaction items when constructing a frequent pattern tree lowers the compression ratio of the tree nodes, and most frequency-based tree construction methods adopt such a criterion. The FP-tree is the typical frequent pattern tree: an extended prefix-tree structure that stores compressed, crucial information about frequent patterns. To construct it, the frequent items of every transaction are sorted by an absolute criterion, descending frequency order. CanTree likewise requires an absolute criterion, a canonical order, to construct its tree. In this paper, we propose IRFP-tree (Intersection Rule based FP-tree), a novel frequent pattern tree construction method that does not use an absolute criterion. Instead, IRFP-tree is constructed under a new paradigm, the intersection rule. It increases the compression ratio of the tree nodes and reduces tree construction time, and it has the additional advantage of supporting incremental mining. The reported test results demonstrate the applicability and effectiveness of the proposed approach.
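To make the role of the absolute ordering criterion concrete, the sketch below inserts the same toy transactions into a plain prefix tree under two fixed orders: descending global frequency (as in FP-tree) and a predetermined canonical order (as in CanTree, here assumed to be alphabetical). It is a hypothetical illustration, not the IRFP-tree intersection rule, which the abstract does not specify; the helper names `build_tree` and `count_nodes` and the data are invented, and minimum-support pruning is omitted.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence


class Node:
    """Prefix-tree node: item label, number of transactions through it, children."""

    def __init__(self, item: Optional[str] = None) -> None:
        self.item = item
        self.count = 0
        self.children: Dict[str, "Node"] = {}


def build_tree(db: List[Sequence[str]], key: Callable[[str], object]) -> Node:
    """Insert every transaction with its items sorted by one fixed global key."""
    root = Node()
    for t in db:
        node = root
        for item in sorted(set(t), key=key):
            if item not in node.children:
                node.children[item] = Node(item)  # no shared prefix: new node
            node = node.children[item]
            node.count += 1
    return root


def count_nodes(node: Node) -> int:
    """Count item nodes in the subtree, excluding the item-less root itself."""
    return sum(1 + count_nodes(c) for c in node.children.values())


db = [["a", "d"], ["b", "d"], ["c", "d"], ["d"]]
freq = Counter(i for t in db for i in set(t))

# FP-tree style: descending global frequency (ties broken alphabetically, an assumption).
fp_root = build_tree(db, key=lambda i: (-freq[i], i))
# CanTree style: a canonical order fixed in advance (assumed alphabetical here).
can_root = build_tree(db, key=lambda i: i)

print(count_nodes(fp_root), count_nodes(can_root))  # 4 7
```

The node count depends entirely on the chosen order, which is the quantity the abstract ties to memory use and growth cost. A canonical order never has to change as new transactions arrive, which is what lets CanTree support incremental mining, whereas a frequency order can become stale after updates.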

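Various new tree-based frequent pattern analysis algorithms that can compensate for the shortcomings of the existing Apriori approach have recently been studied for frequent pattern analysis of large databases. Among them, the FP-tree is a tree structure that represents frequent patterns; it can be built quickly with only two full scans of the database, and frequent patterns can then be mined from it with FP-growth. The number of nodes in a frequent pattern tree is related to the memory allocated to the tree itself, and it also affects the memory consumption and processing speed of the subsequent growth phase. Reducing the number of nodes is therefore very important not only for the tree itself but also for frequent pattern analysis. However, because the FP-tree relies on a fixed criterion, the global item counts, it does not achieve a sufficient node compression ratio. In this paper, we address this problem of the FP-tree by presenting IRFP-tree, a frequent pattern tree that applies a new paradigm called the intersection rule to further reduce the number of nodes, and we demonstrate its performance through experiments.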
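The two-scan FP-tree construction and the hand-off to FP-growth can be sketched as follows. This is a minimal illustration of the standard FP-tree under stated assumptions, not the author's code and not IRFP-tree: the function names `build_fp_tree` and `conditional_pattern_base`, the alphabetical tie-break, and the toy database are hypothetical. Scan 1 counts supports and prunes items below `min_support`; scan 2 inserts each transaction's frequent items in descending-support order, appending every new node to a header-table list (the node-links). `conditional_pattern_base` performs the first step of FP-growth for one item by walking upward from each of its nodes, so every extra node costs work in the growth phase as well as memory in the tree.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


class FPNode:
    """FP-tree node with a parent link so prefix paths can be read upward."""

    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(db: List[Sequence[str]],
                  min_support: int) -> Tuple[FPNode, Dict[str, List[FPNode]]]:
    # Scan 1: count the support of every item over the whole database.
    support = Counter(i for t in db for i in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}
    # Fixed global order: descending support; alphabetical tie-break is an assumption.
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = {i: [] for i in order}  # node-links per item
    # Scan 2: insert each transaction's frequent items in the global order.
    for t in db:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            if item not in node.children:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
            node.count += 1
    return root, header


def conditional_pattern_base(header: Dict[str, List[FPNode]],
                             item: str) -> List[Tuple[List[str], int]]:
    """Collect the non-empty prefix path above each node of `item`, with its count."""
    base = []
    for node in header[item]:
        path, p = [], node.parent
        while p is not None and p.item is not None:
            path.append(p.item)
            p = p.parent
        if path:
            base.append((path[::-1], node.count))
    return base


db = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["b", "e"]]
root, header = build_fp_tree(db, min_support=2)
print(sum(len(nodes) for nodes in header.values()))  # 5 nodes; d and e are pruned
print(conditional_pattern_base(header, "c"))  # [(['a', 'b'], 1), (['a'], 1)]
```

Summing the counts along an item's node-links recovers its support (3 for `b` here), which is why FP-growth can mine the tree without returning to the database.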

References

R. Agrawal, T. Imieliński, and A. Swami, "Mining association rules between sets of items in large databases," in Proc. ACM SIGMOD Int. Conf. Manage. Data, pp.207-216, 1993.
R. Agrawal and R. Srikant, "Fast algorithms for mining association rules in large databases," in Proc. Int. Conf. Very Large Data Bases, pp.487-499, 1994.
J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," in Proc. ACM SIGMOD Int. Conf. Manage. Data, Dallas, 2000.
C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu, "Mining frequent patterns in data streams at multiple time granularities," in Data Mining: Next Generation Challenges and Future Directions, AAAI/MIT, 2003.
O. R. Zaiane and M. El-Hajj, "COFI approach for mining frequent itemsets revisited," in Proc. ACM SIGMOD Workshop Res. Issues Data Mining Knowl. Discovery, pp.70-75, 2004.
O. R. Zaiane and M. El-Hajj, "COFI-tree mining: A new approach to pattern growth with reduced candidacy," in Proc. IEEE ICDM Workshop Frequent Itemset Mining Implementations (FIMI'03), 2003.
M. Adnan and R. Alhajj, "DRFP-tree: Disc-resident frequent pattern tree," Appl. Intell., Vol.30, No.2, pp.84-97, 2009. https://doi.org/10.1007/s10489-007-0099-2
M. Adnan and R. Alhajj, "A bounded and adaptive memory-based approach to mine frequent patterns from very large databases," IEEE Trans. Syst., Man, Cybern., Part B, pp.154-172, 2011. https://doi.org/10.1109/TSMCB.2010.2048900
C. K.-S. Leung, Q. I. Khan, and T. Hoque, "CanTree: A tree structure for efficient incremental mining of frequent patterns," in Proc. IEEE Int. Conf. Data Mining, pp.274-281, 2005.
C. K.-S. Leung and Q. I. Khan, "DSTree: A tree structure for the mining of frequent sets from data streams," in Proc. IEEE ICDM, pp.928-932, 2006.
G. Liu, H. Lu, J. X. Yu, W. Wang, and X. Xiao, "AFOPT: An efficient implementation of pattern growth approach," in Proc. FIMI, 2003.
W. Cheung and O. R. Zaiane, "Incremental mining of frequent patterns without candidate generation or support constraint," in Proc. 7th Int. Database Engineering and Applications Symp., IEEE, pp.111-116, 2003.
B. Goethals, "Memory issues in frequent itemset mining," in Proc. ACM SAC, pp.530-534, 2004.
R. Vaarandi, "A breadth-first algorithm for mining frequent patterns from event logs," in Proc. IEEE INTELLCOMM, pp.274-281, 2004.
G. Buehrer, S. Parthasarathy, and A. Ghoting, "Out-of-core frequent pattern mining on a commodity PC," in Proc. 12th ACM SIGKDD Int. Conf. KDD, pp.86-95, 2006.

Cited by

PPFP (Push and Pop Frequent Pattern Mining): A Novel Frequent Pattern Mining Method for Bigdata Frequent Pattern Mining, Vol.5, No.12, 2016, https://doi.org/10.3745/KTSDE.2016.5.12.623