High Utility Pattern Mining using a Prefix-Tree

A High Utility Pattern Mining Technique Using a Prefix-Tree

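  • Byeong-Soo Jeong (정병수; Computer Engineering, College of Electronics and Information, Kyung Hee University)
  • Farhan Ahmed (아메드 파한; Computer Engineering, College of Electronics and Information, Kyung Hee University)
  • In-Gi Lee (이인기; Department of Computer Engineering, Ewha Womans University)
  • Hwan-Seung Yong (용환승; Department of Computer Engineering, Ewha Womans University)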
  • Published : 2009.10.15

Abstract

Recently, high utility pattern (HUP) mining has become one of the most important research issues in data mining, since it can take the different weight values of items into account. However, existing mining algorithms suffer from performance degradation because they cannot readily apply the Apriori principle for pattern mining. In this paper, we introduce a new high utility pattern mining approach that uses a prefix-tree, as in the FP-Growth algorithm. Our approach stores the weight value of each item in a tree node and uses these values to prune unnecessary patterns. We compare the performance characteristics of three different prefix-tree structures. Through extensive experiments, we also show that our approach improves performance over existing methods.

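Utility pattern mining is used effectively in business data analysis because it can take different weights of data items into account. However, the Apriori property of conventional frequent pattern mining is difficult to apply directly, and as a result the performance of pattern mining degrades considerably. This study proposes a technique that uses a prefix-tree to perform utility pattern mining effectively on continuously growing business transaction databases. The proposed technique stores a utility value in each item node of the prefix-tree and, as in the FP-Growth algorithm, quickly finds high utility patterns through bottom-up traversal of the tree. Through a range of experiments, we compared the performance characteristics and pattern search methods of three different prefix-tree structures that can be used, and the results demonstrate that the proposed technique can deliver substantial performance improvements over existing techniques.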
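
The idea of storing a utility value in each prefix-tree node and using it for pruning can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not spell out. It assumes that each node accumulates the total utility of every transaction passing through it (a transaction-weighted utility, or TWU, upper bound), that items are inserted in lexicographic order, and that an item whose summed node values fall below a threshold can be discarded. The names `Node`, `build_tree`, `promising_items`, `profit`, and `min_util` are hypothetical.

```python
from collections import defaultdict


class Node:
    """Prefix-tree node; `twu` is an assumed per-node utility upper bound."""

    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.children = {}
        self.twu = 0  # summed utility of all transactions through this node


def transaction_utility(tx, profit):
    """Utility of one transaction: sum of quantity * unit profit per item."""
    return sum(qty * profit[item] for item, qty in tx.items())


def build_tree(transactions, profit):
    """Insert each transaction (dict item -> quantity) in lexicographic order."""
    root = Node(None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for tx in transactions:
        tu = transaction_utility(tx, profit)
        node = root
        for item in sorted(tx):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.twu += tu
            node = child
    return root, header


def promising_items(header, min_util):
    """Keep items whose TWU reaches min_util. No itemset containing a
    discarded item can reach min_util, because TWU is an upper bound."""
    totals = {item: sum(n.twu for n in nodes) for item, nodes in header.items()}
    return {item: t for item, t in totals.items() if t >= min_util}


# Hypothetical data: unit profits and four transactions with quantities.
profit = {"a": 2, "b": 6, "c": 3, "d": 8}
transactions = [
    {"a": 2, "b": 1},          # utility 10
    {"a": 1, "c": 3},          # utility 11
    {"b": 2, "c": 1, "d": 1},  # utility 23
    {"d": 1},                  # utility 8
]
root, header = build_tree(transactions, profit)
print(promising_items(header, min_util=30))  # {'b': 33, 'c': 34, 'd': 31}
```

With this toy data, item `a` appears only in transactions whose utilities total 21, so no pattern containing `a` can reach the threshold of 30 and it is pruned before any pattern growth. The upper bound is downward closed, which is what lets an Apriori-style pruning step be recovered even though utility itself is not.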
