High Utility Itemset Mining Using Transaction Utility of Itemsets

  • 이세린 (Department of Computer Science, Sungshin Women's University) ;
  • 박종수 (School of IT, Sungshin Women's University)
  • Received : 2015.06.16
  • Accepted : 2015.08.04
  • Published : 2015.11.30

Abstract

High utility itemset (HUI) mining discovers itemsets whose utility is not less than a user-specified minimum utility threshold, taking into account both the quantities and the weight factors of items in a transaction database. Recently, utility-list based HUI mining algorithms have been proposed to avoid generating numerous candidate itemsets, but these algorithms require costly join operations. In this paper, we propose a new HUI mining algorithm that extends the utility-list with two additional attributes: the transaction utility and the common utility of itemsets. The new algorithm reduces the number of join operations and prunes the search space efficiently. Experimental results on both synthetic and real datasets show that the proposed algorithm outperforms other recent algorithms in runtime, especially when datasets are dense or contain many long transactions.
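To make the problem definition concrete, the following is a minimal sketch of how itemset utility and the minimum utility threshold interact. It is a brute-force enumeration for illustration only, not the algorithm proposed in the paper, and it does not use utility-lists. The item weights, the three transactions, and the function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical per-item weight factors (for example, unit profit); not from the paper.
WEIGHTS = {"a": 5, "b": 2, "c": 1}

# Hypothetical transaction database: each transaction maps item -> quantity.
DATABASE = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1},
]


def transaction_utility(transaction):
    """Sum of quantity * weight over every item in one transaction."""
    return sum(qty * WEIGHTS[item] for item, qty in transaction.items())


def itemset_utility(itemset, database):
    """Total utility of an itemset over the transactions that contain all its items."""
    total = 0
    for transaction in database:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * WEIGHTS[item] for item in itemset)
    return total


def high_utility_itemsets(database, min_utility):
    """Brute force: keep every itemset whose utility is not less than min_utility."""
    items = sorted({item for transaction in database for item in transaction})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, database)
            if utility >= min_utility:
                result[itemset] = utility
    return result


if __name__ == "__main__":
    print([transaction_utility(t) for t in DATABASE])
    print(high_utility_itemsets(DATABASE, min_utility=9))
```

With this toy data and a threshold of 9, the itemset {a, b} qualifies (utility 9) even though its subset {b} does not (utility 6). Utility is therefore not anti-monotone, which is why HUI mining algorithms need dedicated pruning strategies rather than the downward-closure property used in frequent itemset mining.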
