Performance Analysis of Top-K High Utility Pattern Mining Methods

Ryang, Heungmo; Yun, Unil; Kim, Chulhong

doi:10.7472/jksii.2015.16.6.89

Journal of Internet Computing and Services (인터넷정보학회논문지)

Volume 16 Issue 6 / Pages 89-95 / 2015 / 1598-0170 (pISSN) / 2287-1136 (eISSN)

Korean Society for Internet Information (한국인터넷정보학회)



상위 K 하이 유틸리티 패턴 마이닝 기법 성능분석

Ryang, Heungmo (Dept. of Computer Engineering, Sejong University) ;
Yun, Unil (Dept. of Computer Engineering, Sejong University) ;
Kim, Chulhong (Electronics and Telecommunications Research Institute)

Received : 2015.08.25
Accepted : 2015.11.07
Published : 2015.12.31

https://doi.org/10.7472/jksii.2015.16.6.89

Abstract

Traditional frequent pattern mining discovers valid patterns whose frequency is no smaller than a user-defined minimum threshold. In this framework, a threshold that is too low may produce an enormous number of patterns, which makes the results difficult to analyze, while one that is too high may yield no valid patterns at all. Setting an appropriate threshold is not easy because it requires prior knowledge of the domain. Because the framework cannot precisely predict or control mining results from a given threshold, a pattern mining approach that does not depend on domain knowledge became necessary. Top-k frequent pattern mining was proposed to solve this problem: it mines the k most important patterns without any threshold setting. With this method, users can find the patterns ranging from the highest frequency to the k-th highest frequency, regardless of the database. In this paper, we provide background on both frequent and top-k pattern mining. Although top-k frequent pattern mining extracts the top-k significant patterns without a threshold, it considers neither item quantities in transactions nor the relative importance of items in databases, and therefore cannot meet the requirements of many real-world applications. That is, in such applications, patterns with low frequency can be meaningful and patterns with high frequency can be insignificant. High utility pattern mining was proposed to reflect these characteristics of non-binary databases, but it requires a minimum threshold. Recently, top-k high utility pattern mining has been developed, through which users can mine the desired number of high utility patterns without prior knowledge. In this paper, we analyze two algorithms for top-k high utility pattern mining in detail. We also conduct various experiments on real datasets and, through performance analysis of the experimental results, discuss points for improvement and directions for the development of top-k high utility pattern mining.

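Traditional frequent pattern mining identifies, from a database, valid patterns whose frequency is no smaller than a user-defined minimum threshold. Setting an appropriate threshold is not easy because it requires prior knowledge of the domain. Because mining results cannot be precisely controlled through the threshold setting, a pattern mining method that does not rely on domain knowledge became necessary. Top-K frequent pattern mining was proposed to solve this problem; it mines the top K important patterns without a threshold setting. By applying it, users can find the patterns from the highest frequency down to the K-th highest frequency, regardless of the database. Although top-K frequent pattern mining mines the top K important patterns without a threshold setting, it cannot consider item quantities within transactions or the differing importance of items within the database, and so it fails to meet the requirements of many real-world applications. High utility pattern mining was proposed to take into account the characteristics of non-binary databases that include item importance, but it requires a minimum threshold. Recently, top-K high utility pattern mining has been developed for high utility pattern mining without a threshold setting, allowing users to mine the desired number of patterns without prior knowledge. This paper analyzes algorithms for top-K high utility pattern mining. Through performance analysis of state-of-the-art algorithms, it examines points for improvement and directions for future development.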
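
The contrast the abstract draws between top-k frequent patterns and top-k high utility patterns can be illustrated with a minimal sketch. It is a brute-force enumeration on a toy database, not either of the algorithms analyzed in the paper, and it would not scale to real datasets. The data, the item names, and the function names (`support`, `utility`, `top_k`) are all hypothetical and chosen only for illustration. Utility is assumed here to be the usual quantity-times-unit-profit sum over the transactions that contain the whole itemset.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 1, "c": 1},
    {"a": 1, "b": 1},
    {"c": 2},
]

# Hypothetical relative importance of each item (for example, unit profit).
UNIT_PROFIT = {"a": 1, "b": 5, "c": 10}


def all_itemsets(transactions):
    """Yield every non-empty itemset over the items that occur in the database."""
    items = sorted({item for t in transactions for item in t})
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def utility(itemset, transactions, unit_profit):
    """Sum of quantity * unit profit over the transactions containing the itemset."""
    return sum(
        sum(t[i] * unit_profit[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )


def top_k(transactions, k, score):
    """Return the k best (score, itemset) pairs; no minimum threshold is needed."""
    scored = [(score(x), x) for x in all_itemsets(transactions)]
    scored = [(s, x) for s, x in scored if s > 0]
    # Sort by descending score; break ties by itemset for a deterministic order.
    return sorted(scored, key=lambda p: (-p[0], p[1]))[:k]


top_frequent = top_k(TRANSACTIONS, 1, lambda x: support(x, TRANSACTIONS))
top_utility = top_k(TRANSACTIONS, 2, lambda x: utility(x, TRANSACTIONS, UNIT_PROFIT))
print(top_frequent)
print(top_utility)
```

In this toy database the most frequent itemset is `("a",)`, yet its utility is the lowest among the single items, while the less frequent `("c",)` ranks first by utility. In both rankings the user supplies only k, never a threshold.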

