Frequent Pattern Mining By using a Completeness for BigData

빅데이터에 대한 Completeness를 이용한 빈발 패턴 마이닝

  • Park, In-Kyu (Dept. of Game Software, College of Engineering Joongbu University)
  • 박인규 (중부대학교 게임소프트웨어학과)
  • Received : 2018.12.18
  • Accepted : 2018.02.27
  • Published : 2018.04.20

Abstract

Most frequent pattern mining studies use frequency, the number of times a pattern appears in a transaction database, as the key measure of pattern interestingness. This presupposes that any interesting pattern occupies the maximum portion of the transactions in which it appears. In real-world scenarios, however, the completeness of a pattern is likely to vary from transaction to transaction. Hence, we should also consider the problem of finding qualified patterns with significant values of weighted support and completeness, in order to reduce the loss of information within a pattern in a transaction. In pattern recommendation applications, patterns with higher completeness may lead to higher recall, while patterns with higher frequency lead to higher precision. In this paper, we propose a measure of weighted support and completeness and an algorithm, WSCFPM (weighted support and completeness frequent pattern mining). Our algorithm handles the fact that neither the monotone nor the anti-monotone property holds for completeness. Extensive performance analysis shows that our algorithm is efficient and scalable for word pattern mining.

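Most frequent pattern mining has treated support, the number of times a pattern appears in a transaction database, as the key measure of pattern interestingness, but a pattern's count assumes that the information carried by its completeness is at its maximum. In practice, however, the completeness of an arbitrary pattern X differs from transaction to transaction. Therefore, to reduce the loss of information carried by a pattern, useful pattern mining based on weighted support and completeness should be considered. That is, patterns with a high completeness ratio can lead to higher recall, and patterns with high frequency lead to higher precision. This paper proposes WSCFPM, a pattern mining algorithm that considers support and completeness adapted to the dynamic weights of items. Because the monotone or anti-monotone property does not affect weighted support and completeness, the proposed method can reduce the search process. Experimental results show that the proposed algorithm is effective and scales well.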
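
The abstract names the measures but does not give their formulas. As a rough illustration only, the Python sketch below assumes that the completeness of a pattern X in a transaction T is the share |X| / |T|, averaged over the transactions that contain X, and that weighted support is the average item weight multiplied by plain support. Both definitions, the function names, the toy database, and the weights are assumptions made for this example, not the paper's WSCFPM algorithm.

```python
from typing import Dict, FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def supporting(pattern: Iterable[str], db: List[Transaction]) -> List[Transaction]:
    """Return the transactions that contain every item of the pattern."""
    p = frozenset(pattern)
    return [t for t in db if p <= t]


def support(pattern: Iterable[str], db: List[Transaction]) -> float:
    """Plain frequency: fraction of transactions containing the pattern."""
    return len(supporting(pattern, db)) / len(db)


def completeness(pattern: Iterable[str], db: List[Transaction]) -> float:
    """ASSUMED definition: mean of |X| / |T| over supporting transactions."""
    p = frozenset(pattern)
    hits = supporting(p, db)
    if not hits:
        return 0.0
    return sum(len(p) / len(t) for t in hits) / len(hits)


def weighted_support(pattern: Iterable[str], db: List[Transaction],
                     weights: Dict[str, float]) -> float:
    """ASSUMED definition: average item weight times plain support."""
    p = frozenset(pattern)
    avg_weight = sum(weights[item] for item in p) / len(p)
    return avg_weight * support(p, db)


# Hypothetical toy data, chosen only to show the behaviour of the measures.
db = [frozenset("a"), frozenset("abcdef"), frozenset("cd")]
weights = {"a": 1.0, "b": 0.5, "c": 0.8, "d": 0.2, "e": 0.1, "f": 0.1}

# Extending {a} to {a, b} lowers completeness ...
print(completeness("a", db) > completeness("ab", db))
# ... while extending {c} to {c, d} raises it.
print(completeness("c", db) < completeness("cd", db))
print(weighted_support("cd", db, weights))
```

Under these assumed definitions, adding an item can either lower or raise completeness, which is consistent with the abstract's remark that neither the monotone nor the anti-monotone property holds for it. A miner therefore cannot prune on completeness alone in the way Apriori-style algorithms prune on support.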
