Multi-Sized cumulative Summary Structure Driven Light Weight in Frequent Closed Itemset Mining to Increase High Utility

  • Siva S (Department of Computer Science and Applications, Reva University)
  • Shilpa Chaudhari (Department of Computer Science and Engineering, MS Ramaiah Institute of Technology)
  • Received : 2022.10.01
  • Accepted : 2023.03.01
  • Published : 2023.06.30

Abstract

High-utility itemset mining (HUIM) has emerged as a key data-mining paradigm for object-of-interest identification and for recommendation systems, where it serves as a frequent itemset identification tool, a product or service recommender, and so on. It has recently gained widespread attention owing to its growing role in business intelligence, top-N recommendation, and other enterprise solutions. Despite this increasing significance, most existing solutions, including frequent itemset mining, HUIM, and high average-utility and fast high-utility itemset mining, cannot provide swift and accurate predictions and therefore cope poorly with real-time enterprise demands. Moreover, complex computations and high memory exhaustion limit their scalability as enterprise solutions. To address these limitations, this study proposes a model that extracts high-utility frequent closed itemsets based on an improved cumulative summary list structure (CSLFC-HUIM), which reduces the search space to an optimal set of candidate items. It further employs the lift score as the minimum threshold, called the cumulative utility threshold, to prune the search space to an optimal set of itemsets held in a nested-list structure, which improves computational time, cost, and memory consumption. Simulations over different datasets revealed that the proposed CSLFC-HUIM model outperforms existing methods, such as closed and frequent closed HUIM variants, in terms of execution time and memory consumption, making it suitable for different mined items and the associated business-intelligence goals.
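To make the target of the mining task concrete, the following is a minimal brute-force sketch of what a "high-utility frequent closed itemset" is. It is not the CSLFC-HUIM algorithm: it uses no cumulative summary list, and it replaces the lift-based cumulative utility threshold with plain fixed thresholds. The toy database and the names `TRANSACTIONS`, `support`, `utility`, `is_closed`, and `mine` are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to the
# utility (e.g. profit) that item contributes in that transaction.
TRANSACTIONS = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"a": 5, "b": 4, "c": 3},
    {"b": 8, "d": 2},
]


def support(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset.issubset(t))


def utility(itemset, db):
    """Total utility of the itemset over the transactions containing it."""
    return sum(sum(t[i] for i in itemset) for t in db if itemset.issubset(t))


def is_closed(itemset, db, items):
    """Closed: no proper superset has the same support."""
    s = support(itemset, db)
    return all(support(itemset | {x}, db) < s for x in items - itemset)


def mine(db, min_support, min_utility):
    """Exhaustively enumerate itemsets (exponential; toy data only).

    Fixed thresholds stand in for the paper's lift-based threshold,
    whose exact definition is not given in the abstract.
    """
    items = set().union(*(t.keys() for t in db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            candidate = frozenset(combo)
            sup = support(candidate, db)
            if sup >= min_support and is_closed(candidate, db, items):
                u = utility(candidate, db)
                if u >= min_utility:
                    result[candidate] = (sup, u)
    return result


if __name__ == "__main__":
    for itemset, (sup, u) in mine(TRANSACTIONS, 2, 20).items():
        print(sorted(itemset), "support =", sup, "utility =", u)
```

On this toy data, with a minimum support of 2 and a minimum utility of 20, the itemsets {a, c} and {a, b, c} qualify. The itemset {a} is frequent but not closed, because {a, c} has the same support. Exhaustive enumeration of this kind is what list-based structures and pruning thresholds are meant to avoid on real datasets.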
