[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.4218/etrij.2020-0300

A single-phase algorithm for mining high utility itemsets using compressed tree structures

Bhat B, Anup (Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education)
SV, Harish (Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education)
M, Geetha (Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education)

Publication Information

ETRI Journal / v.43, no.6, 2021, pp. 1024-1037

Abstract

Mining high utility itemsets (HUIs) from transaction databases takes into account factors such as the unit profit and purchased quantity of each item. Two-phase tree-based algorithms transform a database into compressed tree structures and generate candidate patterns through a recursive pattern-growth procedure. This procedure requires substantial memory and time to construct conditional pattern trees. To address this issue, this study employs two compressed tree structures, namely, Utility Count Tree and String Utility Tree, to enumerate valid patterns and thus promote fast utility computation. Furthermore, the study presents an algorithm called single-phase utility computation (SPUC) that leverages these two tree structures to mine HUIs in a single phase by incorporating novel pruning strategies. Experiments conducted on both real and synthetic datasets demonstrate the superior performance of SPUC compared with the IHUP, UP-Growth, and UP-Growth+ algorithms.
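To make the notion of utility concrete, the following is a minimal brute-force sketch of the standard HUI definition: the utility of an itemset in a transaction is the sum of quantity times unit profit over its items, and an itemset is a HUI when its total utility over all transactions containing it reaches a minimum threshold. This is not the SPUC algorithm and does not use the Utility Count Tree or String Utility Tree; the toy data and the names `unit_profit`, `transactions`, `itemset_utility`, `brute_force_huis`, and `min_util` are assumptions introduced only for illustration.

```python
from itertools import combinations

# Hypothetical toy data (not taken from the paper).
unit_profit = {"a": 5, "b": 2, "c": 1}
transactions = [
    {"a": 1, "b": 2},           # item -> purchased quantity
    {"a": 2, "c": 3},
    {"b": 1, "c": 4},
    {"a": 1, "b": 1, "c": 1},
]


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def brute_force_huis(transactions, unit_profit, min_util):
    """Enumerate every non-empty itemset and keep those with utility >= min_util.

    Exponential in the number of items; for illustrating the definition only.
    """
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_profit)
            if utility >= min_util:
                result[itemset] = utility
    return result


if __name__ == "__main__":
    for itemset, utility in brute_force_huis(transactions, unit_profit, 15).items():
        print(itemset, utility)
```

With a threshold of 15 on this toy data, the itemset {b} has utility 8 and is not a HUI, while its superset {a, b} has utility 16 and is. Utility therefore lacks the downward-closure property that support has in frequent itemset mining, which is why HUI miners need dedicated pruning strategies rather than plain support-style pruning.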

Keywords

data mining; high utility itemsets; utility mining
