http://dx.doi.org/10.3745/JIPS.04.0205

Query Optimization on Large Scale Nested Data with Service Tree and Frequent Trajectory

Wang, Li (Basic Courses Department, Shanghai Institute of Tourism)
Wang, Guodong (Teaching Affairs Office, Shanghai Institute of Tourism)

Publication Information

Journal of Information Processing Systems / v.17, no.1, 2021, pp. 37-50

Abstract

Query applications over nested data, the most commonly used form of data representation on the web, are becoming more widely used, precise queries in particular. MapReduce, a distributed architecture with parallel computing power, provides a good solution for big data processing. In practice, however, query requests are usually concurrent, which causes bottlenecks in server processing. To solve this problem, this paper first combines a column storage structure with an inverted index to build an index for nested data on MapReduce. On this basis, it proposes an optimization strategy that combines a query execution service tree with frequent sub-query trajectories to reduce the response time of frequent queries and further improve the efficiency of multi-user concurrent queries on large-scale nested data. Experiments show that this method greatly improves the efficiency of nested data queries.
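
The two ideas named in the abstract, a column-wise inverted index over nested records and reuse of frequently repeated sub-queries, can be illustrated with a small single-process sketch. This is not the paper's implementation: it does not use MapReduce or a service tree, and every name in it (`flatten`, `build_index`, `FrequentQueryCache`, the `threshold` policy) is an assumption made for illustration only.

```python
from collections import Counter, defaultdict


def flatten(record, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested dict/list record."""
    if isinstance(record, dict):
        for key, value in record.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(record, list):
        for item in record:
            yield from flatten(item, prefix)
    else:
        yield prefix, record


def build_index(records):
    """Map each column path to an inverted index: value -> set of record ids."""
    index = defaultdict(lambda: defaultdict(set))
    for rid, record in enumerate(records):
        for path, value in flatten(record):
            index[path][value].add(rid)
    return index


class FrequentQueryCache:
    """Caches a sub-query result once it has been evaluated `threshold` times.

    The counting policy is a hypothetical stand-in for frequent
    sub-query detection, not the method described in the paper.
    """

    def __init__(self, index, threshold=2):
        self.index = index
        self.threshold = threshold
        self.counts = Counter()  # how often each sub-query was evaluated
        self.cache = {}          # (path, value) -> frozenset of record ids
        self.hits = 0            # number of lookups answered from the cache

    def lookup(self, path, value):
        key = (path, value)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.counts[key] += 1
        result = frozenset(self.index.get(path, {}).get(value, ()))
        if self.counts[key] >= self.threshold:
            self.cache[key] = result
        return result

    def query(self, conditions):
        """Intersect the results of several (path, value) sub-queries."""
        sets = [self.lookup(path, value) for path, value in conditions]
        return frozenset.intersection(*sets) if sets else frozenset()
```

In this sketch a precise query is a list of `(path, value)` conditions, for example `[("doc.tags", "x"), ("doc.title", "a")]`. Each condition is resolved against one column's inverted index, and a condition that recurs across concurrent queries is served from the cache after it has been seen `threshold` times.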

Keywords

Caching; Frequent Sub-query Trajectory; Nested Data; Query Optimization; Service Tree
