http://dx.doi.org/10.6109/jkiice.2022.26.4.487

Compression Conversion and Storing of Large RDF datasets based on MapReduce  

Kim, InA (Department of Computer Engineering, Chungnam National University)
Lee, Kyong-Ha (Korea Institute of Science and Technology Information)
Lee, Kyu-Chul (Department of Computer Engineering, Chungnam National University)
Abstract
With the recent demand for data-driven analysis, knowledge graphs, the data being analyzed, have steadily grown in size; a knowledge graph extracted from the web reaches about 82 billion edges. Many knowledge graphs are represented in the Resource Description Framework (RDF), a W3C standard for representing metadata about web resources. Because of the characteristics of RDF, existing RDF stores incur processing-time overhead when converting and storing large amounts of RDF data. To resolve this limitation, we propose a method that uses MapReduce to compress and convert large amounts of RDF data into integer IDs, and then vertically partitions and stores them. The proposed method demonstrated a performance improvement of up to 25.2 times compared to RDF-3X and up to 3.7 times compared to H2RDF+.
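The two ideas named in the abstract, converting RDF terms to integer IDs and vertically partitioning by predicate, can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the function names (`parse_ntriple`, `map_terms`, `build_dictionary`, `vertical_partition`), the sample triples, and the ID assignment by sorted order are all assumptions made for illustration, and the map/shuffle/reduce steps are simulated in memory rather than run on a Hadoop cluster.

```python
from collections import defaultdict

# Hypothetical sample data in N-Triples syntax (not from the paper).
SAMPLE_LINES = [
    '<http://ex.org/alice> <http://ex.org/knows> <http://ex.org/bob> .',
    '<http://ex.org/bob> <http://ex.org/knows> <http://ex.org/carol> .',
    '<http://ex.org/alice> <http://ex.org/name> "Alice" .',
]

def parse_ntriple(line):
    """Split one simple N-Triples line into (subject, predicate, object)."""
    # Drop the terminating " ." and split on the first two spaces only,
    # so that spaces inside a literal object are preserved.
    subject, predicate, obj = line.rstrip(" .\n").split(" ", 2)
    return subject, predicate, obj

def map_terms(triple):
    """Map step: emit each term of a triple as a (term, 1) pair."""
    for term in triple:
        yield term, 1

def build_dictionary(triples):
    """Simulated shuffle and reduce: group by term, assign one ID per distinct term."""
    counts = defaultdict(int)
    for triple in triples:
        for term, one in map_terms(triple):
            counts[term] += one
    # Assumption: IDs follow the sorted order of terms. A distributed job
    # would need another scheme to keep IDs globally unique.
    return {term: term_id for term_id, term in enumerate(sorted(counts))}

def vertical_partition(triples, dictionary):
    """Encode triples as integers and group (subject, object) pairs by predicate ID."""
    tables = defaultdict(list)
    for subject, predicate, obj in triples:
        tables[dictionary[predicate]].append((dictionary[subject], dictionary[obj]))
    for rows in tables.values():
        rows.sort()
    return dict(tables)

triples = [parse_ntriple(line) for line in SAMPLE_LINES]
dictionary = build_dictionary(triples)
tables = vertical_partition(triples, dictionary)
```

Each entry of `tables` plays the role of one two-column table per predicate, so a query that touches a single predicate only needs to read that table, and the integer pairs take less space than the original IRI strings.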
Keywords
MapReduce; RDF; Big data; Compression and conversion; Vertical partitioning;
References
1 University of Waterloo, Waterloo SPARQL Diversity Test Suite (WatDiv) v0.6 [Internet], Available: https://dsg.uwaterloo.ca/watdiv/
2 Web Data Commons, Microdata, RDFa, JSON-LD, and Microformat Data Set [Internet], Available: http://webdatacommons.org/structureddata/index.html#results-2021-1
3 W. Ali, M. Saleem, B. Yao, A. Hogan, and A.-C. N. Ngomo, "A survey of RDF stores & SPARQL engines for querying knowledge graphs," The VLDB Journal, pp. 1-26, Nov. 2021.
4 T. Neumann and G. Weikum, "RDF-3X: a RISC-style engine for RDF," in Proceedings of the VLDB Endowment, Auckland, New Zealand, vol. 1, no. 1, pp. 647-659, Aug. 2008.
5 K. Lee and L. Liu, "Scaling queries over big RDF graphs with semantic hash partitioning," in Proceedings of the VLDB Endowment, Trento, Italy, vol. 6, no. 14, pp. 1894-1905, 2013.
6 W3C, Resource Description Framework (RDF) Model and Syntax Specification [Internet], Available: https://www.w3.org/TR/1998/WD-rdf-syntax-19980819/
7 F. Goasdoue, Z. Kaoudi, I. Manolescu, J.-A. Quiane-Ruiz, and S. Zampetakis, "CliqueSquare: Flat plans for massively parallel RDF queries," in 2015 IEEE 31st International Conference on Data Engineering, Seoul, South Korea, pp. 771-782, 2015.
8 I. A. Kim and K.-C. Lee, "Conversion of Large RDF Data using Hash-based ID Mapping Tables," in Proceedings of the Korean Institute of Information and Communication Sciences Conference, Gunsan, South Korea, pp. 236-239, 2021.
9 M. Wylot, M. Hauswirth, P. Cudre-Mauroux, and S. Sakr, "RDF data storage and query processing schemes: A survey," ACM Computing Surveys (CSUR), vol. 51, no. 4, pp. 1-36, 2018.
10 K. L. Bawankule, R. K. Dewang, and A. K. Singh, "Historical data based approach to mitigate stragglers from the Reduce phase of MapReduce in a heterogeneous Hadoop cluster," Cluster Computing, pp. 1-19, Feb. 2022.
11 N. Papailiou, D. Tsoumakos, I. Konstantinou, P. Karras, and N. Koziris, "H2RDF+: an efficient data management system for big RDF graphs," in Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Utah, USA, pp. 909-912, Jun. 2014.
12 W3C, RDF 1.1 N-Triples [Internet], Available: https://www.w3.org/TR/n-triples/
13 W3C, RDF 1.1 Turtle [Internet], Available: https://www.w3.org/TR/turtle/
14 SWAT, The Lehigh University Benchmark (LUBM) [Internet], Available: http://swat.cse.lehigh.edu/projects/lubm/
15 Max-Planck-Institute Saarbrucken, YAGO: A High-Quality Knowledge Base [Internet], Available: https://yago-knowledge.org/
16 B. B. Mahria, I. Chaker, and A. Zahi, "An empirical study on the evaluation of the RDF storage systems," Journal of Big Data, vol. 8, no. 1, pp. 1-20, 2021.
17 J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, Jan. 2008.