http://dx.doi.org/10.16981/kliss.48.201709.233

Research on Minimizing Access to RDF Triple Store for Efficiency in Constructing Massive Bibliographic Linked Data  

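Lee, Moon-Ho (Department of Library and Information Science, Graduate School, Kyonggi University)
Choi, Sung-Pil (Department of Library and Information Science, Kyonggi University)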
Publication Information
Journal of Korean Library and Information Science Society / v.48, no.3, 2017, pp. 233-257
Abstract
In this paper, we propose an efficient method for converting MEDLINE, the world's largest biomedical bibliographic database, into linked data. We first derive an appropriate RDF schema by analyzing the MEDLINE record structure in detail, and convert each record into a valid RDF file conforming to that schema. When merging the per-record RDF files into a single RDF triple store, we apply a dual batch registration method to streamline the check for duplicate subject URIs. Compared with building the linked data sequentially, one RDF file at a time, this method reduces the number of triple store accesses needed for duplicate subject URI checks from 26,597,850 to 2,400. We expect these results to help eliminate inefficiency in converting large bibliographic record sets into linked data, and to make such conversions prompt and timely.
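The abstract does not spell out the dual batch registration procedure, so the following is only a minimal sketch of the general idea it points to: replacing one triple store lookup per subject URI with one lookup per batch of records. The names `CountingStore`, `existing_subjects`, `load_sequential`, and `load_batched` are hypothetical, the store is a toy in-memory stand-in rather than a real RDF triple store, and the rule that the first record to mention a subject URI registers it is an assumption made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Record = List[Triple]          # triples converted from one bibliographic record


class CountingStore:
    """Toy in-memory stand-in for a triple store that counts lookups (hypothetical)."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.subjects: Set[str] = set()
        self.lookups = 0

    def existing_subjects(self, uris: Iterable[str]) -> Set[str]:
        """One store access: report which of the given subject URIs are stored."""
        self.lookups += 1
        return self.subjects & set(uris)

    def add(self, triples: Iterable[Triple]) -> None:
        for triple in triples:
            self.triples.add(triple)
            self.subjects.add(triple[0])


def group_by_subject(record: Record) -> Dict[str, List[Triple]]:
    """Group one record's triples by their subject URI."""
    grouped: Dict[str, List[Triple]] = {}
    for triple in record:
        grouped.setdefault(triple[0], []).append(triple)
    return grouped


def load_sequential(store: CountingStore, records: List[Record]) -> None:
    """Baseline: one duplicate check against the store per subject URI per record."""
    for record in records:
        for subject, triples in group_by_subject(record).items():
            if not store.existing_subjects([subject]):
                store.add(triples)


def load_batched(store: CountingStore, records: List[Record], batch_size: int) -> None:
    """Sketch: deduplicate subjects in memory, then check the store once per batch."""
    for start in range(0, len(records), batch_size):
        pending: Dict[str, List[Triple]] = {}
        for record in records[start:start + batch_size]:
            for subject, triples in group_by_subject(record).items():
                # Assumption: the first record mentioning a subject registers it.
                pending.setdefault(subject, triples)
        already_stored = store.existing_subjects(pending)
        for subject, triples in pending.items():
            if subject not in already_stored:
                store.add(triples)
```

In this sketch both loaders leave the store with the same triples; only the number of lookups differs, falling from one per subject URI per record to one per batch. The figures reported in the abstract depend on the authors' actual procedure and batch sizes, which this sketch does not reproduce.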
Keywords
MEDLINE; Linked data; RDF Schema; RDF triple store; Dual batch registration;