http://dx.doi.org/10.5909/JBE.2022.27.4.581

Dataset Search System Using Metadata-Based Ranking Algorithm  

Choi, Wooyoung (School of Software Convergence, College of ICT Convergence, Myongji University)
Chun, Jonghoon (School of Software Convergence, College of ICT Convergence, Myongji University)
Publication Information
Journal of Broadcast Engineering / v.27, no.4, 2022, pp. 581-592
Abstract
As demand for big data has grown, so has interest in the dataset search technology needed for data analysis. Unlike conventional text search, dataset search must make active use of metadata, yet such dataset search systems have received little research attention. In this paper, we propose a search system tailored to datasets that indexes dataset metadata and searches over the resulting metadata indices. Search results are ranked by a newly devised algorithm that reflects the unique characteristics of datasets. The system can also retrieve additional datasets that correlate with those matching the user's query, so that multiple datasets needed for an analysis can be found at once.
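The abstract does not spell out the ranking algorithm, so the following is only a minimal sketch of the general idea: index dataset metadata, rank matches by weighted metadata fields, and suggest related datasets. The field names, the weights in `FIELD_WEIGHTS`, the keyword-overlap measure, and the sample records are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict

# Hypothetical field weights: an assumption for illustration only.
# The ranking factors used in the paper are not given in the abstract.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "description": 1.0}


def tokenize(text):
    """Lowercase the text and split it on commas and whitespace."""
    return text.lower().replace(",", " ").split()


def build_index(datasets):
    """Build an inverted index: term -> {dataset_id: weighted score}."""
    index = defaultdict(lambda: defaultdict(float))
    for ds_id, meta in datasets.items():
        for field, weight in FIELD_WEIGHTS.items():
            for term in tokenize(meta.get(field, "")):
                index[term][ds_id] += weight
    return index


def search(index, query):
    """Return dataset ids ordered by summed metadata score, best first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for ds_id, score in index.get(term, {}).items():
            scores[ds_id] += score
    # Break ties by id so the ordering is deterministic.
    return sorted(scores, key=lambda ds_id: (-scores[ds_id], ds_id))


def related(datasets, ds_id, min_overlap=0.2):
    """Return ids of datasets whose keyword sets overlap with ds_id.

    Jaccard overlap of keywords stands in for "correlation" here;
    this measure is an assumption, not the paper's definition.
    """
    base = set(tokenize(datasets[ds_id].get("keywords", "")))
    matches = []
    for other_id, meta in datasets.items():
        if other_id == ds_id:
            continue
        other = set(tokenize(meta.get("keywords", "")))
        union = base | other
        if union and len(base & other) / len(union) >= min_overlap:
            matches.append(other_id)
    return sorted(matches)


# Made-up sample metadata records.
DATASETS = {
    "air": {"title": "Seoul air quality",
            "keywords": "air, pollution, seoul",
            "description": "Hourly PM10 readings"},
    "traffic": {"title": "Seoul traffic volume",
                "keywords": "traffic, seoul, pollution",
                "description": "Daily vehicle counts"},
    "rain": {"title": "Busan rainfall",
             "keywords": "weather, rain, busan",
             "description": "Monthly rainfall totals"},
}

index = build_index(DATASETS)
ranked = search(index, "air pollution")
suggestions = related(DATASETS, ranked[0])
```

With these sample records, the query "air pollution" ranks `air` ahead of `traffic` because the term "air" matches the higher-weighted title field, and `traffic` is then suggested as a related dataset because it shares two keywords with `air`.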
Keywords
Dataset; Search; Metadata; Ranking; Big data