http://dx.doi.org/10.4275/KSLIS.2022.56.3.241

Automatic Generation of Bibliographic Metadata with Reference Information for Academic Journals  

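Jeong, Seonki (Department of Library and Information Science, Kyonggi University)
Shin, Hyeonho (Department of Library and Information Science, Kyonggi University)
Ji, Seon-Yeong (Boin Information Technology Co., Ltd.)
Choi, Sungphil (Department of Library and Information Science, Kyonggi University)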
Publication Information
Journal of the Korean Society for Library and Information Science / v.56, no.3, 2022, pp. 241-264
Abstract
Bibliographic metadata help researchers make effective use of the publications they need and grasp the academic trends in their own fields. Creating this metadata manually is costly and time-consuming, yet automating its construction with rule-based methods alone is difficult because article layouts and styles vary widely across publishers and academic societies. To overcome these difficulties, this study proposes a two-step extraction process, based on rules and deep neural networks, for generating the bibliographic metadata of scientific articles. First, a deep neural network-based model identifies the target areas for extraction in an article; then the details within those areas are analyzed and subdivided into the relevant metadata elements. The proposed approach also includes a model for generating reference summary information, which separates the end of the main text from the start of the reference section, extracts individual references using an essential rule set, and identifies all the bibliographic items in each reference with a deep neural network. In addition, to assess whether a model could generate the bibliographic information of academic papers without pre- and post-processing, we conducted an in-depth comparative experiment under various settings and configurations. In this experiment, the proposed method achieved higher performance than the compared configurations.
Keywords
NLP; Information Extraction; Reference Extraction; Metadata Extraction; Language Model
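The reference-processing steps described in the abstract (finding where the body text ends and the references begin, splitting individual references with rules, and turning per-token predictions into bibliographic fields) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the heading and entry-numbering patterns are hypothetical rules, the BIO tagging scheme and the labels AUTHOR, YEAR, and TITLE are assumptions, the deep neural network is replaced by hand-written tags, and the first step (detecting extraction areas on the page) is not shown.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical rule: a line consisting only of a reference heading
# (English or Korean) marks the boundary between body text and references.
REF_HEADING = re.compile(r"^\s*(references|bibliography|참고문헌)\s*$", re.IGNORECASE)

# Hypothetical rule: a new entry starts with "[n]" or "n." (1-3 digits, so a
# continuation line that begins with a year such as "2019." is not split off).
ENTRY_START = re.compile(r"^\s*(\[\d{1,3}\]|\d{1,3}\.)\s+")


def split_body_and_references(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Split at the last reference heading; return (body lines, reference lines)."""
    boundary: Optional[int] = None
    for i, line in enumerate(lines):
        if REF_HEADING.match(line):
            boundary = i  # keep scanning so the last heading wins
    if boundary is None:
        return lines, []
    return lines[:boundary], lines[boundary + 1:]


def split_entries(ref_lines: List[str]) -> List[str]:
    """Group reference lines into entries; unnumbered lines continue the previous entry."""
    entries: List[str] = []
    for line in ref_lines:
        if not line.strip():
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(ENTRY_START.sub("", line, count=1).strip())
        else:
            entries[-1] += " " + line.strip()
    return entries


def decode_bio(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Collect tokens tagged B-X/I-X (e.g. by a sequence labeler) into fields keyed by X."""
    fields: Dict[str, List[str]] = {}
    label: Optional[str] = None
    buf: List[str] = []
    # A trailing sentinel "O" tag flushes the final open span.
    for token, tag in zip(tokens + [""], tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues and label is not None:
            fields.setdefault(label, []).append(" ".join(buf))
            label, buf = None, []
        if continues:
            buf.append(token)
        elif tag != "O":  # "B-X", or a stray "I-X", opens a new span
            label, buf = tag[2:], [token]
    return fields


if __name__ == "__main__":
    page = [
        "5. Conclusion",
        "We proposed a two-step method.",
        "References",
        "[1] Doe, J. (2020). A hypothetical study",
        "on reference parsing. Example Journal, 1(1), 1-10.",
        "[2] Roe, R. (2021). Another hypothetical paper.",
    ]
    body, refs = split_body_and_references(page)
    for entry in split_entries(refs):
        print(entry)
    # Tags below are hand-written stand-ins for a trained model's predictions.
    tokens = ["Roe,", "R.", "(2021).", "Another", "hypothetical", "paper."]
    tags = ["B-AUTHOR", "I-AUTHOR", "B-YEAR", "B-TITLE", "I-TITLE", "I-TITLE"]
    print(decode_bio(tokens, tags))
```

Running the script prints the two merged reference entries, then `{'AUTHOR': ['Roe, R.'], 'YEAR': ['(2021).'], 'TITLE': ['Another hypothetical paper.']}`. Taking the last reference heading rather than the first is a guess meant to skip earlier mentions such as a table of contents; real article layouts would need a richer rule set.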
Citations & Related Records
Times Cited By KSCI: 3