• Title/Summary/Keyword: reference metadata recognition

A Study on Recognition of Citation Metadata using Bidirectional GRU-CRF Model based on Pre-trained Language Model (사전학습 된 언어 모델 기반의 양방향 게이트 순환 유닛 모델과 조건부 랜덤 필드 모델을 이용한 참고문헌 메타데이터 인식 연구)

  • Ji, Seon-yeong;Choi, Sung-pil
    • Journal of the Korean Society for information Management, v.38 no.1, pp.221-242, 2021
  • This study applies a bidirectional GRU-CRF model based on a pre-trained language model to reference metadata recognition. To construct the experiment set, references were automatically extracted by rules from academic literature in PDF format: the set consists of 161,315 references extracted from 53,562 academic documents collected from 40 journals published in 2018. The study identified the language model with the highest performance, then ran additional experiments on that model to compare recognition performance by training-set size. Finally, recognition performance was measured for each metadata field.
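The recognition task in this abstract can be read as sequence labeling: a tagger assigns one label per token of a reference string, and consecutive labels are merged into metadata fields. The following is a minimal sketch of that merging step only, assuming a BIO tagging scheme; the function name `decode_bio`, the field names (`AUTHOR`, `YEAR`, `TITLE`), and the sample reference are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Optional


def decode_bio(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Merge BIO-tagged tokens into metadata fields (hypothetical label set)."""
    fields: Dict[str, List[str]] = {}
    label: Optional[str] = None
    span: List[str] = []

    for token, tag in zip(tokens, tags):
        # An I- tag that continues the open span just extends it.
        if tag.startswith("I-") and label == tag[2:]:
            span.append(token)
            continue
        # Anything else closes the open span, if there is one.
        if label is not None:
            fields.setdefault(label, []).append(" ".join(span))
        if tag.startswith("B-"):
            label, span = tag[2:], [token]
        else:
            # "O" tags and ill-formed I- tags are skipped.
            label, span = None, []

    if label is not None:
        fields.setdefault(label, []).append(" ".join(span))
    return fields


# Hand-written tags stand in for the output of a trained tagger.
tokens = ["Doe", "J.", "2018", "An", "example", "title"]
tags = ["B-AUTHOR", "I-AUTHOR", "B-YEAR", "B-TITLE", "I-TITLE", "I-TITLE"]
result = decode_bio(tokens, tags)
print(result)
```

In practice the `tags` list would come from the trained model; here it is written by hand to keep the sketch self-contained.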

Automatic Extraction of References for Research Reports using Deep Learning Language Model (딥러닝 언어 모델을 이용한 연구보고서의 참고문헌 자동추출 연구)

  • Yukyung Han;Wonsuk Choi;Minchul Lee
    • Journal of the Korean Society for information Management, v.40 no.2, pp.115-135, 2023
  • The purpose of this study is to assess the effectiveness of using deep learning language models to extract references automatically and create a reference database for research reports in an efficient manner. Unlike academic journals, research reports present difficulties in automatically extracting references due to variations in formatting across institutions. In this study, we addressed this issue by introducing the task of separating references from non-reference phrases, in addition to the commonly used metadata extraction task for reference extraction. The study employed datasets that included various types of references, such as those from research reports of a particular institution, academic journals, and a combination of academic journal references and non-reference texts. Two deep learning language models, namely RoBERTa+CRF and ChatGPT, were compared to evaluate their performance in automatic extraction. They were used to extract metadata, categorize data types, and separate original text. The research findings showed that the deep learning language models were highly effective, achieving maximum F1-scores of 95.41% for metadata extraction and 98.91% for categorization of data types and separation of the original text. These results provide valuable insights into the use of deep learning language models and different types of datasets for constructing reference databases for research reports including both reference and non-reference texts.
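The F1-scores quoted in this abstract combine precision and recall into a single harmonic mean. As a rough illustration of how such a score is computed from counts of correct and incorrect extractions, the sketch below uses a hypothetical helper `f1_score` and made-up counts; the abstract does not say whether the authors scored at the token level or the field level.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall, or 0.0 if undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration only: 90 correct fields,
# 10 spurious fields, 10 missed fields.
score = f1_score(90, 10, 10)
print(f"{score:.2%}")
```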

A study on Multiple Entity Data Model Design for Visual-Arts Archives and Information Management in the case of the KS X ISO 23081 Multiple Entity Model (시각예술기록정보 관리를 위한 데이터모델 설계 KS X ISO 23081 다중 엔티티 모델의 적용을 중심으로)

  • Hwang, Jin-hyun;Yim, Jin-hee
    • The Korean Journal of Archival Studies, no.33, pp.155-206, 2012
  • In the ten years following the 1999 enactment of the "Act on the Management of Public Archives," interest in archives management has expanded from the public sector into the cultural and artistic field. However, because the importance of archives is not well recognized in that field, information is frequently left scattered and archives are lost. For example, the absence of precise contract documents or notes of bestowal makes it impossible to locate a great many cultural properties, which exposes these creative works to theft, closed-door auctions, and trade through unofficial channels. Because how a nation manages its cultural and artistic creations reflects its cultural level, one indicator of that level is how those creations are circulated. This study started from that point. Economic growth and rising interest in culture and art have made society more aware of the importance and value of visual artworks, but the archives and information that show the context of these artworks, produced in the course of social interaction, are relatively neglected because too much emphasis falls on the work itself. Archives or documentation about the artists themselves, or about the philosophical discourse behind the artworks, are harder to find in Korea than in other advanced countries. There is also little interest in preserving the archives and information produced after an exhibition, and they are used for no more than promotion or reference. The researcher therefore recognized the importance of visual arts archives and the strong need to manage them systematically. Metadata is essential to such systematic management, since artworks and their archives are now managed through agencies' systems even when they were not produced electronically.
The objective of this study is to manage visual arts archives systematically by designing a data model that reflects their characteristics. Metadata is needed at every stage of an archive's life, from acquisition through management and preservation to use. Visual arts archives realize their full value only when a systematic relationship is established among information on artists, artworks, and events such as exhibitions. A Multiple Entity Data Model, in which artworks, artists, and events (exhibitions) are related to one another, makes metadata for managing visual arts archives more efficient and at the same time makes the archives more descriptive. For this reason, this study designs a data model that treats each of these as an independent entity and defines the relations among them, in order to manage visual arts archives more systematically.
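The idea of independent entities linked by explicit relations can be sketched in a few lines. The example below is only an illustration under stated assumptions: the class names `Entity` and `ArchiveModel`, the attribute names, and the relation names (`created`, `shown_at`) are hypothetical and are not taken from the paper or from KS X ISO 23081.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    entity_id: str
    entity_type: str  # hypothetical types: "artist", "artwork", "event"
    name: str


@dataclass
class ArchiveModel:
    entities: Dict[str, Entity] = field(default_factory=dict)
    # Each relation is stored as (source_id, relation_name, target_id).
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity

    def relate(self, source_id: str, relation: str, target_id: str) -> None:
        # Refuse relations that point at entities not yet registered.
        for entity_id in (source_id, target_id):
            if entity_id not in self.entities:
                raise KeyError(f"unknown entity: {entity_id}")
        self.relations.append((source_id, relation, target_id))

    def related(self, entity_id: str) -> List[Tuple[str, Entity]]:
        """Return (relation, entity) pairs linked to entity_id in either direction."""
        found = []
        for source_id, relation, target_id in self.relations:
            if source_id == entity_id:
                found.append((relation, self.entities[target_id]))
            elif target_id == entity_id:
                found.append((relation, self.entities[source_id]))
        return found


# Placeholder records, not real archive data.
model = ArchiveModel()
model.add(Entity("a1", "artist", "Example Artist"))
model.add(Entity("w1", "artwork", "Example Artwork"))
model.add(Entity("e1", "event", "Example Exhibition"))
model.relate("a1", "created", "w1")
model.relate("w1", "shown_at", "e1")

for relation, entity in model.related("w1"):
    print(relation, entity.entity_type, entity.name)
```

Keeping relations in their own list, rather than as fields on each entity, lets one artwork be tied to several artists or exhibitions without changing the entity definitions.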

Deep Learning Description Language for Referring to Analysis Model Based on Trusted Deep Learning (신뢰성있는 딥러닝 기반 분석 모델을 참조하기 위한 딥러닝 기술 언어)

  • Mun, Jong Hyeok;Kim, Do Hyung;Choi, Jong Sun;Choi, Jae Young
    • KIPS Transactions on Software and Data Engineering, v.10 no.4, pp.133-142, 2021
  • With recent advances in deep learning, companies in fields such as smart homes, healthcare, and intelligent transportation systems are using it to provide high-quality services for vehicle detection, emergency situation detection, and energy consumption control. To provide reliable services in such sensitive systems, deep learning models are required to have high accuracy. To develop a deep learning model for these services, developers should use state-of-the-art deep learning models that have already been verified to have high accuracy. Developers can verify the accuracy of a referenced model by validating it on the dataset. For this validation, the developer needs structural information for documenting and applying deep learning models, including metadata such as the training dataset, network architecture, and development environment. In this paper, we propose a description language that represents the network architecture of a deep learning model along with the metadata necessary to develop it. With the proposed description language, developers can easily verify the accuracy of the referenced deep learning model. Our experiments demonstrate an application scenario for a deep learning description document, focusing on license plate recognition for detecting illegally parked vehicles.
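To make the idea of a description document concrete, the sketch below represents one as a plain Python dictionary and checks it in two ways: that the metadata sections named in the abstract (dataset, network architecture, development environment) are present, and that a re-measured accuracy agrees with the reported one. Everything here is an assumption for illustration: the key names, the helper functions `missing_sections` and `accuracy_matches`, and all values are hypothetical and do not reproduce the syntax of the proposed language.

```python
from typing import Any, Dict, List

# Hypothetical section names, loosely following the metadata listed in the abstract.
REQUIRED_SECTIONS = ("dataset", "architecture", "environment")


def missing_sections(description: Dict[str, Any]) -> List[str]:
    """List the required sections that the description document lacks."""
    return [name for name in REQUIRED_SECTIONS if name not in description]


def accuracy_matches(description: Dict[str, Any], measured: float,
                     tolerance: float = 0.01) -> bool:
    """Check a re-measured accuracy against the reported one, within a tolerance."""
    return abs(description["reported_accuracy"] - measured) <= tolerance


# Placeholder values only; none of these come from the paper.
description = {
    "dataset": {"name": "example-plates", "split": "test"},
    "architecture": [
        {"layer": "conv", "filters": 32},
        {"layer": "dense", "units": 10},
    ],
    "environment": {"framework": "example-framework", "version": "0.0"},
    "reported_accuracy": 0.95,
}

print(missing_sections(description))
print(accuracy_matches(description, measured=0.945))
```

A developer referencing a published model could run checks like these before building on it, rejecting documents that omit a section or whose reported accuracy cannot be reproduced.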