• Title/Summary/Keyword: Extraction Metadata

Search Result 41, Processing Time 0.021 seconds

Automatic Extraction of References for Research Reports using Deep Learning Language Model (딥러닝 언어 모델을 이용한 연구보고서의 참고문헌 자동추출 연구)

  • Yukyung Han;Wonsuk Choi;Minchul Lee
    • Journal of the Korean Society for information Management
    • /
    • v.40 no.2
    • /
    • pp.115-135
    • /
    • 2023
  • The purpose of this study is to assess the effectiveness of using deep learning language models to extract references automatically and create a reference database for research reports in an efficient manner. Unlike academic journals, research reports present difficulties in automatically extracting references due to variations in formatting across institutions. In this study, we addressed this issue by introducing the task of separating references from non-reference phrases, in addition to the commonly used metadata extraction task for reference extraction. The study employed datasets that included various types of references, such as those from research reports of a particular institution, academic journals, and a combination of academic journal references and non-reference texts. Two deep learning language models, namely RoBERTa+CRF and ChatGPT, were compared to evaluate their performance in automatic extraction. They were used to extract metadata, categorize data types, and separate original text. The research findings showed that the deep learning language models were highly effective, achieving maximum F1-scores of 95.41% for metadata extraction and 98.91% for categorization of data types and separation of the original text. These results provide valuable insights into the use of deep learning language models and different types of datasets for constructing reference databases for research reports including both reference and non-reference texts.

A Case Study on Metadata Extractionfor Records Management Using ChatGPT (챗GPT를 활용한 기록관리 메타데이터 추출 사례연구)

  • Minji Kim;Sunghee Kang;Hae-young Rieh
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.24 no.2
    • /
    • pp.89-112
    • /
    • 2024
  • Metadata is a crucial component of record management, playing a vital role in properly managing and understanding the record. In cases where automatic metadata assignment is not feasible, manual input by records professionals becomes necessary. This study aims to alleviate the challenges associated with manual entry by proposing a method that harnesses ChatGPT technology for extracting records management metadata elements. To employ ChatGPT technology, a Python program utilizing the LangChain library was developed. This program was designed to analyze PDF documents and extract metadata from records through questions, both with a locally installed instance of ChatGPT and the ChatGPT online service. Multiple PDF documents were subjected to this process to test the effectiveness of metadata extraction. The results revealed that while using LangChain with ChatGPT-3.5 turbo provided a secure environment, it exhibited some limitations in accurately retrieving metadata elements. Conversely, the ChatGPT-4 online service yielded relatively accurate results despite being unable to handle sensitive documents for security reasons. This exploration underscores the potential of utilizing ChatGPT technology to extract metadata in records management. With advancements in ChatGPT-related technologies, safer and more accurate results are expected to be achieved. Leveraging these advantages can significantly enhance the efficiency and productivity of tasks associated with managing records and metadata in archives.

Quality Evaluation of Automatically Generated Metadata Using ChatGPT: Focusing on Dublin Core for Korean Monographs (ChatGPT가 자동 생성한 더블린 코어 메타데이터의 품질 평가: 국내 도서를 대상으로)

  • SeonWook Kim;HyeKyung Lee;Yong-Gu Lee
    • Journal of the Korean Society for information Management
    • /
    • v.40 no.2
    • /
    • pp.183-209
    • /
    • 2023
  • The purpose of this study is to evaluate the Dublin Core metadata generated by ChatGPT using book covers, title pages, and colophons from a collection of books. To achieve this, we collected book covers, title pages, and colophons from 90 books and inputted them into ChatGPT to generate Dublin Core metadata. The performance was evaluated in terms of completeness and accuracy. The overall results showed a satisfactory level of completeness at 0.87 and accuracy at 0.71. Among the individual elements, Title, Creator, Publisher, Date, Identifier, Rights, and Language exhibited higher performance. Subject and Description elements showed relatively lower performance in terms of completeness and accuracy, but it confirmed the generation capability known as the inherent strength of ChatGPT. On the other hand, books in the sections of social sciences and technology of DDC showed slightly lower accuracy in the Contributor element. This was attributed to ChatGPT's attribution extraction errors, omissions in the original bibliographic description contents for metadata, and the language composition of the training data used by ChatGPT.

A Study on Ontology Design for Research Data Management (연구데이터 관리를 위한 온톨로지 설계에 대한 연구)

  • Park, Ok Nam
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.18 no.1
    • /
    • pp.101-127
    • /
    • 2018
  • The systematic management of research data is vital because it increases research data's value for research reproduction, verification, and reusability. Standard metadata will play a key role in research data registration, management, and data extraction. Research data has various structural relationships, such as research, research data, data sets, and files, and associated with entities such as citations and research results. The study proposes an ontology model for research data management. It also suggests the application of ontology to NTIS. Previous studies, metadata standard analyses, and research data repository case studies were conducted.

Implementation of Sports Video Clip Extraction Based on MobileNetV3 Transfer Learning (MobileNetV3 전이학습 기반 스포츠 비디오 클립 추출 구현)

  • YU, LI
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.17 no.5
    • /
    • pp.897-904
    • /
    • 2022
  • Sports video is a very critical information resource. High-precision extraction of effective segments in sports video can better assist coaches in analyzing the player's actions in the video, and enable users to more intuitively appreciate the player's hitting action. Aiming at the shortcomings of the current sports video clip extraction results, such as strong subjectivity, large workload and low efficiency, a classification method of sports video clips based on MobileNetV3 is proposed to save user time. Experiments evaluate the effectiveness of effective segment extraction. Among the extracted segments, the effective proportion is 97.0%, indicating that the effective segment extraction results are good, and it can lay the foundation for the construction of the subsequent badminton action metadata video dataset.

Implementation of Musical Note Generation System using Rhythm Information (리듬정보를 이용한 악보생성 시스템 구현)

  • 소두석;최재원;이종혁
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.7 no.6
    • /
    • pp.1210-1216
    • /
    • 2003
  • Traditional indexing mechanism are based on the song's metadata such as the title and the composer and so on. However, these system have a major limitation that users have to know the metadata of the songs they want to retrieve. In order to solve these limitation, we proposed a rhythm extraction system that allows users to retrieve music information efficiently from a large music database using the rhythm that is defined as the parts of the music.

Consideration upon Importance of Metadata Extraction for a Hyper-Personalized Recommender System on Unsupervised Learning (비지도 학습 기반 초개인화 추천 서비스를 위한 메타데이터 추출의 중요성 고찰)

  • Paik, Juryon;Ko, Kwang-Ho
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2022.01a
    • /
    • pp.19-22
    • /
    • 2022
  • 서비스 관점에서 구축되는 추천 시스템의 성능은 얼마나 효율적인 추천 모델을 적용하여 심층적으로 설계되었는가에 좌우된다고도 볼 수 있다. 특히, 추천 시스템의 초개인화는 세계적인 추세로 1~2년 전부터 구글, 아마존, 알리바바 등의 데이터 플랫폼 강자들이 경쟁적으로 딥 러닝 기반의 알고리즘을 개발, 자신들의 추천 서비스에 적용하고 있다. 본 연구는 갈수록 고도화되는 추천 시스템으로 인해 발생하는 여러 문제들 중 사용자 또는 서비스 정보가 부족하여 계속적으로 발생하고 있는 Cold-start 문제와 추천할 서비스와 사용자는 지속적으로 늘어나지만 실제로 사용자가 소비하게 되는 서비스의 비율은 현저하게 감소하는 데이터 희소성 문제 (Sparsity Problem)에 대한 솔루션을 모색하는 알고리즘 관점에서 연구하고자 한다. 본 논문은 첫 단계로, 적용하는 메타데이터에 따라 추천 결과의 정확성이 얼마나 차이가 나는지를 보이고 딥러닝 비지도학습 방식을 메타데이터 선정 및 추출에 적용하여 실시간으로 변화하는 소비자의 실제 생활 패턴 및 니즈를 예측해야 하는 필요성에 대해서 기술하고자 한다.

  • PDF

The Design of Object-of-Interest Extraction System Utilizing Metadata Filtering from Moving Object (이동객체의 메타데이터 필터링을 이용한 관심객체 추출 시스템 설계)

  • Kim, Taewoo;Kim, Hyungheon;Kim, Pyeongkang
    • Journal of KIISE
    • /
    • v.43 no.12
    • /
    • pp.1351-1355
    • /
    • 2016
  • The number of CCTV units is rapidly increasing annually, and the demand for intelligent video-analytics system is also increasing continuously for the effective monitoring of them. The existing analytics engines, however, require considerable computing resources and cannot provide a sufficient detection accuracy. For this paper, a light analytics engine was employed to analyze video and we collected metadata, such as an object's location and size, and the dwell time from the engine. A further data analysis was then performed to filter out the target of interest; as a result, it was possible to verify that a light engine and the heavy data analytics of the metadata from that engine can reject an enormous amount of environmental noise to extract the target of interest effectively. The result of this research is expected to contribute to the development of active intelligent-monitoring systems for the future.

Fishery R&D Big Data Platform and Metadata Management Strategy (수산과학 빅데이터 플랫폼 구축과 메타 데이터 관리방안)

  • Kim, Jae-Sung;Choi, Youngjin;Han, Myeong-Soo;Hwang, Jae-Dong;Cho, Wan-Sup
    • The Journal of Bigdata
    • /
    • v.4 no.2
    • /
    • pp.93-103
    • /
    • 2019
  • In this paper, we introduce a big data platform and a metadata management technique for fishery science R & D information. The big data platform collects and integrates various types of fisheries science R & D information and suggests how to build it in the form of a data lake. In addition to existing data collected and accumulated in the field of fisheries science, we also propose to build a big data platform that supports diverse analysis by collecting unstructured big data such as satellite image data, research reports, and research data. Next, by collecting and managing metadata during data extraction, preprocessing and storage, systematic management of fisheries science big data is possible. By establishing metadata in a standard form along with the construction of a big data platform, it is meaningful to suggest a systematic and continuous big data management method throughout the data lifecycle such as data collection, storage, utilization and distribution.

  • PDF